Introducing databricks/databricks-bge-large-en: Advanced Text Embeddings for Your Applications

The databricks/databricks-bge-large-en model is part of the BAAI General Embedding (BGE) series, designed for generating high-quality text embeddings. It produces 1,024-dimensional embeddings from inputs of up to 512 tokens and performs strongly across tasks such as retrieval, clustering, and pair classification.

Integration with Databricks

Databricks supports the BGE Large (English) model through its Foundation Model APIs, providing optimized inference capabilities. You can easily access this model via the Databricks UI, Python SDK, or REST API. The designated endpoint for this model is databricks-bge-large-en.
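As a rough sketch of the REST path, the request below assumes a workspace URL and personal access token are available as environment variables, and that the endpoint follows the OpenAI-compatible embeddings request/response shape used by the Foundation Model APIs; the exact field names may differ in your workspace:

import os
import requests

# Assumed environment variables holding your workspace URL and personal access token
host = os.environ['DATABRICKS_HOST']
token = os.environ['DATABRICKS_TOKEN']

# Query the databricks-bge-large-en serving endpoint with a list of input strings
response = requests.post(
    f'{host}/serving-endpoints/databricks-bge-large-en/invocations',
    headers={'Authorization': f'Bearer {token}'},
    json={'input': ['What is Databricks?']},
)
embedding = response.json()['data'][0]['embedding']
print(len(embedding))  # expected: 1024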

Utilizing BGE Large in LangChain

In LangChain, the DatabricksEmbeddings class lets you compute text embeddings against Databricks Foundation Model API endpoints. Here's a quick example:

from langchain_databricks import DatabricksEmbeddings

# Point the embeddings client at the Databricks-hosted BGE Large (English) endpoint
embeddings = DatabricksEmbeddings(endpoint='databricks-bge-large-en')
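From there, the standard LangChain embedding methods apply; the texts below are purely illustrative:

# Embed a single query string (returns a list of 1,024 floats)
query_vector = embeddings.embed_query('What is Apache Spark?')

# Embed a batch of documents, e.g. before indexing them into a vector store
doc_vectors = embeddings.embed_documents(['First document.', 'Second document.'])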

This integration enables seamless use of BGE embeddings within your LangChain applications.

Updates and Recommendations

To benefit from the latest enhancements, we recommend switching to the newest version, BAAI/bge-large-en-v1.5. This version offers improved performance metrics and a more reasonable similarity distribution, while maintaining the same usage method as its predecessors.
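If you want to try v1.5 outside of a hosted endpoint, a minimal sketch using the sentence-transformers library (an assumption on our part, not the Databricks serving path) could look like this; normalizing the embeddings is the commonly recommended setting for cosine-similarity search with BGE models:

from sentence_transformers import SentenceTransformer

# Download BAAI/bge-large-en-v1.5 from the Hugging Face Hub and encode locally
model = SentenceTransformer('BAAI/bge-large-en-v1.5')
vectors = model.encode(['BGE embeddings example'], normalize_embeddings=True)
print(vectors.shape)  # expected: (1, 1024)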

By leveraging the latest version and integrating it with platforms like Databricks and LangChain, you can harness the full potential of the advanced BGE models for your applications.
