Introducing Vertex AI's text-multilingual-embedding-002: A Game Changer for Multilingual Text Embeddings
Vertex AI's text-multilingual-embedding-002 is a text embedding model built for multilingual content. It supports a wide array of languages beyond English, which makes it a strong fit for applications that serve a global audience.
Supported Languages
The model has been rigorously evaluated on numerous languages including Arabic, Bengali, English, Spanish, German, Persian, Finnish, French, Hindi, Indonesian, Japanese, Korean, Russian, Swahili, Telugu, Thai, Yoruba, and Chinese. Additionally, it supports many other languages, offering extensive versatility for users worldwide.
Usage
With the text-multilingual-embedding-002 model, you can generate dense vector representations of text. These embeddings are essential for tasks that require a deep understanding of the text's meaning rather than just direct word or syntax matches, such as search, recommendation systems, and natural language understanding.
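To make the idea of matching on meaning concrete, here is a minimal sketch of how embeddings are typically compared in a search setting: each document is scored by cosine similarity against the query vector. The three-dimensional vectors are toy values invented for illustration, not real model output, and the helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy 3-dimensional vectors standing in for real 768-dimensional embeddings.
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],  # unrelated direction
    [0.9, 0.1, 0.0],  # close to the query
    [0.5, 0.5, 0.0],  # partially related
]
ranking = rank_by_similarity(query, docs)
print(ranking)
```

With real embeddings, the query and the documents can be written in different languages and still land close together when they mean the same thing.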
API and SDK Access
Accessing this model is straightforward through the Vertex AI API or the Vertex AI SDK for Python. Simply specify the model ID in your API requests or SDK calls to embed texts efficiently.
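As a rough sketch of what an SDK call can look like, the function below loads the model by its ID and returns one vector per input text. It assumes the google-cloud-aiplatform package is installed and that credentials are already configured. The project ID, the us-central1 region, and the RETRIEVAL_DOCUMENT task type are illustrative assumptions, and embed_texts is a hypothetical helper name rather than part of the SDK.

```python
MODEL_ID = "text-multilingual-embedding-002"

def embed_texts(texts, project, location="us-central1", task_type="RETRIEVAL_DOCUMENT"):
    """Return one embedding (a list of floats) per input text.

    `project` and `location` are placeholders: supply your own values.
    """
    # Imported lazily so this module can be loaded without the SDK installed.
    import vertexai
    from vertexai.language_models import TextEmbeddingInput, TextEmbeddingModel

    vertexai.init(project=project, location=location)
    model = TextEmbeddingModel.from_pretrained(MODEL_ID)
    inputs = [TextEmbeddingInput(text, task_type) for text in texts]
    return [embedding.values for embedding in model.get_embeddings(inputs)]
```

A call such as embed_texts(["Hello world", "Hola mundo"], project="my-project") would then return two vectors, where "my-project" stands in for a real Google Cloud project ID.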
Token Limit and Auto-Truncation
Each input text has a token limit of 2048. Texts longer than this are silently truncated unless you set autoTruncate to false, in which case an over-length input causes the request to fail instead. This gives you control over how your data is processed.
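The sketch below shows where the autoTruncate setting would sit in a REST request body, assuming the usual instances/parameters layout of a Vertex AI predict call. The helper name build_predict_body is hypothetical, and the endpoint URL and authentication are left out on purpose.

```python
import json

def build_predict_body(texts, auto_truncate=True):
    """Build a JSON-serializable request body for an embedding predict call."""
    return {
        # One instance per input text.
        "instances": [{"content": text} for text in texts],
        # autoTruncate=False asks the service to reject over-length inputs
        # rather than silently cutting them off.
        "parameters": {"autoTruncate": auto_truncate},
    }

body = build_predict_body(["Bonjour le monde"], auto_truncate=False)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Turning truncation off is mainly useful when a silently shortened document would be worse than a visible error, for example when you intend to split long texts into chunks yourself.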
Dimensionality Options
By default, the model produces 768-dimensional dense vector embeddings. It also supports smaller output sizes such as 256 or 128 dimensions, which conserve storage and memory with limited impact on quality. It is still worth validating the reduced size against your own data before adopting it.
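To put the storage savings in numbers, here is a back-of-the-envelope calculation for an index of one million vectors. It assumes each value is stored as a 4-byte float32 and ignores any index overhead. Both are assumptions about your storage layer, not properties of the model.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32

def index_size_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of the vector data alone, excluding metadata and index overhead."""
    return num_vectors * dimensions * bytes_per_value

for dims in (768, 256, 128):
    size_mb = index_size_bytes(1_000_000, dims) / 1_000_000
    print(f"{dims:>3} dimensions: {size_mb:,.0f} MB")
```

Going from 768 to 256 dimensions cuts the raw vector data to a third, and 128 dimensions cuts it to a sixth.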
Stay Updated
To leverage the full capabilities of Vertex AI, it's recommended to use the latest versions of the models. The text-multilingual-embedding-002 model is among the newest and most advanced options available for multilingual text embeddings.
For detailed usage instructions and examples, refer to the official Google Cloud documentation and tutorials.