Introducing Vertex AI's New Text-Multilingual-Embedding-002 Model
Vertex AI is excited to announce the public preview of its latest text embedding model, text-multilingual-embedding-002. The model adds features designed to improve text-based applications across many languages.
Multilingual Support
The text-multilingual-embedding-002 model has been evaluated on a wide array of languages, including Arabic, Bengali, English, Spanish, German, Persian, Finnish, French, Hindi, Indonesian, Japanese, Korean, Russian, Swahili, Telugu, Thai, Yoruba, and Chinese. Additionally, it supports an even broader range of languages, such as Afrikaans, Albanian, Amharic, Armenian, Azerbaijani, Basque, Belarusian, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Corsican, Czech, Danish, Dutch, Esperanto, Estonian, Filipino, Galician, Georgian, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hmong, Hungarian, Icelandic, Igbo, Irish, Italian, Javanese, Kannada, Kazakh, Khmer, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, and many more.
Flexible Dimensionality
This model offers flexible dimensionality, allowing users to select the number of output dimensions according to their specific needs. Smaller embeddings take less space per vector, which helps reduce storage and memory usage.
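To make the storage trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes embeddings are stored as 32-bit floats and uses 768, 256, and 128 as illustrative dimension counts; the helper name index_size_bytes is hypothetical, and none of these figures come from the announcement itself.

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of an embedding table, ignoring any index or metadata overhead.

    Assumes each dimension is stored as a fixed-width number
    (4 bytes for float32 by default).
    """
    return num_vectors * dims * bytes_per_value


# Compare a few illustrative dimension counts for one million texts.
for dims in (768, 256, 128):
    size_gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dims: {size_gb:.2f} GB")
```

Under these assumptions, one million vectors need about 3.07 GB at 768 dimensions and about 1.02 GB at 256. Whether the smaller size preserves enough quality depends on the application, so it is worth measuring on your own data.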
Top-Tier Performance
In terms of performance, the text-multilingual-embedding-002 model ranks among the top models on the Massive Text Embedding Benchmark (MTEB) leaderboard, which indicates strong embedding quality across the tasks that benchmark covers.
Easy to Use
Using this model is straightforward with the Vertex AI API or the Vertex AI SDK for Python. It supports up to 250 input texts per request in the us-central1 region and has a token limit of 2048 per input text.
For practical implementation, example code is available in the Vertex AI SDK for Python documentation, offering a step-by-step guide on how to embed texts using this powerful model.
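As a minimal sketch of how the per-request limit might be handled with the Vertex AI SDK for Python, the example below splits a list of texts into batches of at most 250 and embeds each batch. The helper names batch_texts and embed_all, and the project ID passed by the caller, are hypothetical. The SDK is imported inside embed_all so that the batching helper runs without it. The sketch does not check the 2048-token limit per text, and it is not a substitute for the official sample.

```python
from typing import Iterator, List, Optional, Sequence

MAX_TEXTS_PER_REQUEST = 250  # per-request limit stated above for us-central1


def batch_texts(
    texts: Sequence[str], batch_size: int = MAX_TEXTS_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def embed_all(
    texts: Sequence[str],
    project: str,  # your Google Cloud project ID (placeholder, supplied by caller)
    location: str = "us-central1",
    output_dimensionality: Optional[int] = None,
) -> List[List[float]]:
    """Embed every text, one request per batch. Requires credentials and the SDK."""
    # Imported lazily so batch_texts can be used without the SDK installed.
    import vertexai
    from vertexai.language_models import TextEmbeddingModel

    vertexai.init(project=project, location=location)
    model = TextEmbeddingModel.from_pretrained("text-multilingual-embedding-002")

    # Only pass output_dimensionality when the caller asked for a reduced size.
    kwargs = {}
    if output_dimensionality is not None:
        kwargs["output_dimensionality"] = output_dimensionality

    vectors: List[List[float]] = []
    for batch in batch_texts(texts):
        embeddings = model.get_embeddings(batch, **kwargs)
        vectors.extend(embedding.values for embedding in embeddings)
    return vectors
```

A call such as embed_all(documents, project="my-project", output_dimensionality=256) would then return one vector per input text, in input order. Retries, rate limiting, and token counting are left out to keep the sketch short.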
These updates underscore the text-multilingual-embedding-002 model's enhanced capabilities and flexibility, making it an invaluable tool for diverse text-based applications across multiple languages.