Exploring Vertex AI's Latest: Llama-3.2-90B Vision-Instruct-MaaS

The Vertex AI platform has just unveiled the Llama-3.2-90B Vision-Instruct model, a cutting-edge addition to Meta's new generation of multimodal large language models (LLMs). This model is designed to integrate text and image inputs, enabling a wide range of advanced AI tasks.

Model Architecture and Capabilities

The Llama-3.2-90B Vision-Instruct model combines an optimized transformer architecture with a vision encoder that feeds into the pre-trained Llama 3.1 language model through a series of cross-attention layers (a rough sketch of this wiring follows the list below). This setup lets the model accept both text and image inputs and produce text outputs. Key functionalities include:

  • Image captioning
  • Image-text retrieval
  • Visual grounding
  • Visual Q&A
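As a rough illustration of the cross-attention wiring described above (not Meta's actual implementation), the following PyTorch sketch shows how image features from a vision encoder can be attended to from language-model hidden states; all module sizes are made up for brevity.

```python
# Hypothetical sketch of a cross-attention layer that lets language-model
# hidden states attend to vision-encoder features (dimensions are illustrative).
import torch
import torch.nn as nn

class VisionCrossAttention(nn.Module):
    def __init__(self, d_model=4096, n_heads=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_hidden, image_features):
        # Queries come from the language model; keys/values come from the vision encoder.
        attended, _ = self.attn(query=text_hidden, key=image_features, value=image_features)
        return self.norm(text_hidden + attended)  # residual connection

text_hidden = torch.randn(1, 128, 4096)     # (batch, text tokens, hidden dim)
image_features = torch.randn(1, 576, 4096)  # (batch, image patches, hidden dim)
out = VisionCrossAttention()(text_hidden, image_features)
print(out.shape)  # torch.Size([1, 128, 4096])
```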

Training and Optimization

Trained on NVIDIA H100 Tensor Core GPUs, the model is optimized for high throughput and low latency. It employs supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The vision encoder gains hardware-level optimizations by being exported to an ONNX graph and then compiled into a TensorRT engine.
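The ONNX-then-TensorRT path can be sketched as follows; the encoder module, input shape, and file names are placeholders, and the trtexec step assumes a machine with TensorRT installed.

```python
# Hypothetical export of a vision encoder to ONNX, followed by a TensorRT
# engine build via trtexec (module and shapes are placeholders).
import torch
import torchvision

encoder = torchvision.models.vit_b_16(weights=None).eval()  # stand-in vision encoder
dummy_image = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    encoder,
    dummy_image,
    "vision_encoder.onnx",
    input_names=["pixel_values"],
    output_names=["image_features"],
    opset_version=17,
)

# Then, on a machine with TensorRT installed:
#   trtexec --onnx=vision_encoder.onnx --saveEngine=vision_encoder.engine --fp16
```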

Deployment on Vertex AI

The Llama-3.2-90B Vision-Instruct model is accessible via Vertex AI's Model Garden, offering a fully managed and serverless Model-as-a-Service (MaaS) experience. Developers can access, customize, and deploy the model without managing infrastructure (an example call follows the list below). Key benefits include:

  • Simple API calls for experimentation
  • Fine-tuning with custom data
  • Fully managed infrastructure
  • Pay-as-you-go billing
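As a rough sketch of the "simple API call" experience, the snippet below uses Vertex AI's OpenAI-compatible chat completions endpoint; the endpoint URL pattern, the image input format, and the model ID meta/llama-3.2-90b-vision-instruct-maas are assumptions to verify against the Model Garden model card.

```python
# Hypothetical call to the Llama 3.2 90B Vision model as a managed service
# through Vertex AI's OpenAI-compatible endpoint (verify the URL pattern and
# model ID against the Model Garden documentation).
import google.auth
import google.auth.transport.requests
from openai import OpenAI

PROJECT = "my-gcp-project"   # placeholder
REGION = "us-central1"       # placeholder

creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())

client = OpenAI(
    base_url=f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
             f"/locations/{REGION}/endpoints/openapi",
    api_key=creds.token,
)

response = client.chat.completions.create(
    model="meta/llama-3.2-90b-vision-instruct-maas",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "gs://my-bucket/receipt.png"}},
            {"type": "text", "text": "Summarize this document in two sentences."},
        ],
    }],
)
print(response.choices[0].message.content)
```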

Use Cases

This model excels in scenarios requiring visual reasoning, such as:

  • Image-based search
  • Content generation
  • Interactive educational tools
  • Image captioning
  • Visual Q&A
  • Document Q&A

Technical Specifications

With 90 billion parameters and a context length of 128K tokens, the model uses grouped query attention (GQA) to keep inference efficient at long context lengths. For image+text applications, it supports English, although it has been trained on a broader range of languages for text-only tasks.
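Grouped query attention shares a small set of key/value heads across many query heads, which shrinks the KV cache that must be held for a 128K-token context. The toy PyTorch sketch below illustrates the mechanism; the head counts are illustrative, not the model's real configuration.

```python
# Toy illustration of grouped query attention: many query heads share a
# smaller set of key/value heads, shrinking the KV cache.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 1, 16, 64
n_q_heads, n_kv_heads = 8, 2          # 4 query heads per KV head (illustrative)
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Repeat each KV head so every query head has a matching key/value stream.
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

out = F.scaled_dot_product_attention(q, k, v)  # (batch, n_q_heads, seq_len, head_dim)
print(out.shape)
```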

Integration and Ecosystem

Vertex AI provides a unified platform for experimenting, customizing, and deploying Llama-3.2 models. It integrates with tools like LangChain and Genkit’s Vertex AI plugin, facilitating the creation of intelligent agents powered by Llama-3.2.
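As one hedged example of that integration, the snippet below wires the model into LangChain via the langchain-google-vertexai package; ChatVertexAI is a real class, but whether it accepts this exact MaaS model identifier should be confirmed against the package's documentation.

```python
# Hypothetical LangChain wiring for the MaaS model (model name and parameter
# support should be confirmed against langchain-google-vertexai docs).
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    model_name="meta/llama-3.2-90b-vision-instruct-maas",  # assumed model ID
    project="my-gcp-project",   # placeholder
    location="us-central1",     # placeholder
    temperature=0.2,
)

print(llm.invoke("List three use cases for a vision-language model.").content)
```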

Availability

Currently in preview on Vertex AI, the 90B model will soon be generally available, with an 11B vision model also coming as MaaS in the near future.
