perplexity-ai

Introducing Perplexity's Llama-3.1-Sonar-Small-128K-Online: Enhanced Performance for Real-Time Interactions

Tal Peretz

14 Aug 2024 — 1 min read

Perplexity AI has recently launched its new and improved Sonar models, designed to enhance performance and efficiency. Among these models is the Llama-3.1-Sonar-Small-128K-Online, a cutting-edge online language model optimized for real-time interactions.

The Llama-3.1-Sonar-Small-128K-Online model boasts several impressive specifications:

Model Name: Llama-3.1-Sonar-Small-128K-Online
Model Type: Online, optimized for real-time interactions

To get started with this model, users need to set up an API key with Perplexity AI. Once the API key is configured, prompts can be run using the llm command, specifying the model name. For example:

llm -m llama-3.1-sonar-small-128k-online 'Your query here'

One of the standout features of the Sonar models, including the Small variant, is their low latency. The Llama-3.1-Sonar-Small-128K-Online model is noted for having one of the lowest latencies among large language models (LLMs), making it ideal for applications that require quick responses.

Regarding pricing, the Sonar models, including the Llama-3.1-Sonar-Small-128K-Online, follow a pricing structure based on a combination of a fixed price per request and a variable price dependent on the number of input and output tokens.

This model can also be integrated into Retrieval-Augmented Generation (RAG) solutions. In these scenarios, it can be utilized to generate summaries and provide relevant information based on specific topics. For instance, you can query the model with specific instructions to obtain detailed and accurate information quickly.

In summary, the Llama-3.1-Sonar-Small-128K-Online model by Perplexity AI offers enhanced performance, low latency, and flexible pricing, making it an excellent choice for real-time applications and RAG solutions.

Introducing Gemini 2.0 Flash Preview Image Generation: Google's Next-Step Generative AI Model

Google’s Gemini 2.0 Flash Preview Image Generation is the latest breakthrough in generative AI, introducing robust multimodal capabilities that enable intuitive, context-aware image generation and editing. This model builds upon the powerful Gemini 2.0 Flash architecture, providing developers and creators with a versatile tool for visually expressive

Exploring Google's Gemini 2.5 Flash Preview TTS: Powerful, Cost-Efficient Text-to-Speech

Google continues to set the pace in generative AI with the introduction of Gemini 2.5 Flash Preview TTS, a sophisticated text-to-speech model designed for structured workflows demanding high control, transparency, and cost-efficiency. Released as part of Google's Gemini 2.5 series, this model builds upon previous iterations

Introducing Vertex AI Gemini-2.5-Pro-Preview-TTS: Google's New Flagship LLM Explained

Google continues to push the boundaries of artificial intelligence with the recent release of its highly anticipated Vertex AI Gemini-2.5-Pro-Preview-TTS model. As part of the Vertex AI ecosystem, Gemini 2.5 Pro represents a significant leap forward in AI capabilities, offering advanced reasoning, exceptional coding proficiency, and unparalleled multimodal

Introducing Gemini 2.5 Pro Preview TTS: Google's Next-Generation Multimodal AI

Google DeepMind's Gemini 2.5 Pro Preview TTS is the latest breakthrough in large language models (LLMs), designed to deliver exceptional performance across reasoning, coding, multimodal capabilities, and text-to-speech (TTS) quality. Let's explore the key features, capabilities, and practical applications of this advanced AI model. Key

Read more

Introducing Gemini 2.0 Flash Preview Image Generation: Google's Next-Step Generative AI Model

Exploring Google's Gemini 2.5 Flash Preview TTS: Powerful, Cost-Efficient Text-to-Speech

Introducing Vertex AI Gemini-2.5-Pro-Preview-TTS: Google's New Flagship LLM Explained

Introducing Gemini 2.5 Pro Preview TTS: Google's Next-Generation Multimodal AI