Introducing Fireworks AI's Latest Developments: A New Era in AI Performance and Customization

Fireworks AI has recently made significant strides in the AI industry, marked by a successful Series B funding round and groundbreaking advancements in its product lineup. This blog post dives into the latest updates, highlighting the company's new large language model (LLM), Fireworks-AI-Default, and other innovations.

Funding and Valuation

Fireworks AI secured $52 million in a Series B funding round led by Sequoia Capital, pushing its valuation to $552 million. Esteemed investors like NVIDIA, AMD, and MongoDB Ventures also participated, underscoring the industry's confidence in Fireworks AI.

Product and Technology Innovations

Fireworks AI has introduced several cutting-edge technologies:

  • FireFunction V2: An open-weights function-calling model that orchestrates calls across multiple models, external data sources, and APIs, letting developers build scalable multi-model inference workflows.
  • FireAttention V2: A custom CUDA kernel offering up to 8x speed improvements for real-time applications compared to other open-source inference frameworks.
  • FireOptimus: An LLM inference optimizer that learns production traffic patterns to deliver up to 2x better latency while maintaining quality comparable to or better than GPT-4.
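To make the function-calling idea concrete, here is a minimal sketch of the OpenAI-compatible tool format that models like FireFunction consume, plus the dispatch step an application performs when the model emits a tool call. The tool name, its parameters, and the dispatcher are illustrative assumptions, not part of the Fireworks API itself.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible function-calling
# schema. The "get_weather" function and its parameters are made up for
# illustration.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Local implementations that the model's tool calls get routed to.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub for illustration

DISPATCH = {"get_weather": get_weather}

def run_tool_call(tool_call: dict) -> str:
    """Execute a tool call of the shape returned in a chat completion:
    the model names a function and supplies JSON-encoded arguments."""
    fn = DISPATCH[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return fn(**args)

# A tool call as it might appear in a model response.
example_call = {"function": {"name": "get_weather",
                             "arguments": '{"city": "Paris"}'}}
print(run_tool_call(example_call))  # → Sunny in Paris
```

In a real workflow the tool schema is sent with the chat request, the model decides when to call which function, and the application executes the call and feeds the result back for the next turn.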

Performance and Speed

Fireworks AI has significantly enhanced the speed of its models. For instance:

  • The Mixtral MoE 8x7b Instruct model processes up to 300 tokens per second.
  • The Llama 70B Chat model processes up to 200 tokens per second.
  • The Stable Diffusion XL model generates 1024x1024 images in roughly 1.2 seconds.

Dedicated deployments offer up to 3x the speed and 3.5x the throughput of Hugging Face's Text Generation Inference (TGI) on the same GPU setup.
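The throughput figures above translate directly into user-visible latency. A rough back-of-the-envelope sketch (ignoring time-to-first-token, which adds a fixed overhead in practice):

```python
# Approximate generation time from a sustained throughput figure:
# seconds ≈ tokens_to_generate / tokens_per_second.
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    return num_tokens / tokens_per_second

# A 512-token completion at the quoted rates:
mixtral = generation_seconds(512, 300)   # ≈ 1.71 s
llama70b = generation_seconds(512, 200)  # = 2.56 s
print(f"Mixtral 8x7b: {mixtral:.2f}s, Llama 70B: {llama70b:.2f}s")
```

The 512-token completion length is an arbitrary example; the point is that at hundreds of tokens per second, full responses arrive in a couple of seconds.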

Customization and Deployment

Fireworks AI provides smaller, production-grade models for private and secure deployment. These models can be customized using minimal human-curated data through ultra-fast LoRA fine-tuning, enabling developers to transition from dataset preparation to querying a fine-tuned model within minutes. The platform supports dedicated deployments and serverless models, offering higher rate limits and lower costs per GPU.
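Querying a fine-tuned model follows the familiar OpenAI-compatible chat-completions shape. The sketch below builds such a request body; the model ID is a placeholder following the "accounts/&lt;account&gt;/models/&lt;model&gt;" pattern Fireworks uses, and the helper function is an assumption for illustration, not a Fireworks SDK call.

```python
import json

# Build the JSON body an OpenAI-compatible chat-completions endpoint
# expects when querying a fine-tuned model.
def build_chat_request(model_id: str, user_message: str,
                       max_tokens: int = 256) -> dict:
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "accounts/my-account/models/my-finetuned-model",  # hypothetical model ID
    "Summarize our Q3 support tickets.",
)
print(json.dumps(payload, indent=2))
# This payload would be POSTed, with an API key, to the provider's
# /v1/chat/completions endpoint (e.g. via the `requests` library).
```

Because the API surface is the same for base and fine-tuned models, swapping in a LoRA fine-tune is a one-line change to the model ID.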

Customer Base and Use Cases

Fireworks AI serves a diverse range of customers, including AI startups like Cresta, Cursor, and Liner, as well as digital-native companies such as DoorDash, Quora, and Upwork. These companies utilize Fireworks AI for applications like code generation, instant apply, smart rewrites, and cursor prediction.

Industry Direction

Fireworks AI is leading the industry shift towards compound AI systems, which leverage a mix of AI models to solve business problems more effectively than a single model approach.
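A toy sketch of the compound-AI idea: route each request to the model best suited for it rather than sending everything to one large generalist model. The routing rule and model names below are purely illustrative.

```python
# Minimal request router for a compound AI system. Real systems often use
# a small classifier model for this step; a keyword rule keeps the sketch
# self-contained.
def route(task: str) -> str:
    lowered = task.lower()
    if "code" in lowered or "function" in lowered:
        return "code-model"        # e.g. a code-specialized model
    if "image" in lowered:
        return "image-model"       # e.g. a diffusion model
    return "general-chat-model"    # default generalist LLM

print(route("Write a Python function to parse CSV"))  # → code-model
print(route("Generate an image of a sunset"))         # → image-model
```

The win over a single-model approach is that each sub-task lands on a model that is cheaper, faster, or more accurate for that task.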

Partnerships and Integration

Fireworks AI collaborates with cloud providers to enable deployment into existing virtual private clouds, allowing cloud vendors to focus on hardware and service availability while Fireworks delivers high-performance inference engines and superior developer experiences. The company also partners with top providers across the AI stack, including NVIDIA and AMD, to enhance its platform further.
