Fireworks AI is an artificial intelligence company founded in 2022 in Redwood City, California. It aims to enable low-latency, high-efficiency, and cost-effective deployment and customization of generative AI (GenAI) models, and provides cloud-based infrastructure for the production-grade deployment of open-source large language models (LLMs) and multimodal models.
Founding
Fireworks AI was founded by engineers who previously worked on PyTorch at Meta; founding CEO Lin Qiao led the PyTorch platform there. Other founding members include Dmytro Dzhulgakov (CTO), Chenyu Zhao, James Reed, Benny Chen, Pawel Garbacki, and Dmytro Ivchenko, who bring experience from Google Vertex AI and Meta's advertising infrastructure.
Technological Infrastructure and Products
Fireworks AI offers an API-based platform that lets developers deploy and customize generative AI models. The platform serves more than 100 open-source models, either on a serverless basis or on dedicated on-demand GPUs. These include text, image, audio, and multimodal models such as LLaMA 3, Qwen3, Mixtral, and Stable Diffusion. The platform also supports compound AI systems: multi-component configurations in which a task is solved not by a single model but by orchestrating multiple smaller models and external data sources.
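In practice, an API-based platform of this kind is driven by HTTP requests carrying a model identifier and a list of chat messages. The sketch below only assembles such a request payload in the OpenAI-compatible style; the endpoint path and the model identifier shown are illustrative assumptions, not documented Fireworks values.

```python
import json

# Endpoint path and model ID are assumptions for illustration only.
FIREWORKS_CHAT_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat-completion payload (structure assumed)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "accounts/fireworks/models/llama-v3-8b-instruct",  # hypothetical model ID
    "Summarize the benefits of serverless inference.",
)
print(json.dumps(payload, indent=2))
```

A real client would POST this payload with an API key; the same payload shape works for serverless and on-demand deployments, differing only in the model identifier.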
FireFunction and Compound AI Approach
Fireworks AI emphasizes the development of compound AI systems, where different subtasks are handled by purpose-optimized small models, tools, and data sources. At the core of this structure is FireFunction V2, which enables function calling, interaction with external data sources, and orchestration of multimodal tasks.
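Function calling of the kind FireFunction V2 provides generally works by handing the model a JSON schema of available tools and then routing the structured call the model emits to real code. A minimal sketch, assuming OpenAI-style tool definitions; the `get_weather` tool and its handler are hypothetical stand-ins:

```python
import json

# Hypothetical tool schema in the OpenAI-style function-calling format.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch_tool_call(tool_call: dict) -> str:
    """Route a model-emitted tool call to a local handler (handlers are stubs)."""
    handlers = {"get_weather": lambda args: f"Sunny in {args['city']}"}
    name = tool_call["name"]
    args = json.loads(tool_call["arguments"])  # model returns arguments as JSON text
    return handlers[name](args)

# Simulated model output: a structured call rather than free text.
result = dispatch_tool_call({"name": "get_weather", "arguments": '{"city": "Redwood City"}'})
print(result)  # Sunny in Redwood City
```

In a compound AI system, the dispatcher's result would be fed back to the model as a tool message, letting a small function-calling model orchestrate external data sources without generating the answer itself.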
Infrastructure Partnerships and GPU Utilization
To support ultra-low-latency scenarios, Fireworks AI runs on Amazon Web Services (AWS) infrastructure. The company uses NVIDIA A100 and H100 Tensor Core GPUs via Amazon EC2 P4 and P5 instances, which it reports deliver up to 4× lower latency and 20× greater performance than its previous setup. It also uses AWS services such as Amazon EKS (Elastic Kubernetes Service) and Amazon S3 (Simple Storage Service).
Services Offered to Clients
Fireworks AI offers its services both on a pay-as-you-go model and through enterprise-level configurations. The platform complies with the SOC 2 Type II and HIPAA security and privacy standards, and user inputs and outputs are not stored by the platform.
Pricing
Fireworks AI’s pricing is based on token- or time-based usage for services such as serverless model inference, fine-tuning, image generation, and speech transcription. GPUs such as the NVIDIA H100 and A100 and the AMD MI300X are available at hourly rates. Serving LoRA (Low-Rank Adaptation) fine-tuned models is billed at the same rate as the underlying base model.
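Token-based billing reduces to simple per-million-token arithmetic. The rates in the sketch below are hypothetical placeholders, not Fireworks' actual prices, which vary by model:

```python
# Hypothetical per-million-token rates; actual Fireworks pricing varies by model.
RATE_PER_M_INPUT = 0.20   # USD per 1M input tokens (assumed)
RATE_PER_M_OUTPUT = 0.80  # USD per 1M output tokens (assumed)

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request under token-based billing."""
    return (input_tokens / 1_000_000) * RATE_PER_M_INPUT \
         + (output_tokens / 1_000_000) * RATE_PER_M_OUTPUT

cost = inference_cost(input_tokens=50_000, output_tokens=10_000)
print(f"${cost:.4f}")  # $0.0180
```

Time-based billing for on-demand GPUs works analogously, multiplying an hourly rate by provisioned hours instead of token counts.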
Investments and Partnerships
In 2024, Fireworks AI raised $52 million in a Series B funding round led by Sequoia Capital, bringing its total funding to $77 million. Investors include Benchmark, Databricks Ventures, NVIDIA, AMD, MongoDB, and others. The company also has infrastructure and data partnerships with Oracle Cloud Infrastructure (OCI), Google Cloud Platform, and MongoDB.
Uses and Integrations
Fireworks AI supports generative AI solutions in areas such as source code completion (e.g., with Sourcegraph) and email-based content queries (e.g., with Superhuman). Through collaborations with MongoDB, the platform supports Retrieval-Augmented Generation (RAG) systems that enrich model context using external data sources.
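The retrieval step of a RAG pipeline can be illustrated without any external service: embed the documents, rank them by cosine similarity to the query, and prepend the best match to the model's context. A toy sketch in which hashed bag-of-words counts stand in for a real embedding model and a vector store such as MongoDB Atlas:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline calls an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "Fireworks AI serves open-source language models.",
    "The platform supports serverless GPU inference.",
]

def retrieve(query: str) -> str:
    """Return the stored document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

question = "does the platform offer serverless inference?"
context = retrieve(question)
prompt = f"Context: {context}\nQuestion: {question}"
```

The final `prompt` is what a RAG system would send to the LLM: the model answers from the retrieved context rather than from its weights alone.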
Future Outlook
Fireworks AI focuses its R&D efforts on advancing compound AI systems, with the goal of expanding the use of multimodal models and integrating customizable AI components. The company continues to scale production-grade solutions with a focus on model efficiency, low latency, and adaptability. It also prioritizes making open-source models accessible to the broader developer community.