Baseten is an infrastructure software platform for deploying, serving, and scaling machine learning (ML) models in production. Founded in 2019 in San Francisco, California, the company aims to help organizations building artificial intelligence applications run their models quickly, reliably, and cost-effectively. Its focus is model inference, the stage at which trained models serve predictions, and the performance bottlenecks that arise there. Its client portfolio includes AI-focused companies such as Writer, Descript, Abridge, and Gamma.
Founding and Funding
Baseten was founded in 2019 by Tuhin Srivastava, Amir Haghighat, Philip Howes, and Pankaj Gupta. The company has grown by developing software that optimizes AI model inference. As of 2025, Baseten employs over 60 people. In the same year, it secured $75 million in a Series C funding round co-led by Spark Capital and IVP. The company has raised a total of $135 million and reached a valuation of $850 million.
Technological Infrastructure and Partnerships
Baseten operates its cloud-based model serving infrastructure on Amazon Web Services (AWS), utilizing services such as Amazon EC2 (Elastic Compute Cloud) and Amazon EKS (Elastic Kubernetes Service). It also maintains a close partnership with NVIDIA, integrating NVIDIA's TensorRT-LLM (TensorRT for Large Language Models) and Triton Inference Server to reduce inference latency and improve efficiency. Through the NVIDIA Inception program, Baseten gained early access to TensorRT-LLM and reports an average twofold improvement in inference efficiency and up to a 50% reduction in time to first token (TTFT) for its customers.
Products and Services
Baseten's platform supports deployment, serving, monitoring, and management of AI models. Key components include:
- Truss: An open-source model packaging library that supports frameworks such as PyTorch, TensorFlow, HuggingFace Transformers, TensorRT, and Triton. It packages Python-based models together with their dependencies so they can be moved into production environments.
- Chains: A software development kit (SDK) for building complex AI workflows, allowing users to create multi-step model chains.
- Inference Engine: Supports synchronous, asynchronous, and streaming inference. It includes advanced techniques such as speculative decoding.
- Observability: Real-time monitoring tools enable tracking of system performance and integrate with third-party observability platforms like Datadog and Prometheus.
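The Truss packaging convention described above can be sketched with a minimal model file. Truss's public documentation describes a plain Python class, conventionally named Model, exposing load() and predict() hooks; the toy keyword-based "model" below is a stand-in for a real framework model, and the exact hook signatures should be checked against Truss's own docs.

```python
# A minimal Truss-style model.py sketch. The Model class with load()
# and predict() hooks follows Truss's documented convention; the
# keyword-lookup "model" below is a toy stand-in for real weights.
class Model:
    def __init__(self, **kwargs):
        self._model = None  # loaded lazily in load()

    def load(self):
        # In a real Truss package this would load framework weights
        # (e.g., a HuggingFace pipeline). A closure stands in here.
        self._model = lambda text: (
            "positive" if "good" in text.lower() else "negative"
        )

    def predict(self, model_input):
        # model_input is the deserialized request body.
        return {"sentiment": self._model(model_input["text"])}
```

In an actual Truss project, this file would sit alongside a config.yaml declaring Python dependencies and hardware requirements, and the packaged model would be deployed with the Truss CLI (details per Truss's documentation; treat specifics here as assumptions).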
Models and Use Cases
Baseten provides a model library that allows users to deploy their own models or pre-trained open-source models in production. Supported use cases include text generation with large language models (LLMs), audio transcription (e.g., Whisper), image generation, embeddings, and speech synthesis, including text-to-speech (TTS) applications.
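Once deployed, a model is typically invoked over HTTPS. The sketch below assembles such a request; the host pattern, /production/predict path, and Api-Key authorization scheme mirror Baseten's public API documentation but should be treated as assumptions here, and the model id and key are hypothetical.

```python
import json

def build_predict_request(model_id: str, api_key: str, payload: dict):
    """Assemble the URL, headers, and body of an inference call
    to a deployed model. URL pattern and auth scheme are assumptions
    based on Baseten's public docs, not guaranteed exact."""
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps(payload)
    return url, headers, body

# Hypothetical model id and key; a real client would POST `body` to
# `url` with these headers (e.g., via urllib.request or requests).
url, headers, body = build_predict_request(
    "abc123", "MY_API_KEY", {"prompt": "Hello"}
)
```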
Infrastructure and Scalability
The Baseten infrastructure is designed to support multi-region, multi-cloud, and multi-cluster deployments. The system is compatible with GPU models such as NVIDIA A100, H100, H200, GH200, and L4, and features automatic horizontal scaling to support thousands of replicas as needed. It is engineered for 99.999% availability, equating to roughly five minutes of downtime per year.
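The downtime implied by an availability target follows from simple arithmetic: the unavailable fraction times the minutes in a year (using a 365.25-day year, about 525,960 minutes).

```python
def downtime_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    minutes_per_year = 365.25 * 24 * 60  # ≈ 525,960 minutes
    return (1 - availability) * minutes_per_year

# "Five nines" (99.999%) leaves about 5.26 minutes per year.
print(round(downtime_per_year(0.99999), 2))  # → 5.26
```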
Compliance and Security
Baseten complies with international data protection and security standards, including HIPAA, SOC 2 Type II, and GDPR. The system does not retain user data, and all model inputs and outputs are fully controlled by the user.
Financial Structure and Client Base
Baseten uses a pay-per-minute pricing model based on compute time consumption. The platform offers three service tiers: Basic, Pro, and Enterprise. Clients include companies such as Descript, Patreon, Rime, and Bland AI. Reported inference cost savings range between 40% and 65%.
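Under a pay-per-minute model, a customer's bill is simply compute minutes times a per-minute rate. A minimal sketch, with an entirely hypothetical rate (Baseten's actual rates vary by instance type and are not stated in this article):

```python
def inference_cost(compute_minutes: float, rate_per_minute: float) -> float:
    """Cost under a pay-per-minute pricing model.

    rate_per_minute is hypothetical; real per-minute rates depend
    on the GPU/CPU instance type chosen.
    """
    return compute_minutes * rate_per_minute

# Example: 500 minutes of compute at a hypothetical $0.02/minute.
monthly_cost = inference_cost(500, 0.02)  # → 10.0
```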