
Baseten is an infrastructure software platform for deploying, serving, and scaling machine learning (ML) models in production. Founded in 2019 in San Francisco, California, the company aims to help organizations building artificial intelligence applications run their models quickly, reliably, and cost-effectively. Its platform centers on model inference, the stage of the ML lifecycle where production performance bottlenecks typically arise. Its clients include AI-focused companies such as Writer, Descript, Abridge, and Gamma.
Founding and Funding
Baseten was founded in 2019 by Tuhin Srivastava, Amir Haghighat, Philip Howes, and Pankaj Gupta, and has grown by developing software that optimizes AI model inference. As of 2025, the company employs over 60 people. That year it closed a $75 million Series C funding round co-led by Spark Capital and IVP, bringing its total funding to $135 million at a valuation of $850 million.
Technological Infrastructure and Partnerships
Baseten operates its cloud-based model serving infrastructure on Amazon Web Services (AWS), using services such as Amazon EC2 (Elastic Compute Cloud) and Amazon EKS (Elastic Kubernetes Service). It also maintains a close partnership with NVIDIA, integrating NVIDIA's TensorRT-LLM (TensorRT for Large Language Models) and Triton Inference Server to reduce inference latency and improve efficiency. Through the NVIDIA Inception program, Baseten gained early access to TensorRT-LLM and reports an average 2× improvement in inference efficiency and up to a 50% reduction in time to first token (TTFT) for its customers.
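TTFT is the delay between sending a request and receiving the first generated token; for LLM serving it is dominated by queueing and prompt prefill. A minimal sketch of how it can be measured on any streaming response (the `fake_stream` generator here is a stand-in for a real model stream, not part of Baseten's API):

```python
import time

def time_to_first_token(stream):
    """Return (seconds until the first token arrives, first token) for a stream."""
    start = time.perf_counter()
    first = next(iter(stream))
    return time.perf_counter() - start, first

def fake_stream():
    """Stand-in for a real model stream: the body runs lazily on first next()."""
    time.sleep(0.05)  # simulated queueing + prompt prefill before the first token
    yield "Hello"
    yield " world"

ttft, token = time_to_first_token(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, first token: {token!r}")
```

Because generators execute lazily, the simulated prefill delay is only incurred when the first token is requested, which is exactly what a TTFT measurement should capture.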
Products and Services
Baseten's platform supports the deployment, serving, monitoring, and management of AI models; its key components are described below.
Models and Use Cases
Baseten provides a model library from which users can deploy pre-trained open-source models to production, alongside support for deploying their own models. Supported use cases include text generation (LLMs), audio transcription (Whisper), image generation, embeddings, and speech synthesis (text-to-speech, TTS).
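Deployed models are typically invoked over an HTTPS endpoint. The sketch below builds such a request using only the standard library; the URL shape, placeholder model ID, and Api-Key header follow Baseten's public REST conventions but should be treated as illustrative assumptions rather than a definitive client:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder, not a real credential
MODEL_ID = "abc123"       # hypothetical model ID

def build_predict_request(model_id, payload, api_key):
    """Build (but do not send) an inference request for a deployed model.

    The URL shape and Api-Key authorization scheme are assumptions based on
    Baseten's public REST conventions; treat them as illustrative.
    """
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_predict_request(MODEL_ID, {"prompt": "Hello"}, API_KEY)
# urllib.request.urlopen(req) would perform the call against a live deployment.
```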
Infrastructure and Scalability
The Baseten infrastructure is designed to support multi-region, multi-cloud, and multi-cluster deployments. The system is compatible with GPU models such as the NVIDIA A100, H100, H200, GH200, and L4, and scales horizontally and automatically to thousands of replicas as demand requires. It is engineered for 99.999% availability, equating to roughly five minutes of downtime per year.
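Horizontal autoscalers of this kind commonly target a fixed number of in-flight requests per replica. A minimal sketch of that policy follows; the function, parameter names, and defaults are hypothetical, not Baseten's actual configuration:

```python
import math

def target_replicas(in_flight_requests, concurrency_per_replica,
                    min_replicas=0, max_replicas=1000):
    """Concurrency-based horizontal scaling: provision enough replicas so each
    handles at most `concurrency_per_replica` in-flight requests, clamped to
    the configured bounds (scale-to-zero when min_replicas == 0)."""
    if in_flight_requests <= 0:
        return min_replicas
    needed = math.ceil(in_flight_requests / concurrency_per_replica)
    return max(min_replicas, min(needed, max_replicas))

# 750 concurrent requests at 4 per replica -> ceil(187.5) = 188 replicas
print(target_replicas(750, 4))
```

Clamping to a maximum bounds cost under traffic spikes, while a nonzero minimum trades idle cost for avoiding cold starts.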
Compliance and Security
Baseten complies with international data protection and security standards and regulations, including HIPAA, SOC 2 Type II, and GDPR. The platform does not retain user data; all model inputs and outputs remain fully under the user's control.
Financial Structure and Client Base
Baseten uses pay-per-minute pricing based on compute time consumed. The platform offers three service tiers: Basic, Pro, and Enterprise. Clients include companies such as Descript, Patreon, Rime, and Bland AI, with reported inference cost savings of between 40% and 65%.
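Under per-minute billing, savings relative to an always-on instance come from paying only for active compute. A back-of-envelope illustration with a hypothetical rate and duty cycle (neither is Baseten's actual pricing):

```python
def inference_cost(active_minutes, rate_per_minute):
    """Pay-per-minute billing: charge only for minutes of active compute."""
    return active_minutes * rate_per_minute

HOURS = 24 * 30   # one month
RATE = 0.10       # illustrative $/min for a GPU instance; not an actual price

# Always-on GPU vs. autoscaled serving that is active 40% of the time.
always_on = inference_cost(HOURS * 60, RATE)
autoscaled = inference_cost(0.40 * HOURS * 60, RATE)
savings = 1 - autoscaled / always_on  # 60%, within the reported 40-65% range
print(f"${always_on:,.0f} vs ${autoscaled:,.0f} -> {savings:.0%} saved")
```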
