This article was created with the support of artificial intelligence.

Baseten

Founded: 2019
Founders: Tuhin Srivastava, Amir Haghighat, Philip Howes, Pankaj Gupta
Location: San Francisco, California, USA
Website: https://www.baseten.co/

Baseten is an infrastructure software platform for deploying, serving, and scaling machine learning (ML) models in production environments. Founded in 2019 in San Francisco, California, the company aims to help organizations that build artificial intelligence applications run their models quickly, reliably, and cost-effectively. Its work centers on model inference, the stage at which a trained model answers requests, and on removing the performance bottlenecks that arise there. Its client portfolio includes AI-focused companies such as Writer, Descript, Abridge, and Gamma.

Founding and Funding

Baseten was founded in 2019 by Tuhin Srivastava, Amir Haghighat, Philip Howes, and Pankaj Gupta. The company has grown by developing software that optimizes AI model inference. As of 2025, Baseten employs over 60 people. In the same year, it secured $75 million in a Series C funding round co-led by Spark Capital and IVP. The company has raised a total of $135 million and reached a valuation of $850 million.

Technological Infrastructure and Partnerships

Baseten operates its cloud-based model serving infrastructure on Amazon Web Services (AWS), using services such as Amazon EC2 (Elastic Compute Cloud) and Amazon EKS (Elastic Kubernetes Service). It also maintains a close partnership with NVIDIA, integrating NVIDIA's TensorRT-LLM (TensorRT for Large Language Models) and Triton Inference Server to reduce inference latency and improve efficiency. Through the NVIDIA Inception program, Baseten gained early access to TensorRT-LLM, and it reports an average 2x improvement in inference efficiency and up to a 50% reduction in time to first token (TTFT) for its customers.
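Time to first token is the delay between sending a request and receiving the first streamed token, so it can be measured independently of total generation time. The sketch below is illustrative only: `measure_ttft` and `fake_stream` are hypothetical names, and the simulated stream stands in for a real streaming response rather than any Baseten or NVIDIA API.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one streamed response."""
    start = clock()
    first_token_at = None
    count = 0
    for _ in stream:
        if first_token_at is None:
            # The first item marks the end of queueing plus prefill.
            first_token_at = clock()
        count += 1
    end = clock()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")
    return first_token_at - start, end - start, count


def fake_stream(prefill_delay: float, per_token_delay: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming inference response."""
    time.sleep(prefill_delay)  # simulated time before the first token
    for i in range(n):
        if i:
            time.sleep(per_token_delay)  # simulated decode step
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, n = measure_ttft(fake_stream(0.05, 0.01, 5))
    print(f"TTFT {ttft * 1000:.1f} ms, total {total * 1000:.1f} ms, {n} tokens")
```

Passing the clock as a parameter keeps the measurement testable. A claim such as "up to 50% lower TTFT" would then correspond to comparing the first returned value across two deployments under the same load.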

Products and Services

Baseten's platform supports deployment, serving, monitoring, and management of AI models. Key components include:

  • Truss: An open-source model packaging library that supports frameworks such as PyTorch, TensorFlow, HuggingFace Transformers, TensorRT, and Triton. It enables Python-based models to be transferred into production environments along with their dependencies.
  • Chains: A software development kit (SDK) for building complex AI workflows, allowing users to create multi-step model chains.
  • Inference Engine: Supports synchronous, asynchronous, and streaming inference. It includes advanced techniques such as speculative decoding.
  • Observability: Real-time monitoring tools enable tracking of system performance and integrate with third-party observability platforms like Datadog and Prometheus.
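As a rough illustration of the packaging idea behind Truss, the sketch below shows the general shape of a model class with a one-time `load` step and a per-request `predict` step, which is the convention a Truss `model/model.py` file is understood to follow. The keyword-based "weights" are a toy stand-in invented for this example, not a real framework model, and the block does not import or call the Truss library itself.

```python
from typing import Any, Dict


class Model:
    """Toy model in the load/predict shape used for packaged models."""

    def __init__(self, **kwargs: Any) -> None:
        # Configuration arrives as keyword arguments; this sketch ignores it.
        self._weights = None

    def load(self) -> None:
        # Runs once at startup. A real model would load framework weights here.
        self._weights = {
            "positive": ("good", "great", "fast"),
            "negative": ("bad", "slow"),
        }

    def predict(self, model_input: Dict[str, Any]) -> Dict[str, Any]:
        # Runs once per request with the parsed request body.
        if self._weights is None:
            raise RuntimeError("load() must run before predict()")
        words = model_input["text"].lower().split()
        score = sum(w in self._weights["positive"] for w in words) - sum(
            w in self._weights["negative"] for w in words
        )
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        return {"label": label, "score": score}


if __name__ == "__main__":
    model = Model()
    model.load()
    print(model.predict({"text": "Great and fast"}))
```

Separating `load` from `predict` means expensive initialization happens once per replica rather than once per request, which matters when replicas are started and stopped by an autoscaler.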

Models and Use Cases

Baseten provides a model library from which users can deploy pre-trained open-source models, and the platform also lets them deploy their own models in production. Supported use cases include text generation (LLMs), audio transcription (Whisper), image generation, embeddings, and text-to-speech (TTS) voice synthesis.

Infrastructure and Scalability

The Baseten infrastructure is designed to support multi-region, multi-cloud, and multi-cluster deployments. It supports NVIDIA GPUs such as the A100, H100, H200, GH200, and L4, and it scales horizontally and automatically to thousands of replicas as needed. It is engineered for 99.999% availability, which equates to roughly 5.3 minutes of downtime per year.
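The downtime figure follows directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Convert an availability target into allowed downtime per year."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% -> {minutes:.2f} minutes/year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why a five-nines target leaves only about five minutes per year.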

Compliance and Security

Baseten complies with international data protection and security standards, including HIPAA, SOC 2 Type II, and GDPR. The system does not retain user data, and all model inputs and outputs are fully controlled by the user.

Financial Structure and Client Base

Baseten uses a pay-per-minute pricing model based on compute time consumption. The platform offers three service tiers: Basic, Pro, and Enterprise. Clients include companies such as Descript, Patreon, Rime, and Bland AI. Reported inference cost savings range between 40% and 65%.
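To make the per-minute billing model concrete, the sketch below estimates a bill from active compute minutes and applies the reported 40% to 65% savings range to a baseline. The price of 0.10 per minute and the function names are hypothetical values chosen for illustration; they are not Baseten's published rates.

```python
from typing import Tuple


def compute_cost(active_minutes: float, price_per_minute: float, replicas: int = 1) -> float:
    """Cost of running `replicas` instances for `active_minutes` each."""
    if active_minutes < 0 or price_per_minute < 0 or replicas < 0:
        raise ValueError("inputs must be non-negative")
    return active_minutes * price_per_minute * replicas


def cost_after_savings(
    baseline_cost: float, low: float = 0.40, high: float = 0.65
) -> Tuple[float, float]:
    """Return (best_case, worst_case) cost for a savings range given as fractions."""
    return baseline_cost * (1 - high), baseline_cost * (1 - low)


if __name__ == "__main__":
    HYPOTHETICAL_PRICE = 0.10  # assumed price per GPU-minute, illustration only
    bill = compute_cost(active_minutes=90, price_per_minute=HYPOTHETICAL_PRICE, replicas=2)
    best, worst = cost_after_savings(100.0)
    print(f"bill: {bill:.2f}; a 100.00 baseline becomes {best:.2f} to {worst:.2f}")
```

Because billing tracks compute time, scaling replicas down when traffic is idle reduces the bill directly rather than only freeing capacity.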

Bibliography

Amazon Web Services. “AWS Case Study: Baseten and NVIDIA.” Amazon Web Services. Accessed May 2, 2025. https://aws.amazon.com/partners/success/baseten-nvidia/

Baseten. “Baseten AI Inference Platform Pricing.” Baseten. Accessed May 2, 2025. https://www.baseten.co/pricing/

Baseten. “Baseten Customers.” Baseten. Accessed May 2, 2025. https://www.baseten.co/customers/

Baseten. “Baseten Library.” Baseten. Accessed May 2, 2025. https://www.baseten.co/library/

Baseten. “Compound AI Solutions.” Baseten. Accessed May 2, 2025. https://www.baseten.co/solutions/compound-ai/

Baseten. “Concepts: How Baseten Works.” Baseten Documentation. Accessed May 2, 2025. https://docs.baseten.co/concepts/howbasetenworks

Baseten. “Embedded Engineering.” Baseten. Accessed May 2, 2025. https://www.baseten.co/platform/embedded-engineering/

Baseten. “Embeddings Solutions.” Baseten. Accessed May 2, 2025. https://www.baseten.co/solutions/embeddings/

Baseten. “Image Generation Solutions.” Baseten. Accessed May 2, 2025. https://www.baseten.co/solutions/image-generation/

Baseten. “LLM (Large Language Model) Solutions.” Baseten. Accessed May 2, 2025. https://www.baseten.co/solutions/llms/

Baseten. “Model Management.” Baseten. Accessed May 2, 2025. https://www.baseten.co/platform/model-management/

Baseten. “Model Performance.” Baseten. Accessed May 2, 2025. https://www.baseten.co/platform/model-performance/

Baseten. “Platform: Cloud-Native Infrastructure.” Baseten. Accessed May 2, 2025. https://www.baseten.co/platform/cloud-native-infrastructure/

Baseten. “Text-to-Speech Solutions.” Baseten. Accessed May 2, 2025. https://www.baseten.co/solutions/text-to-speech/

Baseten. “Transcription Solutions.” Baseten. Accessed May 2, 2025. https://www.baseten.co/solutions/transcription/

Business Wire. “Baseten Lands $75M from IVP and Spark to Solve AI’s Biggest Bottleneck to Ubiquitous Adoption: Inference.” Business Wire. Accessed May 2, 2025. https://www.businesswire.com/news/home/20250218263765/en/Baseten-Lands-%2475M-from-IVP-and-Spark-to-Solve-AIs-Biggest-Bottleneck-to-Ubiquitous-Adoption-Inference

CNBC. “AI Inference Startup Baseten Raises $75 Million.” CNBC. Accessed May 2, 2025. https://www.cnbc.com/2025/02/19/ai-inference-startup-baseten-raises-75-million.html

Forbes. “Forbes Profile: Baseten.” Forbes. Accessed May 2, 2025. https://www.forbes.com/companies/baseten/?list=ai50

NVIDIA. “Baseten Case Study.” NVIDIA. Accessed May 2, 2025. https://www.nvidia.com/en-us/case-studies/baseten/

Author Information

Main Author: Ömer Said Aydın, May 4, 2025 at 10:50 AM