This article was automatically translated from the original Turkish version.

Baseten is an infrastructure software platform for deploying, serving, and scaling machine learning (ML) models in production environments. Founded in 2019 in San Francisco, California, the company aims to help organizations building AI applications run their models quickly, reliably, and cost-effectively. The platform centers on model inference and focuses on resolving the performance bottlenecks that arise at that stage. Its customer portfolio includes AI-focused companies such as Writer, Descript, Abridge, and Gamma.
Baseten was founded in 2019 by Tuhin Srivastava, Amir Haghighat, Philip Howes, and Pankaj Gupta, and has grown by developing software that optimizes AI model inference. As of 2025, it has more than 60 employees; that year it raised $75 million in a Series C funding round co-led by Spark Capital and IVP. In total, the company has raised $135 million and is valued at $850 million.
Baseten runs its cloud-based model serving infrastructure on Amazon Web Services (AWS), using services such as Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Kubernetes Service (EKS). It also maintains a close collaboration with NVIDIA: by integrating NVIDIA's TensorRT-LLM (TensorRT for Large Language Models) and Triton Inference Server, it improves inference speed and efficiency. As part of the NVIDIA Inception program, Baseten gained early access to TensorRT-LLM and delivered to its customers an average two-fold increase in inference throughput and up to a 50% reduction in time to first token (TTFT).
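TTFT is simply the elapsed time between issuing a request and receiving the first streamed token. A minimal, dependency-free way to measure it against any streaming source (the stream and its delay below are illustrative stand-ins, not Baseten's API):

```python
import time

def time_to_first_token(stream):
    """Measure TTFT: seconds from now until the first token arrives.
    `stream` is any iterator yielding tokens, standing in for a model's
    streaming response."""
    start = time.perf_counter()
    first = next(stream)
    return first, time.perf_counter() - start

def fake_stream(delay=0.05):
    # Hypothetical stand-in for a streaming inference endpoint: the delay
    # models the time the server needs before the first token is ready.
    time.sleep(delay)
    yield "Hello"
    yield " world"

token, ttft = time_to_first_token(fake_stream())
```

Because the generator body only runs on the first `next()` call, the measured `ttft` includes the simulated server-side delay.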
The Baseten platform supports the deployment, serving, monitoring, and management of AI models. Its core components are:
Truss: An open-source model packaging library supporting frameworks such as PyTorch, TensorFlow, Hugging Face Transformers, TensorRT, and Triton. It deploys Python-based models together with their dependencies into production environments.
Chains: A software development kit (SDK) that supports complex AI workflows by enabling the creation of multi-step model chains.
Inference Engine: The serving architecture supports synchronous, asynchronous, and streaming inference and employs advanced techniques such as speculative decoding.
Observability: System performance can be monitored in real time through built-in tools and integrated with external observability platforms such as Datadog and Prometheus.
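A Truss packages a model as a small directory containing a `config.yaml` and a `model/model.py` that exposes a `Model` class with `load()` and `predict()` hooks. A minimal sketch of such a `model.py`, with a stub callable standing in for real weights (the uppercasing "model" is purely illustrative):

```python
# model/model.py — the entry point Truss looks for when serving a model.

class Model:
    def __init__(self, **kwargs):
        # Truss passes configuration (and secrets) via kwargs at startup.
        self._model = None

    def load(self):
        # Called once when the serving container starts: load weights here.
        # A stub callable stands in for a real ML model.
        self._model = lambda text: text.upper()

    def predict(self, model_input):
        # Called per request with the deserialized JSON payload.
        return {"output": self._model(model_input["text"])}
```

Locally, the same class can be exercised directly: construct `Model()`, call `load()`, then `predict({"text": "hello"})`.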
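In speculative decoding, a small draft model proposes several tokens ahead and the large target model verifies them: agreements are accepted in bulk, and the first disagreement falls back to the target's own token. A toy greedy sketch of that control flow, with plain functions standing in for both models (real implementations score all k draft positions in a single batched forward pass of the target, which is where the speedup comes from):

```python
def speculative_decode(target_next, draft_next, prompt, n_tokens, k=4):
    """Greedy toy of speculative decoding. `draft_next` and `target_next`
    each map a token context (list) to the next token."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = list(out), []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) The target verifies the proposals in order; each agreement is
        #    accepted, and the first disagreement is overwritten.
        for t in proposal:
            if len(out) - len(prompt) >= n_tokens:
                break
            target_t = target_next(out)
            out.append(target_t)   # the emitted token is always the target's
            if target_t != t:
                break              # mismatch: discard the rest of the draft
    return out[len(prompt):]
```

With a perfect draft, each loop iteration emits up to k tokens; with a useless draft, the algorithm degrades gracefully to one target token per iteration, and the output is identical either way.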
Baseten provides a model library that lets users bring their own models or deploy pre-trained open-source models into production environments. The library spans domains such as text generation (large language models, LLMs), speech transcription (Whisper), image generation, text embedding, audio generation, and text-to-speech (TTS).
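Deployed models are invoked over HTTPS. Under the URL pattern and `Api-Key` header convention described in Baseten's public documentation (worth verifying against the current docs, since conventions can change), a request can be assembled with the standard library; the model ID and key below are placeholders, and the request is built but deliberately not sent:

```python
import json
import urllib.request

def build_predict_request(model_id, api_key, payload):
    """Build (but do not send) a request against a deployed model.
    The URL shape and auth header follow Baseten's published conventions;
    check current documentation before relying on them."""
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder model ID and key, for illustration only.
req = build_predict_request("abc123", "MY_KEY", {"prompt": "Hello"})
```

Sending the request (e.g. via `urllib.request.urlopen(req)`) would return the model's JSON response.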
Baseten’s infrastructure is designed to be multi-region, multi-cloud, and multi-cluster capable. The system supports GPUs such as the NVIDIA A100, H100, H200, GH200, and L4, and features automatic horizontal scaling that can create thousands of replicas as needed. It targets 99.999% availability, corresponding to just over five minutes of total downtime per year.
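The downtime budget implied by an availability target is a one-line calculation; "five nines" leaves roughly 5.26 minutes per year:

```python
# Annual downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960

def annual_downtime_minutes(availability):
    return MINUTES_PER_YEAR * (1 - availability)

five_nines = annual_downtime_minutes(0.99999)   # ~5.26 minutes/year
three_nines = annual_downtime_minutes(0.999)    # ~526 minutes (~8.8 hours)
```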
Baseten complies with international security and data protection standards including HIPAA (Health Insurance Portability and Accountability Act), SOC 2 Type II (System and Organization Controls), and GDPR (General Data Protection Regulation). The platform does not store user data; model inputs and outputs remain entirely under the user's control.
Baseten’s pricing is pay-per-minute: customers are charged for the compute time they use. The platform offers three service tiers: Basic, Pro, and Enterprise. Customers include organizations such as Descript, Patreon, Rime, and Bland AI, which report inference cost savings of 40% to 65%.
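Pay-per-minute billing multiplies metered compute minutes by a rate that depends on the instance type. The rates below are entirely hypothetical, purely to illustrate the shape of this billing model:

```python
# Hypothetical per-minute rates by GPU type; real Baseten pricing
# differs and changes over time.
RATE_PER_MINUTE = {"L4": 0.02, "A100": 0.10, "H100": 0.16}

def compute_cost(gpu_type, minutes):
    """Cost of `minutes` of metered compute on the given GPU type."""
    return round(RATE_PER_MINUTE[gpu_type] * minutes, 2)
```

For example, 90 metered minutes on the hypothetical A100 rate would bill as `compute_cost("A100", 90)`.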
