Together AI is a San Francisco-based AI cloud provider that offers infrastructure and software solutions for training, fine-tuning, and deploying open-source large language models (LLMs) in production environments. Founded in 2022, the company is known for its research-driven engineering approach and contributions to the open-source AI ecosystem, particularly through its RedPajama initiative—a large-scale open dataset project.
Founding and Management
Together AI was founded in 2022 by Vipul Ved Prakash (CEO), Ce Zhang (CTO), Chris Ré, Percy Liang, and Tri Dao. The founding team includes AI researchers affiliated with Stanford University and Hazy Research. The company is headquartered in San Francisco, California.
Technological Infrastructure
Together AI’s infrastructure is designed to support high-performance training, inference, and fine-tuning of LLMs. Its core technology stack is structured around three main components: the Together Inference Engine, the Together Fine-Tuning Infrastructure, and Together GPU Clusters.
Together Inference Engine
This component provides a high-efficiency inference engine optimized for both open-source and proprietary models in production use. Key technical features include:
- Transformer-optimized kernels: Custom FP8 (8-bit floating-point) kernels that deliver up to 75% faster inference than standard PyTorch.
- QTIP (Quantization with Trellises and Incoherence Processing): Enables low-precision computation while preserving model accuracy.
- Speculative Decoding: Enhances inference performance using draft models trained on the RedPajama dataset.
- Model Variants: Models are offered in up to three tiers: “Lite” (lowest cost), “Turbo” (balanced price and performance), and “Reference” (full accuracy).
- API Support: Serverless, OpenAI-compatible APIs and dedicated endpoints for GPU-specific deployments.
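As a rough illustration of the speculative-decoding idea listed above (not Together’s actual engine), the sketch below uses toy stand-in “models”: a cheap draft function proposes a block of tokens, and a target function verifies them, keeping the longest agreed prefix. The function names and the toy token rules are invented for this example.

```python
# Toy greedy speculative decoding. A cheap draft model proposes a block of
# k tokens; the expensive target model checks each position and accepts the
# longest prefix on which the two agree, substituting its own token at the
# first disagreement. Both "models" below are deterministic stand-ins.

def draft_next(ctx):
    # Hypothetical cheap model: next token = (last token + 1) mod 10.
    return (ctx[-1] + 1) % 10

def target_next(ctx):
    # Hypothetical expensive model: agrees with the draft except after a 7,
    # where it emits 0 instead (forcing a rejection).
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10

def speculative_decode(prompt, steps=8, k=4):
    ctx = list(prompt)
    while len(ctx) - len(prompt) < steps:
        # Draft proposes k tokens autoregressively.
        proposal, tmp = [], list(ctx)
        for _ in range(k):
            t = draft_next(tmp)
            proposal.append(t)
            tmp.append(t)
        # Target verifies; accept until the first mismatch.
        tmp = list(ctx)
        for t in proposal:
            want = target_next(tmp)
            tmp.append(want)
            if want != t:       # take the target's token and stop this block
                break
        ctx = tmp
    return ctx[len(prompt):][:steps]

speculative_decode([5])  # → [6, 7, 0, 1, 2, 3, 4, 5]
```

The key property, preserved here, is that the output is identical to decoding with the target model alone; the draft model only changes how many target verifications run per accepted token.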
Together Fine-Tuning Infrastructure
Together AI’s fine-tuning infrastructure allows organizations to retrain models with their own data. Capabilities include:
- LoRA (Low-Rank Adaptation) for efficient customization.
- Full Fine-Tuning of all model parameters.
- DPO (Direct Preference Optimization) and Continued Fine-Tuning for preference-based and iterative model updates.
- Support for long contexts up to 32K tokens.
- JSONL input and CLI support for automation and integration into development workflows.
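The LoRA approach listed above can be sketched in a few lines. This is a generic illustration of the low-rank update, not Together’s implementation; the dimensions, rank, and scaling factor are arbitrary assumptions.

```python
import numpy as np

# Minimal LoRA sketch: the pretrained weight W is frozen; only the small
# low-rank factors A and B are trained. The effective layer computes
# y = W x + (alpha / r) * B (A x).
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 512, 512, 8, 16      # assumed sizes for illustration
W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable, rank r
B = np.zeros((d_out, r))                     # trainable, initialized to zero

def lora_forward(x):
    # With B = 0 the adapter is a no-op, so training starts from the base model.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)   # B = 0 → identical to base model

full_params = W.size              # parameters updated by full fine-tuning
lora_params = A.size + B.size     # parameters updated by LoRA
print(f"trainable params: {lora_params} vs {full_params} "
      f"({lora_params / full_params:.1%})")
```

With these sizes LoRA trains 8,192 parameters instead of 262,144 (about 3%), which is why it is the default choice for efficient customization, while full fine-tuning remains available when every parameter must move.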
Together GPU Clusters
These high-performance clusters are tailored for model training and inference. Key hardware and network components include:
- GPU Options: NVIDIA A100, H100, H200, B200, and GB200 (Grace Blackwell architecture), with up to 384GB HBM3e memory.
- Networking:
  - NVLink for direct GPU-to-GPU communication
  - InfiniBand (up to 3,200 Gbps) for low-latency distributed training
- Software Stack:
  - Together Kernel Collection (custom CUDA kernels)
  - Slurm and Kubernetes for workload management
- Performance: training up to 24% faster and inference up to 75% faster than standard PyTorch
- Reliability: 99.9% uptime SLA with redundant infrastructure and expert technical support
RedPajama Dataset and Models
Together AI developed RedPajama, a 30-trillion-token open dataset (RedPajama-Data-v2) that ranks among the largest publicly available LLM training corpora. Models and projects built on RedPajama data are used by over 500 open-source AI efforts, with the aim of supporting reproducible research and open-access AI development.
Research and Innovation
The company actively contributes to AI research through innovations such as:
- FlashAttention-3: An IO-aware, exact attention algorithm optimized for modern GPUs
- CocktailSGD: A communication-compression method that reduces network traffic by up to 117× during distributed training
- QTIP: Techniques for quantized, high-fidelity inference
- Sub-quadratic architectures: Including models such as StripedHyena and Monarch Mixer
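For context on the FlashAttention line above: FlashAttention computes standard, exact scaled dot-product attention; its contribution is IO-aware tiling that avoids materializing the full attention matrix in GPU memory, not a new formula. A plain NumPy reference for *what* it computes (though not *how*):

```python
import numpy as np

def attention(Q, K, V):
    # Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # FlashAttention produces this same output but tiles the computation so
    # the full N x N score matrix never lives in GPU memory at once.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (N, N) — the part FlashAttention avoids storing
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
N, d = 16, 8
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = attention(Q, K, V)
assert out.shape == (N, d)
```

Because the score matrix grows quadratically with sequence length N, avoiding it is what makes long contexts practical; the sub-quadratic architectures in the same list attack the quadratic cost at the model-design level instead.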
Clients and Use Cases
Together AI supports a wide range of models, including text, code, image, audio, embeddings, rerankers, and multimodal systems. Organizations using its infrastructure include Salesforce, The Washington Post, Pika Labs, Arcee AI, Nexusflow, and Wordware. Application areas include:
- Customer support automation
- Video content generation
- Cybersecurity modeling
- AI-driven in-game characters
- Text-to-speech solutions
- Enterprise document analysis
Pricing
Together AI offers three pricing tiers:
- Build: Pay-as-you-go access to fast, serverless inference.
- Scale: Reserved GPUs, custom configurations, and Slack-based technical support.
- Enterprise: VPC deployment, 99.9% SLA, geo-redundancy, and dedicated support teams.
Future Outlook
Together AI’s vision is to bring open-source AI technologies into enterprise production with fast, cost-effective, and controllable models. The company aims to lead in core algorithm research (e.g., FlashAttention) while making large-scale model deployment more accessible. It plans to continue advancing infrastructure efficiency and model reliability, and to expand the capabilities of its platform through innovations like self-optimizing training pipelines and customizable inference agents.