Together AI is a San Francisco-based AI cloud provider that offers infrastructure and software solutions for training, fine-tuning, and deploying open-source large language models (LLMs) in production environments. Founded in 2022, the company is known for its research-driven engineering approach and contributions to the open-source AI ecosystem, particularly through its RedPajama initiative—a large-scale open dataset project.
Founding and Management
Together AI was founded in 2022 by Vipul Ved Prakash (CEO), Ce Zhang (CTO), Chris Ré, Percy Liang, and Tri Dao. The founding team includes AI researchers affiliated with Stanford University and Hazy Research. The company is headquartered in San Francisco, California.
Technological Infrastructure
Together AI’s infrastructure is designed to support high-performance training, inference, and fine-tuning of LLMs. Its core technology stack is structured around three main components: the Together Inference Engine, the Together Fine-Tuning Infrastructure, and Together GPU Clusters.
Together Inference Engine
This component provides a high-efficiency inference engine optimized for both open-source and proprietary models in production use. Key technical features include:
- Transformer-optimized kernels: Custom FP8 (8-bit floating-point) kernels that deliver up to 75% faster inference than standard PyTorch.
- QTIP (Quantization with Trellises and Incoherence Processing): Enables low-precision computation while preserving model accuracy.
- Speculative Decoding: Enhances inference performance using draft models trained on the RedPajama dataset.
- Model Variants: Models are offered in up to three tiers: “Lite” (lowest cost), “Turbo” (balanced price and performance), and “Reference” (full accuracy).
- API Support: Serverless, OpenAI-compatible APIs and dedicated endpoints for GPU-specific deployments.
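As a rough illustration of the speculative-decoding idea listed above (not Together’s actual engine), the sketch below uses toy stand-in “models”: a cheap draft function proposes a block of tokens, and a target function verifies them, keeping the longest agreed prefix. The function names and the toy token rules are invented for this example.

```python
# Toy greedy speculative decoding. A cheap draft model proposes a block of
# k tokens; the expensive target model checks each position and accepts the
# longest prefix on which the two agree, substituting its own token at the
# first disagreement. Both "models" below are deterministic stand-ins.

def draft_next(ctx):
    # Hypothetical cheap model: next token = (last token + 1) mod 10.
    return (ctx[-1] + 1) % 10

def target_next(ctx):
    # Hypothetical expensive model: agrees with the draft except after a 7,
    # where it emits 0 instead (forcing a rejection).
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10

def speculative_decode(prompt, steps=8, k=4):
    ctx = list(prompt)
    while len(ctx) - len(prompt) < steps:
        # Draft proposes k tokens autoregressively.
        proposal, tmp = [], list(ctx)
        for _ in range(k):
            t = draft_next(tmp)
            proposal.append(t)
            tmp.append(t)
        # Target verifies; accept until the first mismatch.
        tmp = list(ctx)
        for t in proposal:
            want = target_next(tmp)
            tmp.append(want)
            if want != t:       # take the target's token and stop this block
                break
        ctx = tmp
    return ctx[len(prompt):][:steps]

speculative_decode([5])  # → [6, 7, 0, 1, 2, 3, 4, 5]
```

The key property, preserved here, is that the output is identical to decoding with the target model alone; the draft model only changes how many target verifications run per accepted token.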
Together Fine-Tuning Infrastructure
Together AI’s fine-tuning infrastructure allows organizations to retrain models with their own data. Capabilities include:
- LoRA (Low-Rank Adaptation) for efficient customization.
- Full Fine-Tuning of all model parameters.
- DPO (Direct Preference Optimization) and Continued Fine-Tuning for preference-based and iterative model updates.
- Support for long contexts up to 32K tokens.
- JSONL input and CLI support for automation and integration into development workflows.
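The LoRA approach listed above can be sketched in a few lines. This is a generic illustration of the low-rank update, not Together’s implementation; the dimensions, rank, and scaling factor are arbitrary assumptions.

```python
import numpy as np

# Minimal LoRA sketch: the pretrained weight W is frozen; only the small
# low-rank factors A and B are trained. The effective layer computes
# y = W x + (alpha / r) * B (A x).
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 512, 512, 8, 16      # assumed sizes for illustration
W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable, rank r
B = np.zeros((d_out, r))                     # trainable, initialized to zero

def lora_forward(x):
    # With B = 0 the adapter is a no-op, so training starts from the base model.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)   # B = 0 → identical to base model

full_params = W.size              # parameters updated by full fine-tuning
lora_params = A.size + B.size     # parameters updated by LoRA
print(f"trainable params: {lora_params} vs {full_params} "
      f"({lora_params / full_params:.1%})")
```

With these sizes LoRA trains 8,192 parameters instead of 262,144 (about 3%), which is why it is the default choice for efficient customization, while full fine-tuning remains available when every parameter must move.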
Together GPU Clusters
These high-performance clusters are tailored for model training and inference. Key hardware and network components include:
- GPU Options: NVIDIA A100, H100, H200, B200, and GB200 (Grace Blackwell architecture), with up to 384GB HBM3e memory.
- Networking:
  - NVLink for direct GPU-to-GPU communication
  - InfiniBand (up to 3,200 Gbps) for low-latency distributed training
- Software Stack:
  - Together Kernel Collection (custom CUDA kernels)
  - Slurm and Kubernetes for workload management
- Performance: training up to 24% faster and inference up to 75% faster than standard PyTorch
- Reliability: 99.9% uptime SLA with redundant infrastructure and expert technical support
RedPajama Dataset and Models
Together AI developed RedPajama, a 30-trillion-token open dataset (RedPajama-Data-v2) that ranks among the largest publicly available LLM training corpora. Models and projects built on RedPajama data are used by over 500 open-source AI efforts, with the aim of supporting reproducible research and open-access AI development.
Research and Innovation
The company actively contributes to AI research through innovations such as:
- FlashAttention-3: An IO-aware, exact attention algorithm optimized for modern GPUs
- CocktailSGD: A communication-compression method that reduces network traffic by up to 117× during distributed training
- QTIP: Techniques for quantized, high-fidelity inference
- Sub-quadratic architectures: Including models such as StripedHyena and Monarch Mixer
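For context on the FlashAttention line above: FlashAttention computes standard, exact scaled dot-product attention; its contribution is IO-aware tiling that avoids materializing the full attention matrix in GPU memory, not a new formula. A plain NumPy reference for *what* it computes (though not *how*):

```python
import numpy as np

def attention(Q, K, V):
    # Reference scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    # FlashAttention produces this same output but tiles the computation so
    # the full N x N score matrix never lives in GPU memory at once.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (N, N) — the part FlashAttention avoids storing
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
N, d = 16, 8
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = attention(Q, K, V)
assert out.shape == (N, d)
```

Because the score matrix grows quadratically with sequence length N, avoiding it is what makes long contexts practical; the sub-quadratic architectures in the same list attack the quadratic cost at the model-design level instead.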
Clients and Use Cases
Together AI supports a wide range of models, including text, code, image, audio, embeddings, rerankers, and multimodal systems. Organizations using its infrastructure include Salesforce, The Washington Post, Pika Labs, Arcee AI, Nexusflow, and Wordware. Application areas include:
- Customer support automation
- Video content generation
- Cybersecurity modeling
- AI-driven in-game characters
- Text-to-speech solutions
- Enterprise document analysis
Pricing
Together AI offers three pricing tiers:
- Build: Pay-as-you-go access to fast, serverless inference.
- Scale: Reserved GPUs, custom configurations, and Slack-based technical support.
- Enterprise: VPC deployment, 99.9% SLA, geo-redundancy, and dedicated support teams.
Future Outlook
Together AI’s vision is to bring open-source AI technologies into enterprise production with fast, cost-effective, and controllable models. The company aims to lead in core algorithm research (e.g., FlashAttention) while making large-scale model deployment more accessible. It plans to continue advancing infrastructure efficiency and model reliability, and to expand the capabilities of its platform through innovations like self-optimizing training pipelines and customizable inference agents.