What is the Together AI API base URL?

The Together AI API base URL is https://api.together.xyz/v1

Together AI API — Base URL, Auth & Endpoints

Together AI provides cloud infrastructure for running open-source AI models at scale. The API is OpenAI-compatible and serves 200+ models, including Llama 3.1 405B, Qwen 2.5, Mistral, DeepSeek, and Stable Diffusion. Features include serverless inference (billed per token), dedicated GPU clusters, fine-tuning with LoRA, vision models, and function calling. The platform is popular with researchers, enterprise AI teams, and teams migrating from proprietary models.

Quick Reference

Base URL: https://api.together.xyz/v1
Auth type: Bearer token
Auth header: Authorization: Bearer YOUR_TOGETHER_API_KEY
Rate limit: 60 requests/min (default); scales with plan
Pricing: from $0.10/mo
Free quota: $1 free credit on signup
Documentation: https://docs.together.ai
Endpoint status: server online (HTTP 404: the server responded, but the base path returned an error and may require auth); response time 1.09 s (checked Mar 29, 2026)
Builder score: B (66% builder-friendly)
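The default limit of 60 requests/min can also be respected on the client side, so a batch job slows down before the server starts rejecting calls. The sketch below assumes the limit is counted over a rolling 60-second window; the server's actual accounting may differ, so its responses remain the authority. `SlidingWindowLimiter` is a hypothetical helper, not part of any Together AI SDK.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most max_requests per window seconds.

    Assumes a rolling window; 60 requests / 60 s mirrors the documented default.
    """

    def __init__(self, max_requests: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of recent requests

    def _prune(self, now: float) -> None:
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        self._prune(now)
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until a slot frees up (0.0 if one is available now)."""
        now = self.clock()
        self._prune(now)
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])
```

In a loop, call `try_acquire()` before each request and `time.sleep(limiter.wait_time())` when it returns False. The `clock` parameter exists so the limiter can be tested with a fake clock.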

Authentication

Create an API key in your Together AI dashboard at api.together.ai. Because the endpoint is OpenAI-compatible, authentication works the same way as with OpenAI: pass your key as a Bearer token in the Authorization header:

Authorization: Bearer YOUR_TOGETHER_API_KEY
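A minimal Python sketch of building that header, assuming the key is stored in the `TOGETHER_API_KEY` environment variable (the same name the curl example uses). `auth_headers` is a hypothetical helper name; reading the key from the environment keeps it out of source code.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return headers for a Together AI request (hypothetical helper).

    Uses api_key if given, otherwise the TOGETHER_API_KEY environment variable.
    """
    key = api_key or os.environ.get("TOGETHER_API_KEY")
    if not key:
        raise RuntimeError("Set TOGETHER_API_KEY or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {key}",  # Bearer token, as documented
        "Content-Type": "application/json",  # request bodies are JSON
    }
```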

Pricing

Model: pay-as-you-go
Starting price: from $0.10/mo
Free quota: $1 free credit on signup

Llama 3.1 8B: $0.18/M input tokens · $0.18/M output tokens
Llama 3.1 70B: $0.88/M input tokens · $0.88/M output tokens
Llama 4 Scout: available
DeepSeek R1: $1.25/M input tokens · $1.25/M output tokens
Image generation: $0.008 per image
No free tier; minimum $5 credit purchase.
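Per-million-token prices turn into a per-request estimate as (prompt tokens × input price + completion tokens × output price) / 1,000,000. A minimal sketch, assuming the two figures listed for each model are the input and output prices per million tokens; `PRICES_PER_M` and `estimate_cost` are hypothetical names, and the prices are a snapshot of this page rather than live data.

```python
# Snapshot of the per-million-token prices listed above, read as (input, output) in USD.
PRICES_PER_M = {
    "Llama 3.1 8B": (0.18, 0.18),
    "Llama 3.1 70B": (0.88, 0.88),
    "DeepSeek R1": (1.25, 1.25),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    in_price, out_price = PRICES_PER_M[model]
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000
```

For example, the sample response reports 20 prompt and 68 completion tokens; at the Llama 3.1 70B rate that is 88 × $0.88 / 1,000,000, about $0.000077.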

Key Endpoints

Method	Path	Description
POST	`/chat/completions`	Chat completions (OpenAI-compatible)
POST	`/completions`	Text completions
POST	`/embeddings`	Generate text embeddings
POST	`/images/generations`	Image generation (Stable Diffusion, FLUX)
GET	`/models`	List all available models with pricing
POST	`/fine-tunes`	Start a fine-tuning job with LoRA
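Every path in the table is relative to the base URL, so a client only needs one join rule. The sketch maps short names to the table's methods and paths; the names (`chat`, `models`, and so on) and the `endpoint_url` helper are hypothetical.

```python
BASE_URL = "https://api.together.xyz/v1"

# Methods and paths from the endpoint table; the short keys are illustrative.
ENDPOINTS = {
    "chat": ("POST", "/chat/completions"),
    "completions": ("POST", "/completions"),
    "embeddings": ("POST", "/embeddings"),
    "images": ("POST", "/images/generations"),
    "models": ("GET", "/models"),
    "fine_tunes": ("POST", "/fine-tunes"),
}


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Join base URL and endpoint path, keeping the /v1 prefix intact."""
    _method, path = ENDPOINTS[name]
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

Plain concatenation is deliberate: `urllib.parse.urljoin("https://api.together.xyz/v1", "/chat/completions")` discards the `/v1` segment and yields `https://api.together.xyz/chat/completions`.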

Sample Request

curl "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo","messages":[{"role":"user","content":"Write a Python function to parse JSON safely"}],"max_tokens":200}'
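The same request can be sketched with Python's standard library. `build_chat_request` assembles a `urllib.request.Request` equivalent to the curl call without sending it, and `send` performs the network call; both names are hypothetical, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

URL = "https://api.together.xyz/v1/chat/completions"


def build_chat_request(prompt: str, api_key: str,
                       model: str = "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
                       max_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) the same POST as the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)
```

Against the live API, `send(build_chat_request("Write a Python function to parse JSON safely", os.environ["TOGETHER_API_KEY"]))` should return a dictionary shaped like the sample response.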

Sample Response

{
  "id": "890ab123",
  "model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Here's a Python function to safely parse JSON:\n\n```python\nimport json\n\ndef safe_json_parse(data):\n    try:\n        return json.loads(data)\n    except json.JSONDecodeError:\n        return None\n```"
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 20, "completion_tokens": 68 }
}
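Reading the reply follows the OpenAI-style shape of the sample response: the text is at `choices[0].message.content` and token counts are in `usage`. A minimal sketch; `extract_reply` is a hypothetical helper, and the total is computed from the two counts because the sample response does not include a total field.

```python
import json
from typing import Tuple


def extract_reply(raw: str) -> Tuple[str, str, int]:
    """Return (assistant text, finish_reason, total tokens) from a chat response."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage", {})
    total = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    return choice["message"]["content"], choice.get("finish_reason", ""), total
```

A `finish_reason` other than `"stop"` (typically `"length"`) suggests the reply was cut off by `max_tokens`, so it is worth checking before using the text.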