Groq serves open-source LLMs at exceptional speed: 800+ tokens per second on its custom Language Processing Unit (LPU) hardware. The API is OpenAI-compatible, so any code targeting OpenAI's chat completions endpoint works with a one-line change to the base URL. Available models include Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B, and Gemma 2. Well suited to latency-critical applications such as voice assistants, real-time chat, and interactive coding tools.
Base URL
https://api.groq.com/openai/v1
Auth type
Bearer Token
Auth header
Authorization: Bearer gsk_...
Rate limit
30 requests/min (free) · Higher on pay-as-you-go
Pricing
Pay per use
Free quota
Free tier with rate limits; no credit card required
Documentation
https://console.groq.com/docs
Endpoint status
Server online; HTTP 404 on the base path (the path returned an error and may require auth); 871 ms response time (checked Mar 29, 2026)
Builder score
C (58% builder-friendly)
Generate an API key at console.groq.com. Pass it as a Bearer token in the Authorization header. The endpoint is OpenAI-compatible.
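A minimal sketch of that call using only the Python standard library, so it runs without the OpenAI SDK. It assumes the key is in the `GROQ_API_KEY` environment variable, as in the curl example below; with the official `openai` package you would instead just point `base_url` at `https://api.groq.com/openai/v1`.

```python
import json
import os
import urllib.request

GROQ_BASE = "https://api.groq.com/openai/v1"

def build_chat_request(model, messages, api_key):
    """Build an OpenAI-style chat completion request against Groq's endpoint."""
    return urllib.request.Request(
        f"{GROQ_BASE}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    key = os.environ.get("GROQ_API_KEY")
    if key:  # only hit the network when a key is actually configured
        req = build_chat_request(
            "llama-3.3-70b-versatile",
            [{"role": "user", "content": "Say hello in five words."}],
            key,
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```

The request body and headers are identical to what OpenAI's own endpoint expects; only the host and the `gsk_...` key format differ.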
Authorization: Bearer gsk_...
| Plan | Price/mo | Included |
|---|---|---|
| Free | $0 | Rate-limited access to all models |
| PAYG | $0 base fee | Pay per token, higher rate limits |
| Model | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| Llama 3.1 8B Instant | $0.05 | $0.08 |
| Llama 3.3 70B Versatile | $0.59 | $0.79 |
| Llama 3.1 70B | $0.59 | $0.79 |

Batch API: 50% off these rates.
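Per-token pricing makes cost a simple function of the `usage` counts the API returns. A small estimator using the rates listed above (model names and prices are from this listing; always verify against Groq's pricing page):

```python
# Per-million-token prices (USD input, USD output) from the listing above.
PRICES = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "llama-3.3-70b-versatile": (0.59, 0.79),
}

def estimate_cost(model, prompt_tokens, completion_tokens, batch=False):
    """Estimate request cost in USD; the Batch API is listed at 50% off."""
    price_in, price_out = PRICES[model]
    cost = (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000
    return cost * 0.5 if batch else cost
```

For the sample response below (18 prompt tokens, 87 completion tokens on Llama 3.3 70B), this works out to well under a hundredth of a cent.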
| Method | Path | Description |
|---|---|---|
| POST | /chat/completions | Chat completions (OpenAI-compatible) |
| GET | /models | List available models and their context lengths |
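The `/models` endpoint follows the OpenAI convention of wrapping results in a top-level `data` array. A stdlib sketch of listing model IDs (same `GROQ_API_KEY` assumption as above):

```python
import json
import os
import urllib.request

def parse_model_ids(payload):
    """OpenAI-style list responses wrap items in a top-level "data" array."""
    return [m["id"] for m in payload["data"]]

def list_models(api_key, base="https://api.groq.com/openai/v1"):
    """Fetch the IDs of all models available to this API key."""
    req = urllib.request.Request(
        f"{base}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_model_ids(json.load(resp))

if __name__ == "__main__" and os.environ.get("GROQ_API_KEY"):
    print("\n".join(list_models(os.environ["GROQ_API_KEY"])))
```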
curl "https://api.groq.com/openai/v1/chat/completions" \
-H "Authorization: Bearer $GROQ_API_KEY" \
-H "Content-Type: application/json" \
-d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"Explain quantum computing in one paragraph."}]}'
{
"id": "chatcmpl-abc123",
"model": "llama-3.3-70b-versatile",
"choices": [{
"message": { "role": "assistant", "content": "Quantum computing harnesses..." },
"finish_reason": "stop"
}],
"usage": { "prompt_tokens": 18, "completion_tokens": 87, "total_tokens": 105 },
"x_groq": { "id": "req_abc", "usage": { "queue_time": 0.0002 } }
}
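Because the response schema matches OpenAI's, existing parsing code carries over unchanged; the one Groq-specific addition is the `x_groq` object, whose `usage.queue_time` reports how long the request waited before processing. A small helper over the sample response above:

```python
# The sample response above, as a dict (content abridged).
response = {
    "id": "chatcmpl-abc123",
    "model": "llama-3.3-70b-versatile",
    "choices": [{
        "message": {"role": "assistant", "content": "Quantum computing harnesses..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 18, "completion_tokens": 87, "total_tokens": 105},
    "x_groq": {"id": "req_abc", "usage": {"queue_time": 0.0002}},
}

def extract_reply(resp):
    """Return (text, finish_reason, total_tokens) from an OpenAI-style response."""
    choice = resp["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        resp["usage"]["total_tokens"],
    )
```

Checking `finish_reason` matters in practice: `"stop"` means a complete answer, while `"length"` means the reply was truncated by the token limit.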
Data sourced from API Map. Always verify pricing and rate limits against the official Groq documentation.