
Groq API

by Groq · Free tier

Groq runs open-source LLMs at speeds previously impossible: 800+ tokens per second on their custom Language Processing Unit (LPU) hardware. The API is OpenAI-compatible, so any code targeting OpenAI's chat completions endpoint works with a one-line change. Models include Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B, and Gemma 2. Ideal for latency-critical applications like voice assistants, real-time chat, and interactive coding tools.
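The "one-line change" is the base URL: the request body, path, and auth header all follow OpenAI's chat completions format. A minimal stdlib sketch (the `build_chat_request` helper and the `gsk_example` key are illustrative, not part of Groq's SDK):

```python
import json
import urllib.request

# The only Groq-specific detail is the base URL; the payload and headers
# are identical to what you would send to OpenAI's endpoint.
OPENAI_BASE = "https://api.openai.com/v1"
GROQ_BASE = "https://api.groq.com/openai/v1"

def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(GROQ_BASE, "gsk_example",
                         "llama-3.3-70b-versatile", "Hello")
# To actually send it (needs a real key): urllib.request.urlopen(req)
print(req.full_url)
```

Swapping `GROQ_BASE` for `OPENAI_BASE` (plus the matching key) is the entire migration.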

Tags: llama · mixtral · fast-inference · open-source · openai-compatible · lpu

Quick Reference

Base URL: https://api.groq.com/openai/v1
Auth type: Bearer token
Auth header: Authorization: Bearer gsk_...
Rate limit: 30 requests/min (free) · higher on pay-as-you-go
Pricing: Pay per use
Free quota: Free tier with rate limits; no credit card required
Documentation: https://console.groq.com/docs
Endpoint status: Online (HTTP 404: server reachable, but path returned an error; may require auth), 871 ms (checked Mar 29, 2026)
Builder score: C (58% builder-friendly)
Subscores: Pricing 75 · Latency 38 · Depth 61

Authentication

Generate an API key at console.groq.com. Pass it as a Bearer token in the Authorization header. The endpoint is OpenAI-compatible.

Authorization: Bearer gsk_...
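A small sketch of building that header, reading the key from an environment variable (the variable name `GROQ_API_KEY` matches the curl sample below; the `gsk_example` fallback is only for illustration):

```python
import os

# Read the API key from the environment; never hard-code real keys.
api_key = os.environ.get("GROQ_API_KEY", "gsk_example")

# Standard HTTP Bearer scheme, same as OpenAI's API.
headers = {"Authorization": f"Bearer {api_key}"}
print(headers["Authorization"].startswith("Bearer "))
```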

Pricing

Model: pay-as-you-go
Starting price: Pay per use
Free quota: Free tier with rate limits; no credit card required
Unit cost: $0.59 per 1M input tokens (Llama 3.3 70B)

Plan   Price/mo   Included
Free   Free       Rate-limited access to all models
PAYG   Free       Pay per token, higher rate limits

Llama 3.1 8B Instant: $0.05/M input, $0.08/M output
Llama 3.3 70B Versatile: $0.59/M input, $0.79/M output
Llama 3.1 70B: $0.59/M input, $0.79/M output
Batch API: 50% off
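As a sanity check on these per-million-token rates, a small cost estimator (the `RATES` table copies the figures above; verify them against the official pricing page before relying on them):

```python
# USD per 1M tokens: (input rate, output rate), from the list above.
RATES = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "llama-3.3-70b-versatile": (0.59, 0.79),
}

def cost_usd(model, prompt_tokens, completion_tokens, batch=False):
    """Estimate the cost of one request from its token usage."""
    rate_in, rate_out = RATES[model]
    total = prompt_tokens / 1e6 * rate_in + completion_tokens / 1e6 * rate_out
    # Batch API is listed at 50% off the standard rates.
    return total * (0.5 if batch else 1.0)

# The sample response below reports 18 prompt + 87 completion tokens,
# so a single such call on Llama 3.3 70B costs a small fraction of a cent.
print(cost_usd("llama-3.3-70b-versatile", 18, 87))
```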

Key Endpoints

Method   Path                 Description
POST     /chat/completions    Chat completions (OpenAI-compatible)
GET      /models              List available models and their context lengths

Sample Request

curl "https://api.groq.com/openai/v1/chat/completions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama-3.3-70b-versatile","messages":[{"role":"user","content":"Explain quantum computing in one paragraph."}]}'

Sample Response

{
  "id": "chatcmpl-abc123",
  "model": "llama-3.3-70b-versatile",
  "choices": [{
    "message": { "role": "assistant", "content": "Quantum computing harnesses..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 18, "completion_tokens": 87, "total_tokens": 105 },
  "x_groq": { "id": "req_abc", "usage": { "queue_time": 0.0002 } }
}
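Parsing this response is plain OpenAI-style JSON handling; the only Groq-specific part is the `x_groq` extension block, which carries request metadata such as queue time. A sketch using the sample response above:

```python
import json

# The sample response from above, verbatim.
raw = """{
  "id": "chatcmpl-abc123",
  "model": "llama-3.3-70b-versatile",
  "choices": [{
    "message": { "role": "assistant", "content": "Quantum computing harnesses..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 18, "completion_tokens": 87, "total_tokens": 105 },
  "x_groq": { "id": "req_abc", "usage": { "queue_time": 0.0002 } }
}"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]     # the assistant's text
total = resp["usage"]["total_tokens"]                # standard OpenAI usage block
queue_time = resp["x_groq"]["usage"]["queue_time"]   # Groq-specific extension
print(reply, total, queue_time)
```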

Data sourced from API Map. Always verify pricing and rate limits against the official Groq documentation.