The HuggingFace Inference API gives serverless access to the entire HuggingFace Hub: 200,000+ models for text generation, text classification, summarization, translation, question answering, image generation, image classification, object detection, speech recognition, and audio synthesis. Models range from tiny to frontier-scale. The Inference Endpoints product offers dedicated deployment for production workloads.
https://api-inference.huggingface.co/models
Auth type
Bearer Token
Auth header
Authorization: Bearer hf_...
Rate limit
Varies by model and plan (free tier is rate-limited)
Pricing
Free tier available
Free quota
Rate-limited free inference on public models
Documentation
https://huggingface.co/docs/api-inference
Endpoint status
Server online — HTTP 410 — server is online but path returned an error (may require auth)926ms
(checked Mar 29, 2026)
Builder score
B
65%
builder-friendly
Create a User Access Token at huggingface.co/settings/tokens (read scope is sufficient). Pass it as a Bearer token in the Authorization header.
Authorization: Bearer hf_...
| Plan | Price/mo | Included |
|---|---|---|
| Free | Free | Rate-limited serverless inference |
| PRO | $9 | $9/mo, higher limits, ZeroGPU |
| Endpoints | $6 | Dedicated endpoints from $0.06/hr |
| Enterprise | $0 | Custom pricing |
Free: serverless inference on public models (rate-limited). PRO: $9/mo. Dedicated Inference Endpoints from $0.06/hr. Enterprise: custom.
| Method | Path | Description |
|---|---|---|
| POST | /{model_id} |
Run any model with task-specific input/output |
| POST | /openai/v1/chat/completions |
OpenAI-compatible chat completions (TGI models) |
# Text generation with Llama 3
curl "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct" \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json" \
-d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}'
[{
"generated_text": "What is the capital of France?
The capital of France is Paris."
}]
Data sourced from API Map. Always verify pricing and rate limits against the official HuggingFace documentation.