HuggingFace Inference API

by HuggingFace · Free tier

The HuggingFace Inference API gives serverless access to the entire HuggingFace Hub: 200,000+ models for text generation, text classification, summarization, translation, question answering, image generation, image classification, object detection, speech recognition, and audio synthesis. Models range from tiny to frontier-scale. The Inference Endpoints product offers dedicated deployment for production workloads.

Tags: open-source, transformers, nlp, computer-vision, audio, diffusion, bert, llama

Quick Reference

Base URL: https://api-inference.huggingface.co/models
Auth type: Bearer Token
Auth header: Authorization: Bearer hf_...
Rate limit: Varies by model and plan (free tier is rate-limited)
Pricing: Free tier available
Free quota: Rate-limited free inference on public models
Documentation: https://huggingface.co/docs/api-inference
Endpoint status: HTTP 410 — server is online but the path returned an error (may require auth); 926 ms (checked Mar 29, 2026)
Builder score: B (65% builder-friendly)
Pricing: 90
Latency: 38
Depth: 66

Authentication

Create a User Access Token at huggingface.co/settings/tokens (read scope is sufficient). Pass it as a Bearer token in the Authorization header.

Authorization: Bearer hf_...
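The same header works for every model on the serverless API. A minimal Python sketch, assuming the token is stored in an `HF_TOKEN` environment variable (the function name here is illustrative, not part of any official SDK):

```python
import os

def build_headers(token: str) -> dict:
    """Return the headers the Inference API expects on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# Read the User Access Token from the environment rather than hard-coding it.
headers = build_headers(os.environ.get("HF_TOKEN", "hf_..."))
```

Pass these headers with any HTTP client (`requests`, `httpx`, curl) when calling a model endpoint.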

Pricing

Model: freemium · Free tier available · Free quota: rate-limited free inference on public models

Plan        Price           Included
Free        $0              Rate-limited serverless inference
PRO         $9/mo           Higher limits, ZeroGPU
Endpoints   From $0.06/hr   Dedicated inference endpoints
Enterprise  Custom          Custom pricing

Free: serverless inference on public models (rate-limited). PRO: $9/mo. Dedicated Inference Endpoints from $0.06/hr. Enterprise: custom.

Key Endpoints

Method  Path                          Description
POST    /{model_id}                   Run any model with task-specific input/output
POST    /openai/v1/chat/completions  OpenAI-compatible chat completions (TGI models)
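For the generic `POST /{model_id}` endpoint, the URL embeds the Hub model ID and the JSON body follows the task's input schema. A sketch of composing such a request for text generation, assuming the `{"inputs": ..., "parameters": {...}}` payload shape shown in the sample below (other tasks use different fields):

```python
import json

API_BASE = "https://api-inference.huggingface.co/models"

def build_request(model_id: str, inputs: str, **parameters) -> tuple[str, bytes]:
    """Compose the URL and JSON body for POST /{model_id}.

    The payload shape here is the text-generation one; image, audio,
    and classification tasks take task-specific inputs instead.
    """
    url = f"{API_BASE}/{model_id}"
    body = json.dumps({"inputs": inputs, "parameters": parameters}).encode()
    return url, body

url, body = build_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "What is the capital of France?",
    max_new_tokens=50,
)
```

Send `body` as the POST payload to `url` with the Bearer-token headers from the Authentication section.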

Sample Request

# Text generation with Llama 3
curl "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}'

Sample Response

[{
  "generated_text": "What is the capital of France?\nThe capital of France is Paris."
}]
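Note that by default the text-generation task echoes the prompt back inside `generated_text`. A small parsing sketch that strips it off (assumes the default behavior; the `return_full_text` parameter can also disable the echo server-side):

```python
import json

# The response body from the sample above, as a raw JSON string.
raw = '[{"generated_text": "What is the capital of France?\\nThe capital of France is Paris."}]'
prompt = "What is the capital of France?"

data = json.loads(raw)
completion = data[0]["generated_text"]
# Drop the echoed prompt so only the model's continuation remains.
if completion.startswith(prompt):
    completion = completion[len(prompt):].lstrip()

print(completion)  # The capital of France is Paris.
```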

Data sourced from API Map. Always verify pricing and rate limits against the official HuggingFace documentation.