HuggingFace Inference API

by HuggingFace · Free tier

The HuggingFace Inference API gives serverless access to the entire HuggingFace Hub: 200,000+ models for text generation, text classification, summarization, translation, question answering, image generation, image classification, object detection, speech recognition, and audio synthesis. Models range from tiny to frontier-scale. The Inference Endpoints product offers dedicated deployment for production workloads.

Tags: open-source, transformers, nlp, computer-vision, audio, diffusion, bert, llama

Quick Reference

Base URL: https://api-inference.huggingface.co/models
Auth type: Bearer Token
Auth header: Authorization: Bearer hf_...
Rate limit: Varies by model and plan (free tier is rate-limited)
Pricing: Free tier available
Free quota: Rate-limited free inference on public models
Documentation: https://huggingface.co/docs/api-inference
Endpoint status: HTTP 410 — server is online but the path returned an error (may require auth); 926 ms (checked Mar 29, 2026)
Builder score: B (65% builder-friendly)
Pricing: 90
Latency: 38
Depth: 66

Authentication

Create a User Access Token at huggingface.co/settings/tokens (read scope is sufficient). Pass it as a Bearer token in the Authorization header.

Authorization: Bearer hf_...
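The same header works for every model on the serverless API. A minimal Python sketch, assuming the token is stored in an `HF_TOKEN` environment variable (the function name here is illustrative, not part of any official SDK):

```python
import os

def build_headers(token: str) -> dict:
    """Return the headers the Inference API expects on every request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

# Read the User Access Token from the environment rather than hard-coding it.
headers = build_headers(os.environ.get("HF_TOKEN", "hf_..."))
```

Pass these headers with any HTTP client (`requests`, `httpx`, curl) when calling a model endpoint.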

Pricing

Model: freemium · Free tier available · Free quota: rate-limited free inference on public models

Plan        Price           Included
Free        $0              Rate-limited serverless inference
PRO         $9/mo           Higher limits, ZeroGPU
Endpoints   From $0.06/hr   Dedicated inference endpoints
Enterprise  Custom          Custom pricing

Free: serverless inference on public models (rate-limited). PRO: $9/mo. Dedicated Inference Endpoints from $0.06/hr. Enterprise: custom.

Key Endpoints

Method  Path                          Description
POST    /{model_id}                   Run any model with task-specific input/output
POST    /openai/v1/chat/completions  OpenAI-compatible chat completions (TGI models)
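For the generic `POST /{model_id}` endpoint, the URL embeds the Hub model ID and the JSON body follows the task's input schema. A sketch of composing such a request for text generation, assuming the `{"inputs": ..., "parameters": {...}}` payload shape shown in the sample below (other tasks use different fields):

```python
import json

API_BASE = "https://api-inference.huggingface.co/models"

def build_request(model_id: str, inputs: str, **parameters) -> tuple[str, bytes]:
    """Compose the URL and JSON body for POST /{model_id}.

    The payload shape here is the text-generation one; image, audio,
    and classification tasks take task-specific inputs instead.
    """
    url = f"{API_BASE}/{model_id}"
    body = json.dumps({"inputs": inputs, "parameters": parameters}).encode()
    return url, body

url, body = build_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "What is the capital of France?",
    max_new_tokens=50,
)
```

Send `body` as the POST payload to `url` with the Bearer-token headers from the Authentication section.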

Sample Request

# Text generation with Llama 3
curl "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}'

Sample Response

[{
  "generated_text": "What is the capital of France?\nThe capital of France is Paris."
}]
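Note that by default the text-generation task echoes the prompt back inside `generated_text`. A small parsing sketch that strips it off (assumes the default behavior; the `return_full_text` parameter can also disable the echo server-side):

```python
import json

# The response body from the sample above, as a raw JSON string.
raw = '[{"generated_text": "What is the capital of France?\\nThe capital of France is Paris."}]'
prompt = "What is the capital of France?"

data = json.loads(raw)
completion = data[0]["generated_text"]
# Drop the echoed prompt so only the model's continuation remains.
if completion.startswith(prompt):
    completion = completion[len(prompt):].lstrip()

print(completion)  # The capital of France is Paris.
```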

Data sourced from API Map. Always verify pricing and rate limits against the official HuggingFace documentation.