# HuggingFace Inference API

- **Provider:** HuggingFace
- **Category:** ai
- **Base URL:** `https://api-inference.huggingface.co/models`
- **Auth:** Bearer token (`Authorization: Bearer hf_...`)
- **Rate Limit:** Varies by model and plan (free tier is rate-limited)
- **Free Tier:** Yes (rate-limited serverless inference on public models)
- **Pricing:** Free tier available (freemium)
- **Docs:** https://huggingface.co/docs/api-inference

## Description

The HuggingFace Inference API gives serverless access to the entire HuggingFace Hub: 200,000+ models for text generation, text classification, summarization, translation, question answering, image generation, image classification, object detection, speech recognition, and audio synthesis. Models range from tiny to frontier-scale. For production workloads, the separate Inference Endpoints product offers dedicated deployments.

## Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `https://api-inference.huggingface.co/models/{model_id}` | Run any model with task-specific input/output |
| POST | `https://api-inference.huggingface.co/models/{model_id}/v1/chat/completions` | OpenAI-compatible chat completions (TGI-backed models) |

## Authentication

Create a User Access Token at huggingface.co/settings/tokens (read scope is sufficient), then pass it as a Bearer token in the `Authorization` header:

```
Authorization: Bearer hf_...
```

## Sample Request

```bash
# Text generation with Llama 3
curl "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}'
```

## Sample Response

```json
[{
  "generated_text": "What is the capital of France? The capital of France is Paris."
}]
```

## Pricing Details

Free: serverless inference on public models (rate-limited). PRO: $9/mo. Dedicated Inference Endpoints from $0.06/hr. Enterprise: custom pricing.
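The request/response pattern above can be sketched in Python using only the standard library. One practical wrinkle: the serverless API returns HTTP 503 while a cold model is still loading, so a simple retry with backoff is worth having. This is an illustrative sketch, not an official client; the `query` helper and its `_urlopen` hook (included so the function can be exercised without a network call) are assumed names.

```python
import json
import time
import urllib.error
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models"

def query(model_id, payload, token, max_retries=3,
          _urlopen=urllib.request.urlopen):
    """POST a task-specific JSON payload to the serverless Inference API.

    Retries on HTTP 503, which the API returns while a cold model is
    still loading. `_urlopen` is injectable purely for testing.
    """
    req = urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    for attempt in range(max_retries):
        try:
            with _urlopen(req) as resp:
                return json.load(resp)  # task-specific JSON, e.g. a list
        except urllib.error.HTTPError as err:
            if err.code == 503 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # simple exponential backoff
                continue
            raise
```

With a real token, `query("meta-llama/Meta-Llama-3-8B-Instruct", {"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}, token)` mirrors the curl example above and returns the parsed JSON list.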
--- *Source: [API Map](https://apimap.dev/apis/huggingface/) — CC BY 4.0*