# HuggingFace Inference API

- **Provider:** HuggingFace
- **Category:** ai
- **Base URL:** `https://api-inference.huggingface.co/models`
- **Auth:** Bearer token (`Authorization: Bearer hf_...`)
- **Rate Limit:** Varies by model and plan (free tier is rate-limited)
- **Free Tier:** Yes (rate-limited serverless inference on public models)
- **Pricing:** Free tier available (freemium)
- **Docs:** https://huggingface.co/docs/api-inference

## Description

The HuggingFace Inference API gives serverless access to the entire HuggingFace Hub: 200,000+ models for text generation, text classification, summarization, translation, question answering, image generation, image classification, object detection, speech recognition, and audio synthesis. Models range from tiny to frontier-scale. For production workloads, the separate Inference Endpoints product offers dedicated deployments.

## Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `https://api-inference.huggingface.co/models/{model_id}` | Run any model with task-specific input/output |
| POST | `https://api-inference.huggingface.co/models/{model_id}/v1/chat/completions` | OpenAI-compatible chat completions (TGI-backed models) |

## Authentication

Create a User Access Token at huggingface.co/settings/tokens (read scope is sufficient), then pass it as a Bearer token in the `Authorization` header:

```
Authorization: Bearer hf_...
```

## Sample Request

```bash
# Text generation with Llama 3
curl "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}'
```

## Sample Response

```json
[{
  "generated_text": "What is the capital of France? The capital of France is Paris."
}]
```

## Pricing Details

Free: serverless inference on public models (rate-limited). PRO: $9/mo. Dedicated Inference Endpoints from $0.06/hr. Enterprise: custom pricing.
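The request/response pattern above can be sketched in Python using only the standard library. One practical wrinkle: the serverless API returns HTTP 503 while a cold model is still loading, so a simple retry with backoff is worth having. This is an illustrative sketch, not an official client; the `query` helper and its `_urlopen` hook (included so the function can be exercised without a network call) are assumed names.

```python
import json
import time
import urllib.error
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models"

def query(model_id, payload, token, max_retries=3,
          _urlopen=urllib.request.urlopen):
    """POST a task-specific JSON payload to the serverless Inference API.

    Retries on HTTP 503, which the API returns while a cold model is
    still loading. `_urlopen` is injectable purely for testing.
    """
    req = urllib.request.Request(
        f"{API_BASE}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    for attempt in range(max_retries):
        try:
            with _urlopen(req) as resp:
                return json.load(resp)  # task-specific JSON, e.g. a list
        except urllib.error.HTTPError as err:
            if err.code == 503 and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # simple exponential backoff
                continue
            raise
```

With a real token, `query("meta-llama/Meta-Llama-3-8B-Instruct", {"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}, token)` mirrors the curl example above and returns the parsed JSON list.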
--- *Source: [API Map](https://apimap.dev/apis/huggingface/) — CC BY 4.0*