## Overview
A comparison of LLM chat API pricing, features, and capabilities across major providers. All prices are in USD per 1 million tokens.
## Pricing & Feature Comparison

Last updated: 2026-03-10
| Provider | Model | Input $/1M | Output $/1M | Context | Max Output | Free Tier | Vision | Function Calling | Streaming | JSON Mode | OpenAI Compatible |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 | 128k | 4096 | No | Yes | Yes | Yes | Yes | Yes |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128k | 16384 | No | Yes | Yes | Yes | Yes | Yes |
| OpenAI | o1 | $15.00 | $60.00 | 200k | 100000 | No | Yes | Yes | Yes | Yes | Yes |
| OpenAI | o3-mini | $1.10 | $4.40 | 200k | 100000 | No | No | Yes | Yes | Yes | Yes |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | 200k | 8192 | No | Yes | Yes | Yes | Yes | No |
| Anthropic | Claude 3 Opus | $15.00 | $75.00 | 200k | 8192 | No | Yes | Yes | Yes | Yes | No |
| Anthropic | Claude 4 Sonnet | $3.00 | $15.00 | 200k | 64000 | No | Yes | Yes | Yes | Yes | No |
| Anthropic | Claude 4 Opus | $15.00 | $75.00 | 200k | 32000 | No | Yes | Yes | Yes | Yes | No |
| Google | Gemini 2.0 Flash | $0.075 | $0.30 | 1M | 8192 | Free (1M/month) | Yes | Yes | Yes | Yes | No |
| Google | Gemini 2.0 Pro | $1.25 | $5.00 | 2M | 8192 | Free (150k/month) | Yes | Yes | Yes | Yes | No |
| DeepSeek | DeepSeek V3 | $0.14 | $0.28 | 128k | 8192 | Free ($1 credit) | No | Yes | Yes | Yes | Yes |
| DeepSeek | DeepSeek R1 | $1.00 | $2.00 | 128k | 8192 | Free ($1 credit) | No | Yes | Yes | Yes | Yes |
| Alibaba | Qwen-2.5 72B | $0.10 | $0.30 | 128k | 8192 | Free (unlimited) | No | Yes | Yes | Yes | Yes |
| Alibaba | Qwen-Max | $0.50 | $2.00 | 32k | 8192 | Free (1M tokens) | Yes | Yes | Yes | Yes | Yes |
| Mistral | Mistral Large 2 | $2.00 | $6.00 | 128k | 8192 | No | No | Yes | Yes | Yes | Yes |
| Mistral | Codestral 2501 | $0.30 | $0.90 | 256k | 32768 | No | No | Yes | Yes | Yes | Yes |
| xAI | Grok-2 | $2.00 | $10.00 | 128k | 8192 | No | Yes | Yes | Yes | Yes | Yes |
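The per-1M-token prices above translate into per-request costs as follows. A minimal sketch (the prices are copied from the comparison table; only a few models are included for illustration):

```python
# Approximate request cost from the per-1M-token prices in the table above.
PRICES = {  # model -> (input $/1M tokens, output $/1M tokens)
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "deepseek-chat": (0.14, 0.28),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, given input and output token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

# e.g. a 10k-token prompt with a 1k-token reply on GPT-4o:
print(f"${cost_usd('gpt-4o', 10_000, 1_000):.4f}")  # → $0.0350
```

Note that cached-input and batch discounts (offered by several of the providers above) are not reflected here.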
## Latency Reference (TTFT)

Time to First Token (TTFT) benchmarks, based on average response times for 1,000-token input prompts:
| Provider | Model | Average TTFT | 95th Percentile |
|---|---|---|---|
| OpenAI | GPT-4o-mini | 200-300ms | 500ms |
| OpenAI | GPT-4o | 300-500ms | 800ms |
| Anthropic | Claude 3.5 Sonnet | 400-600ms | 900ms |
| Google | Gemini 2.0 Flash | 150-250ms | 400ms |
| DeepSeek | DeepSeek V3 | 250-400ms | 600ms |
| Alibaba | Qwen-2.5 72B | 200-350ms | 550ms |
| Mistral | Mistral Large 2 | 300-500ms | 700ms |
| xAI | Grok-2 | 350-600ms | 850ms |
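TTFT can be measured yourself by timing the arrival of the first streamed chunk. A small sketch of the timing helper (the commented-out usage assumes an OpenAI-compatible streaming endpoint and a configured API key):

```python
import time
from typing import Any, Iterable, Tuple

def time_to_first(events: Iterable[Any]) -> Tuple[float, Any]:
    """Return (seconds until the first event arrives, that event)."""
    start = time.perf_counter()
    first = next(iter(events))  # blocks until the iterator yields something
    return time.perf_counter() - start, first

# Against a live endpoint (requires network and a key), the stream object
# returned by any OpenAI-compatible SDK can be passed in directly:
#
#   from openai import OpenAI
#   stream = OpenAI().chat.completions.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user", "content": "Hello!"}],
#       stream=True,
#   )
#   ttft, _ = time_to_first(stream)
#   print(f"TTFT: {ttft * 1000:.0f} ms")
```

Single measurements are noisy; averaging over repeated runs is what produces figures like those in the table.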
## Code Examples

### cURL

OpenAI:

```bash
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
Anthropic:

```bash
curl https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
Google Gemini (the API key goes in the URL's query string, not after the request body):

```bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GOOGLE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hello!"}]}]
  }'
```
DeepSeek:

```bash
curl https://api.deepseek.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-chat",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
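The cURL bodies above share one request shape for every provider marked "OpenAI Compatible" in the comparison table, so a single payload builder covers them all. A sketch, where the `stream` and `json_mode` flags correspond to the Streaming and JSON Mode columns (Anthropic's and Gemini's native APIs use different shapes, as the examples above show):

```python
import json

def chat_payload(model: str, user_message: str, *,
                 stream: bool = False, json_mode: bool = False) -> dict:
    """Build the Chat Completions request body shared by providers
    marked "OpenAI Compatible" in the table above."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    if stream:
        body["stream"] = True  # tokens arrive as server-sent events
    if json_mode:
        body["response_format"] = {"type": "json_object"}
    return body

print(json.dumps(chat_payload("deepseek-chat", "Hello!", stream=True), indent=2))
```

Only the base URL, the auth header, and the model name change between these providers; the body stays the same.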
### Python SDK

OpenAI:

```python
from openai import OpenAI

client = OpenAI(api_key="your-api-key")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
Anthropic:

```python
from anthropic import Anthropic

client = Anthropic(api_key="your-api-key")
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.content[0].text)
```
Google Gemini:

```python
import google.generativeai as genai

genai.configure(api_key="your-api-key")
model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Hello!")
print(response.text)
```
DeepSeek (via the OpenAI SDK, pointed at DeepSeek's base URL):

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.deepseek.com/v1",
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```