Get started in seconds with per-token pricing, zero setup, and no cold starts
Customize open models with your own data and minimal setup
Pay per GPU second for faster speeds, higher rate limits, and lower costs at scale
Base model | $ / 1M tokens |
---|---|
Less than 4B parameters | $0.10 |
4B - 16B parameters | $0.20 |
More than 16B parameters | $0.90 |
MoE 0B - 56B parameters (e.g. Mixtral 8x7B) | $0.50 |
MoE 56.1B - 176B parameters (e.g. DBRX, Mixtral 8x22B) | $1.20 |
DeepSeek V3 | $0.90 |
DeepSeek R1 (Fast) | $3.00 input, $8.00 output |
DeepSeek R1 0528 (Fast) | $3.00 input, $8.00 output |
DeepSeek R1 (Basic) | $0.55 input, $2.19 output |
Meta Llama 3.1 405B | $3.00 |
Meta Llama 4 Maverick (Basic) | $0.22 input, $0.88 output |
Meta Llama 4 Scout (Basic) | $0.15 input, $0.60 output |
Qwen3 235B | $0.22 input, $0.88 output |
Qwen3 30B | $0.15 input, $0.60 output |
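To estimate the cost of a serverless text request, divide the token counts by one million and multiply by the listed rate; for models with separate input and output rates, sum the two sides. Below is a minimal sketch of that arithmetic (the helper function is illustrative only, not part of any SDK), using the rates from the table above:

```python
# Illustrative helper for estimating serverless text-generation cost.
# Rates are USD per 1M tokens, taken from the table above.

def text_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float | None = None) -> float:
    """Estimated cost in USD; for models with a single blended rate,
    pass it as input_rate and leave output_rate as None."""
    if output_rate is None:
        output_rate = input_rate
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Llama 3.1 405B at $3.00 / 1M tokens (same rate for input and output)
print(text_cost(50_000, 10_000, 3.00))        # ≈ $0.18

# DeepSeek R1 (Fast) at $3.00 / 1M input, $8.00 / 1M output
print(text_cost(50_000, 10_000, 3.00, 8.00))  # ≈ $0.23
```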
Pay per second of audio input
Model | $ / audio minute (billed per second) |
---|---|
Whisper-v3-large | $0.0015 |
Whisper-v3-large-turbo | $0.0009 |
Streaming transcription service | $0.0032 |
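Audio rates are quoted per minute of audio but billed per second, so cost scales linearly with clip length. A rough sketch of that calculation (the helper name is just for illustration):

```python
# Illustrative helper: audio rates are quoted per minute but billed per second.

def audio_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Estimated transcription cost in USD for a clip of the given length."""
    return (audio_seconds / 60.0) * rate_per_minute

# 90-second clip with Whisper-v3-large at $0.0015 / audio minute
print(audio_cost(90, 0.0015))  # ≈ $0.00225
```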
Pay per inference step (denoising iteration)
Image model name | $ / step | Image model with ControlNet, $ / step |
---|---|---|
All non-FLUX models (SDXL, Playground, etc.) | $0.00013 ($0.0039 per 30 step image) | $0.0002 ($0.006 per 30 step image) |
FLUX.1 [dev] | $0.0005 ($0.014 per 28 step image) | N/A on serverless |
FLUX.1 [schnell] | $0.00035 ($0.0014 per 4 step image) | N/A on serverless |
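Since image generation is billed per denoising step, the per-image price depends on the step count you request. A quick sketch under the rates above (the helper is illustrative only):

```python
# Illustrative helper: image generation is billed per denoising step.

def image_cost(steps: int, rate_per_step: float, num_images: int = 1) -> float:
    """Estimated cost in USD for generating num_images images."""
    return steps * rate_per_step * num_images

# Four 30-step SDXL images at $0.00013 / step
print(image_cost(30, 0.00013, 4))  # ≈ $0.0156

# One 4-step FLUX.1 [schnell] image at $0.00035 / step
print(image_cost(4, 0.00035))      # ≈ $0.0014
```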
Pay per input token for embedding models
Base model parameter count | $ / 1M input tokens |
---|---|
up to 150M | $0.008 |
150M - 350M | $0.016 |
Base Model | $ / 1M training tokens |
---|---|
Models up to 16B parameters | $0.50 |
Models 16.1B - 80B | $3.00 |
DeepSeek R1 / V3 | $10.00 |
There is no additional cost for having LoRA fine-tunes up to the quota for an account.
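Fine-tuning is billed per training token processed. Assuming training tokens ≈ dataset tokens × number of epochs, a rough estimate looks like this (the helper is illustrative only):

```python
# Illustrative estimate: fine-tuning is billed per 1M training tokens.
# Assumes training tokens ≈ dataset tokens x number of epochs.

def finetune_cost(dataset_tokens: int, epochs: int, rate_per_million: float) -> float:
    """Estimated fine-tuning cost in USD."""
    return (dataset_tokens * epochs / 1e6) * rate_per_million

# 2M-token dataset, 3 epochs, model up to 16B parameters at $0.50 / 1M training tokens
print(finetune_cost(2_000_000, 3, 0.50))  # ≈ $3.00
```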
GPU Type | $ / hour (billed per second) |
---|---|
A100 80 GB GPU | $2.90 |
H100 80 GB GPU | $5.80 |
H200 141 GB GPU | $6.99 |
B200 180 GB GPU | $11.99 |
AMD MI300X | $4.99 |
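On-demand deployments are priced per GPU hour but billed per second, so you only pay for the time a deployment actually runs. A back-of-the-envelope sketch (the helper is illustrative only):

```python
# Illustrative helper: on-demand GPUs are priced per hour but billed per second.

def gpu_cost(seconds: float, hourly_rate: float, num_gpus: int = 1) -> float:
    """Estimated deployment cost in USD for the given running time."""
    return (seconds / 3600.0) * hourly_rate * num_gpus

# A 45-minute job on 2 x H100 80 GB GPUs at $5.80 / hour each
print(gpu_cost(45 * 60, 5.80, 2))  # ≈ $8.70
```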
For estimates of per-token prices, see this blog. Results vary by use case, but we often observe improvements such as ~250% higher throughput and 50% faster speed on Fireworks compared to open-source inference engines.