Get started in seconds with per-token pricing, zero setup, and no cold starts
Customize open models with your own data and minimal setup
Pay per GPU second for faster speeds, higher rate limits, and lower costs at scale
Base model | $ / 1M tokens |
---|---|
Less than 4B parameters | $0.10 |
4B - 16B parameters | $0.20 |
More than 16B parameters | $0.90 |
MoE 0B - 56B parameters (e.g. Mixtral 8x7B) | $0.50 |
MoE 56.1B - 176B parameters (e.g. DBRX, Mixtral 8x22B) | $1.20 |
DeepSeek V3 | $0.90 |
DeepSeek R1 (Fast) | $3.00 input, $8.00 output |
DeepSeek R1 0528 (Fast) | $3.00 input, $8.00 output |
DeepSeek R1 (Basic) | $0.55 input, $2.19 output |
Meta Llama 3.1 405B | $3.00 |
Meta Llama 4 Maverick (Basic) | $0.22 input, $0.88 output |
Meta Llama 4 Scout (Basic) | $0.15 input, $0.60 output |
Qwen3 235B | $0.22 input, $0.88 output |
Qwen3 30B | $0.15 input, $0.60 output |
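To estimate the cost of a serverless text request, divide the token counts by one million and multiply by the listed rate; for models with separate input and output rates, sum the two sides. Below is a minimal sketch of that arithmetic (the helper function is illustrative only, not part of any SDK), using the rates from the table above:

```python
# Illustrative helper for estimating serverless text-generation cost.
# Rates are USD per 1M tokens, taken from the table above.

def text_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float | None = None) -> float:
    """Estimated cost in USD; for models with a single blended rate,
    pass it as input_rate and leave output_rate as None."""
    if output_rate is None:
        output_rate = input_rate
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Llama 3.1 405B at $3.00 / 1M tokens (same rate for input and output)
print(text_cost(50_000, 10_000, 3.00))        # ≈ $0.18

# DeepSeek R1 (Fast) at $3.00 / 1M input, $8.00 / 1M output
print(text_cost(50_000, 10_000, 3.00, 8.00))  # ≈ $0.23
```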
Pay per second of audio input
Model | $ / audio minute (billed per second) |
---|---|
Whisper-v3-large | $0.0015 |
Whisper-v3-large-turbo | $0.0009 |
Streaming transcription service | $0.0032 |
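Audio rates are quoted per minute of audio but billed per second, so cost scales linearly with clip length. A rough sketch of that calculation (the helper name is just for illustration):

```python
# Illustrative helper: audio rates are quoted per minute but billed per second.

def audio_cost(audio_seconds: float, rate_per_minute: float) -> float:
    """Estimated transcription cost in USD for a clip of the given length."""
    return (audio_seconds / 60.0) * rate_per_minute

# 90-second clip with Whisper-v3-large at $0.0015 / audio minute
print(audio_cost(90, 0.0015))  # ≈ $0.00225
```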
Pay per inference step (denoising iteration)
Image model name | $ / step | Image model with ControlNet, $ / step |
---|---|---|
All non-FLUX models (SDXL, Playground, etc.) | $0.00013 ($0.0039 per 30 step image) | $0.0002 ($0.006 per 30 step image) |
FLUX.1 [dev] | $0.0005 ($0.014 per 28 step image) | N/A on serverless |
FLUX.1 [schnell] | $0.00035 ($0.0014 per 4 step image) | N/A on serverless |
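Since image generation is billed per denoising step, the per-image price depends on the step count you request. A quick sketch under the rates above (the helper is illustrative only):

```python
# Illustrative helper: image generation is billed per denoising step.

def image_cost(steps: int, rate_per_step: float, num_images: int = 1) -> float:
    """Estimated cost in USD for generating num_images images."""
    return steps * rate_per_step * num_images

# Four 30-step SDXL images at $0.00013 / step
print(image_cost(30, 0.00013, 4))  # ≈ $0.0156

# One 4-step FLUX.1 [schnell] image at $0.00035 / step
print(image_cost(4, 0.00035))      # ≈ $0.0014
```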
Pay per input token for embedding models
Base model parameter count | $ / 1M input tokens |
---|---|
up to 150M | $0.008 |
150M - 350M | $0.016 |
Base Model | $ / 1M training tokens |
---|---|
Models up to 16B parameters | $0.50 |
Models 16.1B - 80B | $3.00 |
DeepSeek R1 / V3 | $10.00 |
There is no additional cost for having LoRA fine-tunes up to the quota for an account.
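Fine-tuning is billed per training token processed. Assuming training tokens ≈ dataset tokens × number of epochs, a rough estimate looks like this (the helper is illustrative only):

```python
# Illustrative estimate: fine-tuning is billed per 1M training tokens.
# Assumes training tokens ≈ dataset tokens x number of epochs.

def finetune_cost(dataset_tokens: int, epochs: int, rate_per_million: float) -> float:
    """Estimated fine-tuning cost in USD."""
    return (dataset_tokens * epochs / 1e6) * rate_per_million

# 2M-token dataset, 3 epochs, model up to 16B parameters at $0.50 / 1M training tokens
print(finetune_cost(2_000_000, 3, 0.50))  # ≈ $3.00
```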
GPU Type | $ / hour (billed per second) |
---|---|
A100 80 GB GPU | $2.90 |
H100 80 GB GPU | $5.80 |
H200 141 GB GPU | $6.99 |
B200 180 GB GPU | $11.99 |
AMD MI300X | $4.99 |
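On-demand deployments are priced per GPU hour but billed per second, so you only pay for the time a deployment actually runs. A back-of-the-envelope sketch (the helper is illustrative only):

```python
# Illustrative helper: on-demand GPUs are priced per hour but billed per second.

def gpu_cost(seconds: float, hourly_rate: float, num_gpus: int = 1) -> float:
    """Estimated deployment cost in USD for the given running time."""
    return (seconds / 3600.0) * hourly_rate * num_gpus

# A 45-minute job on 2 x H100 80 GB GPUs at $5.80 / hour each
print(gpu_cost(45 * 60, 5.80, 2))  # ≈ $8.70
```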
For estimates of per-token prices, see this blog. Results vary by use case, but we often observe improvements such as ~250% higher throughput and 50% faster speed on Fireworks compared to open-source inference engines.