Before you ship an AI feature, know what it costs to run. Enter your tokens per request and your request volume, and this tool estimates the monthly and annual API bill, including the input-versus-output split that catches most teams out.
Rates are illustrative — enter your provider’s live token price for an exact figure.
What this free tool is great for: a quick, one-off job with no signup — it runs entirely in your browser, so nothing leaves your device and there's nothing to manage.
Its honest limit: it estimates a single scenario from rates you enter — it won't track your real usage, watch the bill climb as traffic grows, or help you actually build and optimise the feature behind it.
An AI feature feels free while you're prototyping, just a few cents here and there. Then it ships, real traffic hits it, and the invoice arrives. The jump isn't a mystery once you see the maths: your cost is simply tokens times price times volume, and both tokens per request and volume tend to climb quietly as you grow.
Only three things move your bill: the tokens you send (your prompt, context and any retrieved documents), the tokens the model generates (the answer), and how many requests you run. Fatten any one — a bigger system prompt, a chattier model, more users — and the cost rises in lockstep. The calculator above lets you flex each so you can see which one is actually driving your number.
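To make the arithmetic concrete, here is a minimal Python sketch of the same sum. The function name, the per-million-token pricing unit and every figure in the example are assumptions chosen for illustration, not any provider's real rates.

```python
def monthly_cost(input_tokens, output_tokens, requests_per_month,
                 input_price_per_m, output_price_per_m):
    """Estimate a monthly API bill.

    Token counts are per request; prices are per million tokens
    (an assumed unit; convert if your provider quotes differently).
    """
    # Price-weighted tokens for one request, still in "per million" units.
    weighted = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return weighted * requests_per_month / 1_000_000


# Illustrative inputs only: 1,500 tokens in, 500 out, 100,000 requests a month,
# at hypothetical rates of 3.00 (input) and 15.00 (output) per million tokens.
monthly = monthly_cost(1_500, 500, 100_000, 3.00, 15.00)
print(f"Monthly: {monthly:,.2f}")       # Monthly: 1,200.00
print(f"Annual:  {monthly * 12:,.2f}")  # Annual:  14,400.00
```

Change one input at a time and rerun it to see which of the three is doing most of the work for your feature.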
Here's the part most teams miss: providers typically charge several times more for output tokens than for input tokens. Token for token, a long, rambling answer costs more than a long, detailed question. That's why the tool shows what share of each request's cost comes from output. If output is most of your cost, the highest-leverage fix isn't a cheaper model, it's a shorter, more constrained answer.
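A small sketch shows how lopsided the split can get. It assumes a hypothetical request with output priced at five times input; the function name and all figures are illustrative, so swap in your own rates.

```python
def output_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of one request's cost that comes from output tokens.

    Prices only need to share a unit (e.g. both per million tokens),
    because the unit cancels out in the ratio.
    """
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


# Hypothetical request: 1,500 tokens in, 500 out, output priced at 5x input.
share = output_share(1_500, 500, 3.00, 15.00)
print(f"Output is {500 / (1_500 + 500):.0%} of tokens but {share:.1%} of cost")
# Output is 25% of tokens but 62.5% of cost

# Same scenario with the answer length halved (500 -> 250 output tokens).
shorter = output_share(1_500, 250, 3.00, 15.00)
print(f"With a shorter answer, output is {shorter:.1%} of cost")
# With a shorter answer, output is 45.5% of cost
```

In this made-up scenario, output is a quarter of the tokens but well over half of the cost, which is why trimming the answer can matter more than trimming the prompt.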
Cap the output. Set a max length and ask for concise answers; you lose less quality than you'd expect.
Right-size the model. Reserve the frontier model for the hard 20% and route the rest to a smaller, cheaper one.
Cache and retrieve. Don't re-prompt for things you already computed; store results and pull in context instead of stuffing everything into every call.
Trim the prompt. Long system prompts and bloated context get paid for on every single request.
Batch where you can. Many providers price batched or off-peak work lower.
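As a rough illustration of the right-sizing tip, the sketch below compares sending every request to a frontier model with routing only 20% there. Both sets of rates are hypothetical, and it assumes the smaller model uses the same token counts and handles its share acceptably, which you would need to verify for your own workload.

```python
def cost_per_request(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Cost of one request, with prices quoted per million tokens (assumed unit)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def routed_monthly_cost(requests, frontier_fraction, frontier_cost, small_cost):
    """Monthly bill when a fraction of requests goes to the frontier model."""
    frontier_requests = requests * frontier_fraction
    small_requests = requests - frontier_requests
    return frontier_requests * frontier_cost + small_requests * small_cost


# Hypothetical rates per million tokens: frontier 3.00 in / 15.00 out,
# smaller model 0.25 in / 1.25 out. Same 1,500-in, 500-out request for both.
frontier = cost_per_request(1_500, 500, 3.00, 15.00)
small = cost_per_request(1_500, 500, 0.25, 1.25)

everything_frontier = routed_monthly_cost(100_000, 1.0, frontier, small)
hard_20_percent = routed_monthly_cost(100_000, 0.2, frontier, small)

print(f"All frontier:    {everything_frontier:,.2f}")  # All frontier:    1,200.00
print(f"20% to frontier: {hard_20_percent:,.2f}")      # 20% to frontier: 320.00
```

The saving depends entirely on the price gap and on how much traffic the cheaper model can really take, so treat the output as a what-if rather than a forecast.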
Knowing the unit economics before you ship spares you the nasty surprise later. Once the numbers work, the actual build — chaining prompts, adding a knowledge base, turning it into an agent — is a separate job. A platform like Dify handles that plumbing, so you're not wiring orchestration, retrieval and deployment together by hand.
It's exact for the rates and volume you enter, because the maths is simply tokens times price times volume. The one variable is the token price, which changes and differs per provider, so the model presets are illustrative and every rate field is editable. Enter your provider's live price for a precise figure.
Providers charge more for tokens the model generates (output) than for tokens you send (input), often several times more. That's why this tool splits the two and shows what share of your cost is output — usually the bigger lever.
No, your data is never sent anywhere. The calculation runs entirely in your browser; nothing is uploaded, stored or logged, and there's no signup.
This tool is free and runs entirely in your browser. The link above is an affiliate link: we may earn a commission if you sign up, at no extra cost to you, and it never changes our honest take.