Paste a prompt or document and see its estimated token count — and how much of each common context window it fills. The number your AI bill and context limits actually run on.
Rule-of-thumb estimate (≈4 characters or ≈¾ word per token for English). Exact counts depend on each model's tokenizer: expect ±15%, more for code and non-English text. Counted entirely in your browser.
What this free tool is great for: a quick, one-off job with no signup — it runs entirely in your browser, so nothing leaves your device and there's nothing to manage.
Its honest limit: it's a one-off calculation in your browser — it doesn't save your scenarios, update as your real numbers change, or connect to your live accounts, so you re-enter the figures every time and can't watch how they move.
Every large language model reads and writes in tokens: chunks of text that are typically a short common word or a fragment of a longer one. Your API bill is priced per token. Your context window is measured in tokens. Your prompt gets truncated at a token limit, not a word limit. Yet almost everyone reasons in words, which is why AI costs and cut-off prompts keep surprising people. This counter gives you the missing number: paste any text and see its estimated token count, plus how much of each common context-window size it fills. Everything is counted in your browser, and the result is honestly labelled as an estimate.
Tokenizers split text into pieces drawn from a fixed vocabulary learned from massive corpora. Common words ("the", "and", "founder") are usually single tokens; rarer words get split into sub-pieces ("tokenization" might become "token" + "ization"); punctuation, spaces and capitalisation all influence the split. That's why token counts feel unintuitive: "ChatGPT is great" and "Antidisestablishmentarianism!" have wildly different token-per-word ratios. For ordinary English prose the averages are stable enough to estimate — roughly four characters or three-quarters of a word per token — and those two rules of thumb, blended, are what this tool uses.
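As a rough illustration, the blend can be sketched in a few lines of Python. This is a minimal sketch assuming an equal-weight average of the two rules of thumb and whitespace-separated words; the page doesn't specify its exact weighting, and the function name `estimate_tokens` is hypothetical.

```python
import math

def estimate_tokens(text: str) -> int:
    """Blend the two English rules of thumb into one token estimate (illustrative sketch)."""
    if not text.strip():
        return 0
    by_chars = len(text) / 4             # ~4 characters per token
    by_words = len(text.split()) / 0.75  # ~3/4 word per token, i.e. ~1.33 tokens per word
    # Assumed equal weighting of the two rules; round up so the estimate never reads as zero.
    return math.ceil((by_chars + by_words) / 2)

print(estimate_tokens("Every large language model reads and writes in tokens."))  # 13
```

For the sample sentence (54 characters, 9 words) the character rule gives 13.5 and the word rule 12, so the blend rounds up to 13.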
Exact token counts require running the specific model's tokenizer, and every family (GPT, Claude, Llama, Gemini) tokenizes slightly differently: the same paragraph's count can differ by around ten percent between them. For planning purposes that rarely matters, because budgeting, context-fit checks and cost estimates all survive ±15% noise. But know when the estimate drifts further. Code produces more tokens per character than prose, because symbols and indentation fragment badly. Non-English languages often cost dramatically more tokens per word, and some scripts multiply counts several-fold. Text stuffed with URLs, IDs or JSON also runs higher than the estimate. If you're doing precision work at scale, use the provider's own tokenizer; for everything else, this estimate is the right tool.
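One way to act on these caveats is to flag text before trusting the estimate. The sketch below is a hypothetical heuristic, not part of this tool: the function name, the symbol set, the use of non-ASCII characters as a crude proxy for non-English text, and the 20% / 10% thresholds are all illustrative assumptions.

```python
import re

# Characters that tend to fragment into extra tokens in code and JSON (illustrative set).
CODE_SYMBOLS = frozenset('{}[]()<>;:="\'/\\_|#$%&*+')

def estimate_warnings(text: str) -> list[str]:
    """Flag text where the English-prose rule of thumb is likely to undercount.

    The 20% and 10% thresholds are illustrative assumptions, not measured values.
    """
    if not text:
        return []
    warnings = []
    non_ascii_share = sum(ord(ch) > 127 for ch in text) / len(text)
    symbol_share = sum(ch in CODE_SYMBOLS for ch in text) / len(text)
    if non_ascii_share > 0.2:
        warnings.append("mostly non-ASCII text: the real count may be several times higher")
    if symbol_share > 0.1:
        warnings.append("code or JSON symbols: expect more tokens than the estimate")
    if re.search(r"https?://", text):
        warnings.append("URLs present: expect more tokens than the estimate")
    return warnings

print(estimate_warnings('{"id": [1, 2, 3]}'))  # flags the JSON symbols
```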
A model's context window is the total tokens it can hold at once — your system prompt, the conversation history, the retrieved documents, the user's question and the answer it writes, all sharing one budget. The fit-bars above show your text against common window tiers, from small 8k models to the million-token giants. The practical insight most people miss: the window fills faster than you think, because everything counts. A 200-page document, a long chat history, a bloated system prompt — they all crowd out room for the answer. When a model "forgets" the start of a long conversation, it didn't forget; the early tokens fell out of the window.
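The budget arithmetic behind a fit-bar can be sketched like this. It assumes round decimal tier sizes (real models often use values such as 8,192 or 131,072); the middle tiers, the names, and reserving room for the answer up front are illustrative choices, not this tool's exact configuration.

```python
# Assumed tier sizes, rounded to decimal thousands for readability.
WINDOW_TIERS = {"8k": 8_000, "32k": 32_000, "128k": 128_000, "200k": 200_000, "1M": 1_000_000}

def window_fill(parts: dict[str, int], answer_reserve: int) -> dict[str, float]:
    """Percent of each window used by every input part plus room reserved for the answer."""
    used = sum(parts.values()) + answer_reserve  # everything shares one budget
    return {tier: round(100 * used / size, 1) for tier, size in WINDOW_TIERS.items()}

request = {"system_prompt": 2_000, "history": 12_000, "documents": 20_000, "question": 500}
fill = window_fill(request, answer_reserve=1_500)
print(fill)  # {'8k': 450.0, '32k': 112.5, '128k': 28.1, '200k': 18.0, '1M': 3.6}
print([tier for tier, pct in fill.items() if pct <= 100])  # ['128k', '200k', '1M']
```

Note how the document, not the question, dominates: the 500-token question is barely visible next to the history and retrieved text.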
Because pricing is per token — with output tokens typically costing several times input tokens — token counts translate directly to unit economics. A system prompt of 2,000 tokens sounds harmless until it rides along on every one of a million requests. The high-leverage moves, in order: trim the system prompt (it's paid on every call); summarise or truncate conversation history instead of resending it whole; retrieve only the relevant chunks of documents rather than stuffing entire files; and cap output length, since output is the expensive direction. Count before you ship — our AI Cost Calculator turns the token counts from this tool into an actual monthly bill projection.
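The system-prompt example works out as follows, writing p_in for a hypothetical price per million input tokens (actual rates vary by provider and model):

```latex
\[
2{,}000\ \tfrac{\text{tokens}}{\text{request}} \times 10^{6}\ \text{requests}
  = 2 \times 10^{9}\ \text{input tokens},
\qquad
\text{cost} = \frac{2 \times 10^{9}}{10^{6}}\, p_{\text{in}} = 2{,}000\, p_{\text{in}}
\]
```

At that volume, every single token trimmed from the system prompt removes a million input tokens from the bill.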
Token awareness isn't just an engineering concern. If you write prompts for a living — or paste long documents into chat assistants — the counter answers everyday questions: will this whole report fit, how much can I paste before the model loses the beginning, why did my carefully crafted instruction get cut. A useful habit for long tasks: check the token count of your material first, and if it brushes against the window you're using, split by sections with a running summary instead of trusting one giant paste. The models handle a well-structured 30k tokens far better than a truncated 130k.
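A minimal sketch of the split step, assuming paragraphs are separated by blank lines and using only the ~4-characters-per-token rule for brevity. The names `rough_tokens` and `split_by_budget` are hypothetical, and writing the running summary between sections is left to you or the model.

```python
import math

def rough_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)  # ~4 characters per token for English prose

def split_by_budget(text: str, budget: int) -> list[str]:
    """Group consecutive paragraphs into sections that each stay within a token budget."""
    sections: list[str] = []
    current: list[str] = []
    for para in text.split("\n\n"):
        candidate = "\n\n".join(current + [para])
        if current and rough_tokens(candidate) > budget:
            sections.append("\n\n".join(current))  # close the section before it overflows
            current = [para]
        else:
            current.append(para)
    if current:
        sections.append("\n\n".join(current))
    return sections

doc = "\n\n".join(["a" * 40] * 5)  # five ~10-token paragraphs
print(len(split_by_budget(doc, budget=25)))  # 3
```

A single paragraph longer than the budget still comes back as its own oversized section, so very long paragraphs need a further split.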
Paste the same text into three token counters and you may get three numbers. That's not sloppiness — they're either using different tokenizers (each accurate for its own model) or different estimation rules. This tool deliberately shows a blended estimate rather than impersonating any specific model's tokenizer, and says so — an honest ~1,050 beats a false-precision 1,047 that's only true for one model family. Treat cross-tool differences under fifteen percent as agreement. When someone quotes you an exact token count without naming the tokenizer, they've told you less than this estimate does.
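The two habits in this paragraph, reporting an estimate without false precision and reading small cross-tool gaps as agreement, can be sketched as follows. Rounding to three significant figures and measuring the 15% against the larger count are assumptions made for illustration; the function names are hypothetical.

```python
import math

def round_sig(tokens: int, digits: int = 3) -> int:
    """Round a token estimate to a few significant figures (e.g. 1,047 -> 1,050)."""
    if tokens == 0:
        return 0
    magnitude = math.floor(math.log10(abs(tokens)))
    return int(round(tokens, digits - 1 - magnitude))

def counts_agree(a: int, b: int, tolerance: float = 0.15) -> bool:
    """Treat two token counts as agreeing if they differ by less than 15% of the larger one."""
    if a == b:
        return True
    return abs(a - b) < tolerance * max(a, b)

print(round_sig(1047))           # 1050
print(counts_agree(1047, 1150))  # True: a gap of 103 is under 15% of 1,150
print(counts_agree(1000, 1300))  # False: a gap of 300 exceeds 15% of 1,300
```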
The teams that keep AI costs boring all converge on the same loop. First, count the fixed overhead: paste your system prompt and any always-included examples here and treat that number as the per-request tax. Second, estimate the variable part — a typical user message plus retrieved context — and set the output cap deliberately rather than accepting the default. Third, multiply through your expected volume in our AI Cost Calculator to see the monthly bill before a line of code ships. Finally, re-count whenever the prompt changes: prompts only ever grow, a phenomenon veterans call prompt bloat, and a monthly five-minute audit of what's actually riding along on every request routinely finds a third of the tokens doing nothing. Counting is the cheapest optimisation in AI — it just has to actually happen.
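The loop above reduces to simple arithmetic per request. This sketch is not our AI Cost Calculator; it assumes every reply uses the full output cap (a worst case), the prices in the usage line are placeholders rather than any provider's real rates, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RequestBudget:
    fixed_tokens: int     # system prompt + always-included examples (the per-request tax)
    variable_tokens: int  # typical user message + retrieved context
    output_cap: int       # the max-output setting you chose deliberately

def monthly_cost(budget: RequestBudget, requests_per_month: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Projected monthly spend, assuming every reply hits the output cap."""
    input_tokens = (budget.fixed_tokens + budget.variable_tokens) * requests_per_month
    output_tokens = budget.output_cap * requests_per_month
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Placeholder prices per million tokens, for illustration only.
budget = RequestBudget(fixed_tokens=2_000, variable_tokens=1_500, output_cap=500)
print(monthly_cost(budget, requests_per_month=100_000, price_in_per_m=1.0, price_out_per_m=4.0))  # 550.0
```

Re-running the same call after each prompt change is the "re-count" step in code form: only `fixed_tokens` needs updating.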
A counter tells you what a prompt costs and whether it fits; the harder work is building the thing around the prompt — the chains, the knowledge base with retrieval, the agent logic, the deployment, the observability that shows token spend per user in production. That's where Dify does more: an open LLM-ops platform where you visually build AI apps and agents, plug in your documents, and monitor exactly the token flows you estimated here — turning prompt-craft into shipped product without wiring the plumbing yourself. Count and budget here; when the prompt needs to become an app, build it on rails instead of glue code.
It blends the two standard rules of thumb (~4 characters and ~¾ word per token for English) — expect ±15% versus any specific model's tokenizer, more for code and non-English text. For exact counts, use the provider's own tokenizer.
Each model family (GPT, Claude, Llama, Gemini) uses its own tokenizer with a different vocabulary, so the same text splits differently — often by around ten percent between families.
The total tokens a model can hold at once: system prompt, history, documents, question and answer combined. The fit-bars show your text against common window tiers from 8k to 1M tokens.
No — counting happens locally in JavaScript. Nothing you paste is uploaded, logged or stored.
This tool is free and runs entirely in your browser. The link above is an affiliate link: we may earn a commission if you sign up, at no extra cost to you, and it never changes our honest take.