A token is the smallest unit of text that a large language model processes. It is not a word. Depending on the model, a token might be a whole word, a syllable, a single character, or a fragment like 'ing' or 'pre'. The average English word breaks down into roughly 1.3 tokens, though technical vocabulary and non-English languages tend to use more.
Tokens directly determine what you pay and what you can do with any LLM-based tool. Every API call, every ChatGPT prompt, every AI-generated blog post is metered in tokens, both input and output. If you are building AI into your marketing workflows, understanding token economics is the difference between a cost-effective content operation and a budget that spirals without warning. Token limits also cap how much context a model can hold at once, which affects the quality of everything from summarisation to long-form generation.
When you send text to an LLM, a tokeniser splits it into tokens before the model processes anything. The model then predicts the next token in the sequence, one at a time, until it finishes its response. Both your input (the prompt, plus any context you provide) and the model's output count against the context window, which is the maximum number of tokens the model can handle in a single interaction. GPT-4o's context window is 128,000 tokens; Claude's can reach 200,000. Exceed the window and the model either rejects the request or silently drops the earliest context.
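The arithmetic behind that limit is simple: input and output tokens draw from one shared budget. A minimal sketch, using the window sizes quoted above (the function name is ours, purely illustrative):

```python
# Illustrative context-window accounting: prompt tokens and response
# tokens share a single budget per interaction.
CONTEXT_WINDOWS = {"gpt-4o": 128_000, "claude": 200_000}

def remaining_output_budget(input_tokens: int, model: str) -> int:
    """Tokens left for the model's response once the prompt is counted."""
    return CONTEXT_WINDOWS[model] - input_tokens
```

Send a 120,000-token prompt to GPT-4o and only 8,000 tokens remain for the response; the model cannot write past that.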
The most common mistake is treating tokens and words as interchangeable, then being surprised when costs or context limits hit earlier than expected. A 4,000-word brief does not use 4,000 tokens; it uses closer to 5,200. Another frequent error is stuffing enormous system prompts into every API call without realising that those tokens are billed again on every single request. We also see marketers ignoring output tokens entirely when budgeting, even though output tokens are typically two to four times more expensive than input tokens on most APIs.
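A back-of-envelope calculation makes the system-prompt mistake concrete. The prices below are placeholders, not any provider's actual rates:

```python
def monthly_system_prompt_cost(prompt_tokens: int, calls: int,
                               input_price_per_million: float) -> float:
    """A system prompt is billed as input tokens on every single call."""
    return prompt_tokens * calls / 1_000_000 * input_price_per_million
```

A 2,000-token system prompt sent on 50,000 calls a month, at a hypothetical $2.50 per million input tokens, costs $250 a month before a single word of output is generated.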
Straight answers to the things marketers and business owners actually want to know about tokens.
Most LLM providers offer a tokeniser tool. OpenAI has one at platform.openai.com/tokenizer. You paste in your text and it shows exactly how the model splits it. For rough estimation, divide your word count by 0.75 to get an approximate token count in English.
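That rule of thumb is easy to wrap in a helper. This is a rough sketch for English text, not a substitute for the tokeniser:

```python
def estimate_tokens(word_count: int) -> int:
    """Rough English estimate: tokens ≈ words / 0.75."""
    return round(word_count / 0.75)
```

For a 4,000-word brief this gives roughly 5,300 tokens, in the same range as the figures above.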
Generating text requires far more computation than reading it. The model has to predict each token sequentially, running the full neural network for every single one. Reading your input can be parallelised. That computational difference is reflected in the pricing, and it is significant enough that optimising for shorter, more precise outputs is one of the easiest ways to reduce AI costs.
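A per-call cost sketch shows the asymmetry in practice. Again, the prices are placeholders, not real rates:

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_million: float,
              output_price_per_million: float) -> float:
    """Input and output tokens are priced separately; output is the
    expensive side, so trimming verbose responses pays off first."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000
```

At placeholder rates of $2.50 per million input tokens and $10 per million output tokens, a call with 10,000 input and 1,000 output tokens costs 3.5 cents, and the 1,000 output tokens account for over a quarter of it.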
The model cannot process more tokens than its context window allows. In most API implementations, you will get an error. In chat interfaces, the model silently drops earlier messages from the conversation. This means it literally forgets what you told it at the start. For long marketing briefs or multi-step workflows, this is a real problem that requires deliberate prompt architecture.
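One deliberate approach is to trim the conversation yourself rather than let the interface drop context silently, so you choose what gets forgotten. A minimal sketch; the token counter is passed in, and in practice you would use a real tokeniser:

```python
def trim_to_window(messages: list, max_tokens: int, count_tokens) -> list:
    """Drop the oldest messages until the conversation fits the window.
    Anything that must survive (a brief, a running summary) should be
    kept out of this list and prepended after trimming."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message goes first
    return kept
```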
In multimodal models, yes. Images are converted into token equivalents based on their resolution. A high-resolution image sent to GPT-4o can cost over 1,000 tokens. If you are running visual analysis or creative review through an LLM at scale, image token costs add up quickly and are easy to overlook.
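As a heavily hedged illustration of how tile-based image billing works: the scheme below (a fixed base cost plus a per-tile cost for 512-pixel tiles) follows figures OpenAI has published for high-detail images, but it omits the resizing step the API applies first, and the constants change between model versions, so treat it as a shape rather than a price list:

```python
import math

def image_tokens(width: int, height: int,
                 base: int = 85, per_tile: int = 170, tile: int = 512) -> int:
    """Simplified tile-based image token estimate: a base cost plus a
    cost per 512px tile; real APIs rescale the image before tiling."""
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return base + tiles * per_tile
```

Under these assumed constants, a 1,024 by 1,024 image comes to 765 tokens, and larger images cross the 1,000-token mark quickly.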
We build AI workflows with token efficiency baked in from the start. That means right-sizing prompts, choosing the appropriate model tier for each task rather than defaulting to the most expensive one, and structuring context so you are not paying to re-send the same information on every call. It is one part of how we transfer genuine AI capability to your team, so you can run these systems independently and profitably.