Token
The basic unit of text that AI models process, typically representing 3-4 characters or about 0.75 words, used for pricing and context limits.
Definition
In AI language models, a token is the fundamental unit of text processing—roughly equivalent to a word fragment, word, or punctuation mark. Models break input text down into tokens before processing, and their responses are generated token-by-token. A token is typically 3-4 characters or about 0.75 words in English. For example, a tokenizer might split "artificial intelligence" into three tokens: "art", "ificial", and "intelligence" (the exact split depends on the model's tokenizer).
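Real tokenizers use learned subword vocabularies (byte-pair encoding and similar schemes), so exact counts vary by model. For planning purposes, the 4-characters-per-token rule of thumb from the definition above can be sketched as a simple estimator (`estimate_tokens` is a hypothetical helper, not a real library function):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb.

    Real tokenizers split on learned subwords, so actual counts
    differ by model and language; treat this as an approximation.
    """
    return max(1, round(len(text) / 4))

print(estimate_tokens("artificial intelligence"))  # → 6 (a real tokenizer may yield fewer)
```

For exact counts, use the tokenizer that matches your target model rather than a character heuristic.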
Tokens matter because they determine both pricing and technical limitations. AI APIs charge per token, both for input (your prompt) and output (the model's response). Models also have token limits that cap total conversation length, known as the context window. A GPT-4 model with a 128K-token context can hold roughly 96,000 words of text, while older models with 4K limits could only handle about 3,000 words. Token counting therefore affects how you structure prompts and conversations.
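The word-capacity figures above follow directly from the 0.75-words-per-token rule of thumb, which a one-line sketch makes explicit (the function name is illustrative):

```python
def context_capacity_words(context_tokens: int, words_per_token: float = 0.75) -> int:
    # ~0.75 English words per token, per the rule of thumb above.
    return int(context_tokens * words_per_token)

assert context_capacity_words(128_000) == 96_000  # GPT-4-class 128K context
assert context_capacity_words(4_096) == 3_072     # older 4K-class models
```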
Why It Matters
Understanding tokens is essential for managing AI costs and optimizing prompts. When you're paying $0.03 per 1K tokens, verbose prompts become expensive at scale. A 500-word prompt costs about $0.02 per request—negligible for occasional use, but $200 for 10,000 API calls. Efficient prompt design that conveys the same information in fewer tokens directly reduces costs.
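The cost arithmetic above can be checked with a short sketch, assuming the illustrative $0.03-per-1K input rate (actual API pricing varies by provider and model):

```python
def prompt_cost(tokens: int, price_per_1k: float = 0.03) -> float:
    """Cost of a single request's input tokens at a per-1K-token rate."""
    return tokens * price_per_1k / 1000

# A 500-word prompt is roughly 500 / 0.75 ≈ 667 tokens.
per_request = prompt_cost(667)   # ≈ $0.02 per request
at_scale = per_request * 10_000  # ≈ $200 across 10,000 API calls
```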
Token limits also determine what's possible with AI. Want to analyze a 100-page document? You need a model with sufficient token context to hold that entire document plus your instructions and the response. Understanding token constraints helps you architect AI solutions that work within technical limits—whether that means chunking long documents, summarizing content before analysis, or selecting appropriate models for your use case.
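Chunking a long document to fit a token budget, as described above, can be sketched with the 4-characters-per-token heuristic (a minimal word-boundary splitter; production systems typically chunk on real token counts and semantic boundaries like paragraphs):

```python
def chunk_by_tokens(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Split text into word-boundary chunks that fit an approximate token budget."""
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)  # budget exceeded: start a new chunk
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent in its own request, with your instructions repeated per chunk and the per-chunk results merged afterward.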
Examples in Practice
A legal firm uses AI to analyze contracts. Their initial prompt plus contract text totals 15,000 tokens, costing $0.45 per analysis at $0.03 per 1K tokens. By trimming their prompt from 800 tokens to 200 without losing instruction quality, they cut each analysis to 14,400 tokens, or about $0.43. Over 10,000 monthly analyses, that 600-token reduction saves roughly $180/month.
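Savings from this kind of prompt trimming scale linearly with call volume, which a short sketch makes concrete (assuming the illustrative $0.03-per-1K input rate; the function name is hypothetical):

```python
PRICE_PER_1K = 0.03  # illustrative input-token rate from the pricing example above

def monthly_savings(tokens_saved_per_call: int, calls_per_month: int) -> float:
    """Dollar savings from shaving tokens off every request."""
    return tokens_saved_per_call * PRICE_PER_1K / 1000 * calls_per_month

# Trimming a prompt from 800 to 200 tokens saves 600 tokens per call.
monthly_savings(600, 10_000)  # ≈ $180/month
```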
A developer builds a customer-service chatbot but doesn't account for conversation token accumulation. Each message resends the full conversation history, so a 20-message conversation consumes 50,000 tokens ($1.50 at $0.03 per 1K). They implement conversation summarization that condenses the history to 5,000 tokens, cutting token usage (and cost) by about 90% while maintaining context quality.
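The simplest form of history management is trimming to a token budget; summarization, as in the example above, goes further by condensing the dropped messages instead of discarding them. A minimal trimming sketch, using the 4-characters-per-token heuristic:

```python
def trim_history(messages: list[str], budget_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Keep the most recent messages that fit an approximate token budget.

    A production system might summarize the dropped messages
    rather than discard them, preserving long-range context.
    """
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = max(1, len(msg) // chars_per_token)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```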
A content writer notices their AI-generated articles are being cut off mid-sentence. They discover they're hitting the model's 4,096-token output limit (approximately 3,000 words). By switching to a model with an 8,192-token output limit and adjusting their prompt to specify the desired length, they generate complete 2,500-word articles without truncation.
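A pre-flight check like the writer's can be sketched by converting a target word count into tokens (~1.33 tokens per English word, the inverse of 0.75 words per token) and comparing against the output limit; note this is approximate, and formatting tokens add real overhead:

```python
def fits_output_limit(target_words: int, output_limit_tokens: int,
                      tokens_per_word: float = 1.33) -> bool:
    """Rough check: will a target word count fit within an output token limit?"""
    return target_words * tokens_per_word <= output_limit_tokens

fits_output_limit(2500, 4096)  # 2,500 words ≈ 3,325 tokens: fits, with little headroom
fits_output_limit(3500, 4096)  # ≈ 4,655 tokens: would be truncated
```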