Tokens and context windows
Why long chats “forget,” and why to chunk work.
Goal
Understand tokens and the context window so you can plan prompts realistically.
Tokens
Models process tokens (often subwords), not characters. A single Chinese character may map to one token or be split across several; an English word may likewise be split into multiple subword pieces. Paste text into OpenAI’s tokenizer demo to see exactly how it splits.
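Exact counts require the model’s own tokenizer (e.g. OpenAI’s tiktoken library), but for planning purposes a rough estimate is often enough. A minimal sketch, assuming the common heuristic of roughly 4 characters per token for English text (`estimate_tokens` and the ratio are illustrative, not an official API):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for planning; a real tokenizer gives exact counts."""
    # Heuristic assumption: ~4 characters per token holds for typical English.
    # Chinese or code can tokenize much more densely or sparsely.
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("Models process tokens, not characters."))
```

The ratio varies by language and content, so treat the result as an order-of-magnitude budget check, not a precise count.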
Context window
There is a maximum number of tokens the model can attend to at once. Beyond that limit, earlier content gets dropped or summarized away—the classic “it forgot what we said” failure.
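The “forgetting” behavior can be pictured as a budget check that evicts the oldest turns first. A minimal sketch, assuming a simple chars-to-tokens heuristic (the `fit_context` helper and the counting rule are illustrative, not how any particular provider implements truncation):

```python
def fit_context(messages: list[str], budget_tokens: int) -> list[str]:
    """Drop the oldest messages until the estimated total fits the budget."""
    # Assumed heuristic: ~4 characters per token, plus 1 per message overhead.
    count = lambda m: len(m) // 4 + 1
    kept = list(messages)
    while kept and sum(count(m) for m in kept) > budget_tokens:
        kept.pop(0)  # the earliest turn is the first to be "forgotten"
    return kept
```

Providers differ in the details (some summarize instead of dropping), but the effect is the same: what you said first is what goes first.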
What to do
- For long sources: summarize first, or split into rounds with only the needed excerpt each time.
- Keep critical constraints near the end of the user message (though a clear spec beats positional tricks).
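The first tip above—splitting a long source into rounds—can be sketched as a paragraph-level chunker that keeps each excerpt under a token budget. Assumptions: paragraphs are separated by blank lines, and the same rough 4-characters-per-token estimate applies (`split_into_rounds` is a hypothetical helper, not a library function):

```python
def split_into_rounds(text: str, budget_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Split a long source into excerpts that each fit the token budget."""
    budget_chars = budget_tokens * chars_per_token
    paragraphs = text.split("\n\n")
    rounds: list[str] = []
    current = ""
    for p in paragraphs:
        candidate = current + "\n\n" + p if current else p
        if len(candidate) <= budget_chars:
            current = candidate  # paragraph still fits in this round
        else:
            if current:
                rounds.append(current)
            # A single paragraph longer than the budget would need further splitting.
            current = p
    if current:
        rounds.append(current)
    return rounds
```

Each round then goes into its own prompt with only the instructions it needs, so no single request has to carry the whole source.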