Top-K Sampling
A text generation technique that limits token selection to the K most probable next tokens, balancing creativity and coherence.
Definition
Top-K sampling restricts the language model's next-token selection to the K highest-probability options. With K=50, only the 50 most likely tokens are kept; their probabilities are renormalized, and the next token is sampled at random from that reduced set.
Lower K values produce more predictable, focused outputs. Higher K values allow more creative, varied responses. K=1 is equivalent to greedy decoding (always choosing the most likely token).
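To make the mechanism concrete, here is a minimal NumPy sketch; the function name, the toy logits, and the five-token vocabulary are all invented for this illustration.

```python
# Minimal Top-K sampling over raw logits (toy example, not a full decoder).
import numpy as np

def top_k_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Sample one token id from the k highest-probability tokens."""
    # Indices of the k largest logits (order within the slice doesn't matter).
    top_indices = np.argpartition(logits, -k)[-k:]
    # Softmax over only the kept logits; all other tokens get probability 0.
    kept = logits[top_indices] - logits[top_indices].max()  # numerical stability
    probs = np.exp(kept) / np.exp(kept).sum()
    # Draw one token id, weighted by the renormalized probabilities.
    return int(rng.choice(top_indices, p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])  # toy vocabulary of 5 tokens
print(top_k_sample(logits, k=3, rng=rng))        # only token ids 0, 1, 2 can appear
```

With k=1 the function always returns the argmax, matching greedy decoding.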
Why It Matters
Top-K balances creativity and coherence. Fully unrestricted sampling occasionally picks highly improbable tokens that derail the output into nonsense, while overly restrictive sampling produces repetitive, predictable text.
Different applications need different K values. Code generation benefits from a low K, where precision matters; creative writing benefits from a higher K, where variety matters. Understanding this trade-off enables better tuning of generation settings, as illustrated below.
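Continuing the toy sketch from the Definition section, repeated draws show how K widens or narrows the pool of tokens that can actually appear:

```python
# Sample the same toy logits 1,000 times at several K values.
from collections import Counter

for k in (1, 2, 5):
    counts = Counter(top_k_sample(logits, k=k, rng=rng) for _ in range(1000))
    print(f"k={k}: {dict(counts)}")
# k=1 always yields token 0 (greedy); larger k spreads draws over more tokens.
```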
Examples in Practice
OpenAI's GPT-2 release generated its published samples with Top-K=40, allowing diversity while filtering out very low probability tokens that might derail coherent generation.
A developer adjusts K from 50 to 10 when generating code, reducing creative variation in favor of more predictable, syntactically correct outputs.
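As a rough sketch of that adjustment using Hugging Face Transformers' generate() API (the gpt2 checkpoint and the prompt are arbitrary example choices):

```python
# Same prompt, two K values: looser sampling vs. a tighter candidate pool.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("def fibonacci(n):", return_tensors="pt")

# top_k=50: wider candidate pool, more varied continuations.
varied = model.generate(**inputs, do_sample=True, top_k=50, max_new_tokens=40)

# top_k=10: tighter candidate pool, favoring predictable, well-formed output.
focused = model.generate(**inputs, do_sample=True, top_k=10, max_new_tokens=40)

print(tokenizer.decode(varied[0], skip_special_tokens=True))
print(tokenizer.decode(focused[0], skip_special_tokens=True))
```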