← Glossary
Context Window
The maximum number of tokens (input plus output) a model can attend to in a single inference pass. At a rough 0.75 words per token, a 128K-token context holds about 96K English words, enough to read whole books, large codebases, or very long conversations at once. The capacity is not free: with standard attention, compute scales quadratically with sequence length (and naive implementations also materialize a quadratic score matrix), while KV-cache memory grows linearly. Several open models advertise contexts of 200K tokens or more (e.g., the InternLM series).
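The two rules of thumb above can be sketched numerically. This is a back-of-envelope calculation, not a measurement: it assumes ~0.75 English words per token and a naive attention implementation that materializes one full `seq_len × seq_len` score matrix per head in fp16 (2 bytes per entry); real models multiply that by heads and layers, and fused kernels avoid storing it at all.

```python
def words_from_tokens(tokens: int, words_per_token: float = 0.75) -> int:
    """Rough English word capacity for a given token budget."""
    return int(tokens * words_per_token)

def attention_matrix_bytes(seq_len: int, bytes_per_entry: int = 2) -> int:
    """Size of one naive seq_len x seq_len attention score matrix (fp16)."""
    return seq_len * seq_len * bytes_per_entry

for ctx in (8_192, 32_768, 131_072):
    gib = attention_matrix_bytes(ctx) / 2**30
    print(f"{ctx:>7} tokens ≈ {words_from_tokens(ctx):>6} words, "
          f"one naive score matrix ≈ {gib:.1f} GiB")
```

Note how the word capacity scales linearly with context length while the naive score matrix grows quadratically: going from 8K to 128K tokens (16×) multiplies the matrix size by 256×, which is why long-context models rely on memory-efficient attention kernels.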