Context cliff at ~75K tokens — smart zone before, dumb zone after
Context cliff at ~75K tokens
LLM attention rots with prefix size. Two zones. Cross at own peril.
Zones
| Zone | Prefix size | Behaviour |
|---|---|---|
| Smart | < ~75K tokens | Fresh attention. High fidelity. Instructions honoured. Edits precise. |
| Dumb | > ~75K tokens | Attention smeared. Instruction drift. Silent forgetting. Regressions. |
Dex Horthy (HumanLayer) pins cliff at 75K (source). Matt Pocock rounds to 100K (aihero.dev, workshop 2026-07). Same phenomenon.
Why the cliff
- Attention scales quadratic with tokens. N tokens → N² attention edges.
- Every new token dilutes signal-to-noise ratio.
- 1M-context models still cliff at ~100K for coding — extra room is retrieval-good, generation-bad.
Token count = first-class state
Check own token estimate before decisions. Not optional. Rules:
| Estimate | Action |
|---|---|
| < 50K | Continue freely |
| 50-75K | Wrap current task. No new subthreads. |
| 75-100K | /clear at next natural break |
| > 100K | Dumb zone. Delegate remainder or /clear now |
/clear beats /compact
| Op | Cost | Effect |
|---|---|---|
/clear |
Full reset | Back to system prompt. Zero sediment. Smart zone. |
/compact |
Lossy summary | Summary accumulates as new prefix sediment. Not fresh. |
Prefer /clear at task boundaries. Reserve /compact for mid-task-can't-abandon scenarios.
Subagent rule
Starting subagent for review / verify / audit / research → fresh context. Never inherit parent's bloated prefix. Fresh subagent = free smart-zone reset.
See [[delegate-to-subagents-by-default]] for delegation triggers.
Anti-patterns
- ❌ Grinding past 100K "because we're almost done"
- ❌
/compactmid-task to squeeze in more work - ❌ Reading 20 files serially in main thread instead of dispatching researcher
- ❌ Ignoring token estimate because "it still seems to be working"
Cross-refs
- [[delegate-to-subagents-by-default]] — subagents = fresh context = smart zone reset
- [[review-in-fresh-context]] — companion rule (this file's twin)
- [[minimum-everything]] — smallest-unit-per-task limits token bloat