Context cliff at ~75K tokens — smart zone before, dumb zone after

rule · 2026-07-03 · tags: rules, agent, context, tokens, tuning

LLM attention rots with prefix size. Two zones. Cross at own peril.

Zones

Zone	Prefix size	Behaviour
Smart	< ~75K tokens	Fresh attention. High fidelity. Instructions honoured. Edits precise.
Dumb	> ~75K tokens	Attention smeared. Instruction drift. Silent forgetting. Regressions.

Dex Horthy (HumanLayer) pins cliff at 75K (source). Matt Pocock rounds to 100K (aihero.dev, workshop 2026-07). Same phenomenon.

Why the cliff

Attention scales quadratically with tokens. N tokens → N² attention edges.
Every new token dilutes signal-to-noise ratio.
1M-context models still cliff at ~100K for coding — extra room is retrieval-good, generation-bad.
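
Quick arithmetic on the quadratic claim, as a minimal sketch. It counts every ordered token pair (N²) as one edge and ignores causal masking and any sparse-attention tricks a real model may use. `attention_edges` is a hypothetical helper, not part of any tool.

```python
def attention_edges(n_tokens: int) -> int:
    """Pairwise attention edges for a prefix of n_tokens (full N x N, no masking)."""
    return n_tokens * n_tokens


BASELINE = 75_000  # the cliff estimate used in this rule

for n in (50_000, 75_000, 100_000, 1_000_000):
    ratio = attention_edges(n) / attention_edges(BASELINE)
    print(f"{n:>9,} tokens -> {attention_edges(n):.2e} edges ({ratio:.2f}x the 75K prefix)")
```

Under this counting, going from 75K to 100K adds a third more tokens but about 1.78x the edges (16/9).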

Token count = first-class state

Check own token estimate before decisions. Not optional. Rules:

Estimate	Action
< 50K	Continue freely
50-75K	Wrap current task. No new subthreads.
75-100K	`/clear` at next natural break
> 100K	Dumb zone. Delegate remainder or `/clear` now
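
The table above can be sketched as a lookup. Assumptions, all hypothetical: the function names, the choice to put exactly 75K and exactly 100K in the 75-100K row, and the rough 4-characters-per-token heuristic, which is a common rule of thumb for English text and not a real tokenizer.

```python
def rough_token_estimate(text: str) -> int:
    """Very rough estimate: ~4 characters per token (assumption, not a tokenizer)."""
    return len(text) // 4


def action_for(token_estimate: int) -> str:
    """Map an estimated prefix size (in tokens) to the action from the table."""
    if token_estimate < 50_000:
        return "continue freely"
    if token_estimate < 75_000:
        return "wrap current task; no new subthreads"
    if token_estimate <= 100_000:
        return "/clear at next natural break"
    return "dumb zone: delegate remainder or /clear now"


for estimate in (12_000, 60_000, 90_000, 140_000):
    print(f"{estimate:>7,} tokens: {action_for(estimate)}")
```

If the agent runtime reports an exact token count, prefer that over the character heuristic.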

/clear beats /compact

Op	Cost	Effect
`/clear`	Full reset	Back to system prompt. Zero sediment. Smart zone.
`/compact`	Lossy summary	Summary accumulates as new prefix sediment. Not fresh.

Prefer /clear at task boundaries. Reserve /compact for mid-task cases where the work can't be abandoned.
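
One possible reading of that preference as a decision function. `reset_op` and its two flags are hypothetical; the sketch assumes that a mid-task state which can be abandoned is also better served by /clear.

```python
def reset_op(at_task_boundary: bool, can_abandon_task: bool) -> str:
    """Pick a context-reset operation following the stated preference."""
    if at_task_boundary or can_abandon_task:
        return "/clear"    # full reset, back to system prompt
    return "/compact"      # lossy summary, only when stuck mid-task


print(reset_op(at_task_boundary=True, can_abandon_task=False))   # prints "/clear"
print(reset_op(at_task_boundary=False, can_abandon_task=False))  # prints "/compact"
```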

Subagent rule

Starting subagent for review / verify / audit / research → fresh context. Never inherit parent's bloated prefix. Fresh subagent = free smart-zone reset.

See [[delegate-to-subagents-by-default]] for delegation triggers.
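
A minimal sketch of the fresh-context rule. `Context` and `spawn_subagent_context` are hypothetical names, not any framework's API. It assumes the subagent reuses the parent's system prompt and receives a short task brief, and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    system_prompt: str
    messages: list[str] = field(default_factory=list)


def spawn_subagent_context(parent: Context, task_brief: str) -> Context:
    """Build a subagent context from the system prompt and a brief only.

    The parent's message history is deliberately not copied.
    """
    return Context(system_prompt=parent.system_prompt, messages=[task_brief])


# Parent has accumulated a long history; the child starts with one message.
parent = Context("You are a coding agent.", [f"turn {i}" for i in range(500)])
child = spawn_subagent_context(parent, "Review the latest diff for regressions.")
print(len(parent.messages), len(child.messages))  # prints "500 1"
```

The brief has to carry everything the subagent needs, so keep it short and specific.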

Anti-patterns

❌ Grinding past 100K "because we're almost done"
❌ /compact mid-task to squeeze in more work
❌ Reading 20 files serially in main thread instead of dispatching researcher
❌ Ignoring token estimate because "it still seems to be working"

Cross-refs

[[delegate-to-subagents-by-default]] — subagents = fresh context = smart zone reset
[[review-in-fresh-context]] — companion rule (this file's twin)
[[minimum-everything]] — smallest-unit-per-task limits token bloat