Repo code-size ceiling grounded in 2026 web practice
Grounded: 2026-07-03
Locked
- WARN at 200K tokens per own repo (practitioner median).
- FAIL at 2M tokens per own repo (hard ceiling).
- Forks exempt.
- Enforcement: Dagger TS module at dagger/, plus a GHA workflow (repo-size-audit.yml) that calls Dagger.
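The locked rules above can be sketched as a small classifier. This is a minimal illustration, not the actual Dagger module: the `RepoSize` shape, the `classifyRepo` name, and the `EXEMPT` status are hypothetical, and treating a repo that sits exactly on a threshold as having tripped it is an assumption the note does not settle.

```typescript
// Thresholds taken from the "Locked" list.
const WARN_TOKENS = 200_000;
const FAIL_TOKENS = 2_000_000;

type Status = "OK" | "WARN" | "FAIL" | "EXEMPT";

// Hypothetical input shape; the real audit's data model is not shown in this note.
interface RepoSize {
  name: string;
  tokens: number;
  isFork: boolean;
}

function classifyRepo(repo: RepoSize): Status {
  // Forks are exempt regardless of size.
  if (repo.isFork) return "EXEMPT";
  // Assumption: reaching a threshold exactly counts as tripping it.
  if (repo.tokens >= FAIL_TOKENS) return "FAIL";
  if (repo.tokens >= WARN_TOKENS) return "WARN";
  return "OK";
}
```

Checking the exemption first keeps a large fork from ever reporting FAIL, which matches the "Forks exempt" rule.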
Sources (12 cited)
A deep-research subagent aggregated 2026-dated sources. The full JSON is in the prior task output. Key claims:
| Source | Date | Credibility | Claim |
|---|---|---|---|
| Anthropic Claude Code docs | 2026-06-26 | high | 200K default; compaction is lossy |
| Anthropic 1M context blog | 2026-04-15 | high | 1M GA March 2026; "context rot" degrades before fill |
| GitHub Copilot docs | 2026-05-13 | high | No hard limit; large-repo ~60s indexing |
| Hookflow (Cursor practice) | 2026-05-21 | medium | 10K file cap; ~20K active-context tokens |
| AfterBuild Labs | 2026-04-15 | medium | Practical threshold 10K-20K LOC |
| AI Rankings | 2026-04-15 | medium | 1M ≈ 75K LOC; 15% fewer compactions post-rollout |
| Developers Digest | 2026-05-26 | medium | Retrieval > raw window; 200K-line dumps miss structure |
| Zencoder | 2026-05-18 | medium | 64K window + repo-graph beat larger window: 71%→84% accuracy, 5× cost cut |
| Karpathy autoresearch | 2026-03-07 | high | 630-LOC repo intentionally fits single context, "readable in an afternoon" |
| Jin (Medium) | 2026-03-25 | medium | Softmax attention degrades past 150K tokens; start context <20% of window |
| Verdent (dissent) | 2026-04-01 | low | 1M fits a mid-size monorepo plus docs; big is fine |
| AIForCode (dissent) | 2026-01-25 | low | 10K-file repo = 50M tokens; 200K window holds 3% of medium enterprise |
Consensus range: 64K–500K tokens. Median: 200K.
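As a rough cross-check between the token-based and LOC-based rows, the ratio from the AI Rankings row (1M tokens ≈ 75K LOC) can be applied to the two thresholds. This is only a sketch: the ratio varies with language and tokenizer, and `tokensToLoc` is a hypothetical helper name.

```typescript
// Ratio implied by the AI Rankings row: 1M tokens is roughly 75K lines of code.
const LOC_PER_MILLION_TOKENS = 75_000;

// Convert a token count to an approximate line count using that single ratio.
function tokensToLoc(tokens: number): number {
  return Math.round((tokens * LOC_PER_MILLION_TOKENS) / 1_000_000);
}

console.log(tokensToLoc(200_000));   // 15000
console.log(tokensToLoc(2_000_000)); // 150000
```

Under this ratio the 200K median lands at about 15K LOC, inside the 10K-20K LOC practical threshold reported by AfterBuild Labs.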
Why two tiers not one
A single threshold rushes splits. Two tiers give lead time:
- Under 200K: one repo, one agent, one turn. Below WARN.
- 200K–2M: still workable with retrieval / repo-graph (Zencoder pattern). Above WARN.
- Over 2M: exceeds even generous 1M windows; quality visibly rots. Above FAIL.
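The lead-time argument can be made concrete with a simple projection. The sketch below assumes linear growth, which real repos rarely follow, and the 20K tokens/week rate is a made-up figure for illustration; `weeksUntilFail` is a hypothetical helper.

```typescript
const WARN_TOKENS = 200_000;
const FAIL_TOKENS = 2_000_000;

// Weeks until a repo reaches FAIL, assuming constant growth (an illustrative assumption).
function weeksUntilFail(currentTokens: number, tokensPerWeek: number): number {
  if (currentTokens >= FAIL_TOKENS) return 0;
  if (tokensPerWeek <= 0) return Infinity; // not growing, so it never reaches FAIL
  return Math.ceil((FAIL_TOKENS - currentTokens) / tokensPerWeek);
}

// A repo that has just crossed WARN and grows by a hypothetical 20K tokens per week.
console.log(weeksUntilFail(WARN_TOKENS, 20_000)); // 90
```

With a single threshold the same repo would have no warning window at all; the gap between WARN and FAIL is what buys the time to plan a split.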
Why 2M as hard ceiling
- Matches largest reasonable Sonnet-1M-with-slack scenario
- Matches user's stated preference ("2M context is good")
- Sonnet 4.6's 1M/2M context is the technical max; 2M is aggressive but not fantasy
bookmark-mind-bs-ext decision
363K tokens, above WARN. The user picked "make umbrella repo no split", so treat it as an intentional umbrella. An umbrella-repo tag is added to signal that it is grandfathered. Extension + tests + docs stay one repo.
Dagger not Node
Per pipeline-stack-2026-07-01, Dagger TS is locked. The Dagger module at dagger/src/index.ts is the canonical enforcement path. The Node script (scripts/audit-repo-code-tokens.mjs) is kept as a no-daemon fallback.
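For orientation, a no-daemon fallback of this kind could look roughly like the sketch below. It is not the contents of scripts/audit-repo-code-tokens.mjs, which this note does not show. The 4-characters-per-token ratio is a rough rule of thumb rather than a real tokenizer, the skip list is a guess, and every function name is hypothetical.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Assumption: about 4 characters per token, a crude heuristic, not a real tokenizer.
const CHARS_PER_TOKEN = 4;
// Assumption: directories that should not count toward the repo's own code.
const SKIP_DIRS = new Set([".git", "node_modules", "dist"]);

// Estimate the token count of a piece of text from its length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Recursively sum the character count of every file under dir.
// Files are read as UTF-8, so binary files would inflate the total.
function countChars(dir: string): number {
  let total = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!SKIP_DIRS.has(entry.name)) total += countChars(path);
    } else if (entry.isFile()) {
      total += readFileSync(path, "utf8").length;
    }
  }
  return total;
}

// Estimated tokens for a whole repo checkout rooted at root.
function estimateRepoTokens(root: string): number {
  return Math.ceil(countChars(root) / CHARS_PER_TOKEN);
}
```

Calling `estimateRepoTokens(".")` from a checkout root gives a figure to compare against the 200K and 2M thresholds. A character-count heuristic is fine for a coarse WARN/FAIL gate but should not be read as an exact token count.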
Cross-refs
- repo-code-size-ceiling: the rule
- pipeline-stack-2026-07-01: Dagger locked
- atomic-packages-lazy: extraction pattern when splitting