We analyzed 26 of the most widely used open-source repositories — 438,904 files, hundreds of thousands of commits — looking for modules only one person truly understands.
Every repository analyzed had at least one module where a single contributor owns the majority of commit history — regardless of team size. Projects with 6,483 contributors. Projects with 5,358. Still had them.
Bus factor measures how many contributors a project can lose before a given module becomes unmaintainable. A bus factor of 1 means a single person leaving effectively orphans that code.
When a critical module is effectively owned by one person, new contributors can't ramp up without that person's involvement. Reviews bottleneck. Knowledge doesn't transfer.
When that contributor leaves (changes jobs, loses interest, or becomes unavailable), maintenance stops or shifts entirely to whoever is willing to learn it from scratch.
The assumption is that large, popular projects are safe from this. This dataset suggests otherwise. Thousands of contributors do not appear to prevent ownership concentration in individual modules.
Four findings emerge directly from the dataset — none require extrapolation.
Every project — from Rails (6,483 contributors) to Remix (29) — had at least one module where a single author owns more than 50% of the change history. No clear relationship between contributor count and BF-1 module count was observed in this dataset.
25 bus factor 1 modules with only 140 authors. Spring Boot's architecture is highly modular: 22,528 files across dozens of sub-modules, many authored and maintained by a small number of contributors. This number held across three separate analysis runs.
236 authors, only 1 bus factor 1 module — the healthiest ratio in the dataset by a significant margin. This shows the pattern is not inevitable. Deliberate code ownership practices produce measurably different outcomes.
VSCode (3,050 contributors → 15 BF-1 modules), Kubernetes (5,358 → 9), Rails (6,483 → 7). Larger teams do not consistently produce lower bus factor counts in this dataset.
All 26 repositories sorted by bus factor 1 count descending. BF-1 = single author owns >50% of recorded changes in the analyzed window.
Worst bus factor ratio in the dataset. One BF-1 module per ~900 files, across a project with massive downstream enterprise Java usage.
The only large project that comes close to breaking the pattern. Proves the outcome is not structural — it responds to deliberate ownership practices.
The highest-scoring file in the entire dataset. 98 authors have touched it across 1,762 changes, and its change frequency, recency, and author count combine for the highest curse score of any file analyzed.
The project with the most contributors in the dataset still has 7 BF-1 modules, including the routing mapper — one of the most business-critical files in any Rails app.
The curse score combines change frequency, recency, author count, age decay, and acceleration. Config, documentation, generated files, and vendored code are excluded, so every file listed here is a real source file. Full formula in the Methodology section.
| File | Project | Authors | Changes | Curse Score |
|---|---|---|---|---|
| src/vs/workbench/api/common/extHost.protocol.ts | VSCode | 98 | 1,762 | 108,065 |
| pkg/kubelet/kubelet.go | Kubernetes | 434 | 1,817 | 107,471 |
| actionpack/lib/action_dispatch/routing/mapper.rb | Rails | — | — | 43,909 |
| src/server.h | Redis | — | — | 10,886 |
| packages/runtime-core/src/renderer.ts | Vue Core | — | — | 3,784 |
| lib/nest-application.ts | NestJS | — | — | 3,784 |
| ssa/opGen.go (partially generated) | Go | — | — | 3,545 |
| coordinator/group/GroupMetadataManager.java | Kafka | — | — | 2,123 |
BF-1 = modules where one author owns >50% of changes. BF-2 = modules requiring at most two authors. † = analyzed within tool's default commit window.
| Project | Commits | Files | Authors | BF-1 | BF-2 | Top Cursed File | Curse Score |
|---|---|---|---|---|---|---|---|
† Commits and authors reflect the analysis window, not full repo lifetime. See Limitations.
All analysis used git-archaeologist v1.9.2 (git-arch analyze <repo> --json) against local clones. Commit SHAs at time of analysis are in research/README.md.
A module is BF-1 when a single contributor accounts for more than 50% of all non-merge commits touching that file. This is a conservative threshold: it measures observable change authorship, not theoretical knowledge.
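The BF-1 rule above can be sketched in a few lines. This is a minimal illustration, not git-archaeologist's code: the names `topAuthorShare` and `isBusFactorOne` are hypothetical, and the input is assumed to be one author entry per non-merge commit touching a file.

```typescript
// Hypothetical helpers for illustration; not taken from git-archaeologist.

interface OwnershipResult {
  topAuthor: string | null; // null when the file has no recorded commits
  share: number;            // fraction of commits by topAuthor, in [0, 1]
}

// commitAuthors: one entry per non-merge commit touching a file.
function topAuthorShare(commitAuthors: string[]): OwnershipResult {
  // Prototype-free object so author names like "constructor" are safe keys.
  const counts: Record<string, number> = Object.create(null);
  for (const author of commitAuthors) {
    counts[author] = (counts[author] || 0) + 1;
  }

  let topAuthor: string | null = null;
  let topCount = 0;
  for (const author of Object.keys(counts)) {
    if (counts[author] > topCount) {
      topAuthor = author;
      topCount = counts[author];
    }
  }

  const share =
    commitAuthors.length === 0 ? 0 : topCount / commitAuthors.length;
  return { topAuthor, share };
}

// BF-1 per the report's definition: strictly more than 50% of commits.
function isBusFactorOne(commitAuthors: string[]): boolean {
  return topAuthorShare(commitAuthors).share > 0.5;
}
```

Because the threshold is strict, a file split evenly between two authors (exactly 50% each) is not classified as BF-1 in this sketch.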
The curse score is implemented in src/analyzers/curseScorer.ts.
Exponential decay means old chaos that has stabilized scores lower. Acceleration means recently worsening files rank higher than files with similar totals that have levelled off.
Worked example (VSCode extHost.protocol.ts): 1,762 changes × log₂(99) × exp(−0.5 × ~3.2) × log₂(churn+2) × acceleration × 100 ≈ 108,065. Values are rounded.
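The shape of the worked example can be written out as a small function. This is a sketch reconstructed from the terms shown in the example, not the contents of curseScorer.ts: the interface, the function name `curseScoreSketch`, and the reading of log₂(99) as log₂(authors + 1) are assumptions, and the unit of the decay input (the "~3.2") is not stated in the report.

```typescript
// Sketch only: term names and the authors+1 / churn+2 offsets are inferred
// from the worked example, not copied from src/analyzers/curseScorer.ts.

interface CurseInputs {
  changes: number;      // number of changes to the file
  authors: number;      // distinct authors; the example uses log2(98 + 1)
  age: number;          // decay input (the "~3.2" in the example); unit not stated
  churn: number;        // churn measure; the example uses log2(churn + 2)
  acceleration: number; // multiplier that ranks recently worsening files higher
}

const DECAY_RATE = 0.5; // the 0.5 in exp(-0.5 * age)

// Base-2 logarithm written via natural log for broad runtime support.
function log2(x: number): number {
  return Math.log(x) / Math.LN2;
}

function curseScoreSketch(f: CurseInputs): number {
  const authorTerm = log2(f.authors + 1);
  const decayTerm = Math.exp(-DECAY_RATE * f.age); // larger age input scores lower
  const churnTerm = log2(f.churn + 2);
  return f.changes * authorTerm * decayTerm * churnTerm * f.acceleration * 100;
}
```

Every factor is multiplicative, so with the other inputs fixed, a larger decay input always lowers the score and a larger acceleration always raises it, which matches the behavior described above.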
All non-merge commits are counted. Merge commits are excluded to avoid double-counting changes from feature branches.
Bot commits are excluded. Commits from accounts matching bot patterns (dependabot, renovate, github-actions, and similar) do not count toward authorship calculations.
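A bot filter of this kind might look like the following. The three named accounts come from the report; the regular expressions, the `[bot]` suffix check (GitHub's labeling convention for app accounts), and the function names are assumptions made for illustration.

```typescript
// Illustrative patterns: the report names dependabot, renovate, and
// github-actions; the "[bot]" suffix check is an added assumption based on
// how GitHub labels app accounts.
const BOT_PATTERNS: RegExp[] = [
  /dependabot/i,
  /renovate/i,
  /github-actions/i,
  /\[bot\]$/i,
];

// True when the author name matches any bot pattern.
function isBotAuthor(author: string): boolean {
  return BOT_PATTERNS.some((pattern) => pattern.test(author));
}

// Drops bot-authored entries before ownership shares are computed.
function excludeBots(commitAuthors: string[]): string[] {
  return commitAuthors.filter((author) => !isBotAuthor(author));
}
```

Substring matching is deliberately loose here, so a human account whose name contains one of these strings would also be dropped; a production filter would likely need an allowlist or stricter anchoring.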
Renames are tracked via git log --follow. A file's full authorship history is preserved across renames throughout the repository's lifetime.
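Combining the two rules above (no merge commits, history followed across renames), a per-file author query could be assembled as below. The report only states that git log --follow is used; the helper names, the `--format=%an` choice, and the exact flag combination are assumptions about how such a query might be built.

```typescript
// Hypothetical helpers; the report only states that git log --follow is used.

// Builds arguments for: git log --follow --no-merges --format=%an -- <file>
// --follow tracks a single file across renames; --no-merges drops merge commits;
// %an prints the author name of each commit.
function authorLogArgs(filePath: string): string[] {
  return ["log", "--follow", "--no-merges", "--format=%an", "--", filePath];
}

// Turns the command's stdout (one author name per line) into a list,
// ignoring blank lines and trailing carriage returns.
function parseAuthorLines(stdout: string): string[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

In Node.js the argument list could be passed to execFileSync from node:child_process with "git" as the command. Note that --follow works on one path at a time, so the query has to run once per file.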
Excluded from the rankings: generated files, config, docs, lock files, vendored code, and build scripts. Only source files are ranked. The filter is best-effort, so edge cases may exist.
Correlation only. This data shows ownership concentration co-exists with large contributor bases. It does not explain why, nor predict failures.
Rust was excluded: at 168k files, it exceeded memory limits during analysis. The other 26 repositories completed successfully.
Documenting limitations is not a hedge — it is the difference between a credible analysis and a marketing document.
Several repos show round commit counts (1,000; 2,000) — these reflect the tool's analysis window, not full repository history. React has ~20,000 real commits; TensorFlow ~170,000. BF-1 findings are valid within the window. Full history re-runs are planned for v2.0.
BF-1 means one author owns the majority of recorded commits. It does not measure who has reviewed the code, read it, or could maintain it. Objective and reproducible, but imperfect.
All 26 repos analyzed at specific commit SHAs (see research/README.md). Results will differ after significant contributor activity. Concentration can improve — Vite proves this.
One person using two email addresses counts as two authors, which can slightly overcount unique contributors. Co-authored commits are attributed to the first listed author, which can slightly undercount them.
Generated, vendored, config, docs, and lock files are excluded from cursed file rankings and BF-1 counts where possible. The filter is best-effort — some edge cases may have slipped through.
BF-1 modules co-exist with large contributor bases across every project analyzed. This data does not explain why, nor predict failures. A BF-1 module is a risk indicator, not a guarantee of problems.
On round commit numbers: repos showing exactly 1,000 or 2,000 commits were analyzed within git-archaeologist's default window. The full repository has more history. Bus factor and curse score findings are valid within the analyzed window, but the commits and authors figures reflect the window, not project lifetime. These rows are marked † in the data table.
Per-repository JSON outputs are published alongside this report. Each file contains the full analysis: cursed files array, bus factor breakdown, file statistics, and the commit SHA at time of analysis.
Repository commit SHAs are in research/README.md. All 26 JSON files are available individually.
git-archaeologist is an open-source CLI for analyzing code ownership and complexity patterns in git repositories. This report benchmarks the tool against 26 major OSS projects.