State of OSS Maintainability · 2026 git-archaeologist v1.9.2 June 9, 2026

More contributors doesn't mean safer code

We analyzed 26 of the most widely used open-source repositories — 438,904 files, hundreds of thousands of commits — looking for modules only one person truly understands.

26/26

Every repository analyzed had at least one module where a single contributor owns the majority of commit history — regardless of team size. Projects with 6,483 contributors. Projects with 5,358. Still had them.

26 repositories analyzed
438,904 source files scanned
100% had ≥1 BF-1 module
6,483 Rails contributors, still 7 BF-1 modules
25 Spring Boot BF-1 modules (worst)
1 Vite BF-1 module (best)
Context

Why ownership concentration matters

Bus factor measures how many contributors a project can lose before a given module becomes unmaintainable. A bus factor of 1 means a single person leaving effectively orphans that code.

Onboarding

When a critical module is effectively owned by one person, new contributors can't ramp up without that person's involvement. Reviews bottleneck. Knowledge doesn't transfer.

Contributor departure

When that contributor leaves (changes jobs, loses interest, or becomes unavailable), maintenance stops or shifts entirely to whoever is willing to learn the module from scratch.

Scale doesn't help

The assumption is that large, popular projects are safe from this. This dataset suggests otherwise. Thousands of contributors do not appear to prevent ownership concentration in individual modules.

Key Findings

What the data shows

Four findings emerge directly from the dataset — none require extrapolation.

Finding 01

The "more contributors" assumption doesn't hold

Every project — from Rails (6,483 contributors) to Remix (29) — had at least one module where a single author owns more than 50% of the change history. No clear relationship between contributor count and BF-1 module count was observed in this dataset.

Finding 02

Spring Boot showed the highest BF-1 count in this dataset

25 bus factor 1 modules with only 140 authors. Spring Boot's architecture is highly modular: 22,528 files across dozens of sub-modules, many authored and maintained by a small number of contributors. This number held across three separate analysis runs.

Finding 03

Vite is the positive outlier

236 authors, only 1 bus factor 1 module: the healthiest ratio in the dataset by a significant margin. This shows the pattern is not inevitable. It is consistent with deliberate code ownership practices producing measurably different outcomes, although the dataset itself cannot establish the cause.

Finding 04

Scale provides no measurable protection

VSCode (3,050 contributors → 15 BF-1 modules), Kubernetes (5,358 → 9), Rails (6,483 → 7). Larger teams do not consistently produce lower bus factor counts in this dataset.
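One way to check a claim like this is a rank correlation between contributor count and BF-1 count. The sketch below is illustrative only: it uses just the five projects whose figures are quoted in this report's text, the `ProjectPoint` shape and function names are assumptions, and five points cannot establish or rule out a relationship. The check is only meaningful when run over all 26 rows.

```typescript
// Figures quoted in this report's text (contributor or author counts as stated).
interface ProjectPoint {
  name: string;
  contributors: number;
  bf1: number;
}

const quoted: ProjectPoint[] = [
  { name: "Rails", contributors: 6483, bf1: 7 },
  { name: "Kubernetes", contributors: 5358, bf1: 9 },
  { name: "VSCode", contributors: 3050, bf1: 15 },
  { name: "Vite", contributors: 236, bf1: 1 },
  { name: "Spring Boot", contributors: 140, bf1: 25 },
];

// Rank of each value, 1 = smallest. Assumes no ties, which holds for the data above.
function ranks(values: number[]): number[] {
  const sorted = values.slice().sort((a, b) => a - b);
  return values.map((v) => sorted.indexOf(v) + 1);
}

// Spearman rank correlation for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
function spearman(xs: number[], ys: number[]): number {
  const rx = ranks(xs);
  const ry = ranks(ys);
  const n = xs.length;
  let sumSquaredDiff = 0;
  for (let i = 0; i < n; i++) {
    const d = rx[i] - ry[i];
    sumSquaredDiff += d * d;
  }
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}

const rho = spearman(
  quoted.map((p) => p.contributors),
  quoted.map((p) => p.bf1),
);
console.log("Spearman rho over the five quoted projects:", rho.toFixed(2));
```

A rank correlation is used here rather than a linear one because contributor counts span two orders of magnitude and the report's claim is about ordering, not proportionality.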

Visualization

BF-1 modules per project

All 26 repositories sorted by bus factor 1 count descending. BF-1 = single author owns >50% of recorded changes in the analyzed window.

[Chart: Bus Factor 1 Modules, all 26 repositories. Legend buckets by BF-1 count: ≥ 20, 10–19, 3–9, 1–2.]
Notable Cases

Worst, best, and most cursed

Highest BF-1 count
Spring Boot
25 BF-1 modules · 140 authors · 22,528 files

Worst bus factor ratio in the dataset. One BF-1 module per ~900 files, across a project with massive downstream enterprise Java usage.

Lowest BF-1 (large project)
Vite
1 BF-1 module · 236 authors · 2,844 files

The only large project that comes close to breaking the pattern. Its result suggests the outcome is not structural and may respond to deliberate ownership practices.

Highest curse score
VSCode
extHost.protocol.ts · score 108,065 · 1,762 changes · 98 authors

One of the most-changed files in the entire dataset. 98 authors have touched it, and its change frequency, recency, and author chaos combine for the highest curse score of any file analyzed.

Most contributors, still concentrated
Rails
6,483 authors · 7 BF-1 modules · 86,549 commits

The project with the most contributors in the dataset still has 7 BF-1 modules, including the routing mapper — one of the most business-critical files in any Rails app.

Cursed Files

The files most at risk

The curse score combines change frequency, recency, author count, age decay, and acceleration. Config, documentation, generated files, and vendored code are excluded — these are real source files. Full formula in the Methodology section.

File | Project | Authors | Changes | Curse Score
src/vs/workbench/api/common/extHost.protocol.ts | VSCode | 98 | 1,762 | 108,065
pkg/kubelet/kubelet.go | Kubernetes | 434 | 1,817 | 107,471
actionpack/lib/action_dispatch/routing/mapper.rb | Rails | - | - | 43,909
src/server.h | Redis | - | - | 10,886
packages/runtime-core/src/renderer.ts | Vue Core | - | - | 3,784
lib/nest-application.ts | NestJS | - | - | 3,784
ssa/opGen.go (partially generated) | Go | - | - | 3,545
coordinator/group/GroupMetadataManager.java | Kafka | - | - | 2,123
Full Dataset

All 26 repositories

BF-1 = modules where one author owns >50% of changes. BF-2 = modules requiring at most two authors. Marked rows = analyzed within the tool's default commit window.

Project | Commits | Files | Authors | BF-1 | BF-2 | Top Cursed File | Curse Score

Commits and authors reflect the analysis window, not full repo lifetime. See Limitations.

Methodology

How this was measured

All analysis used git-archaeologist v1.9.2 (git-arch analyze <repo> --json) against local clones. Commit SHAs at time of analysis are in research/README.md.

Bus factor definition

A module is BF-1 when a single contributor accounts for more than 50% of all non-merge commits touching that file. This is a conservative threshold: it measures observable change authorship, not theoretical knowledge.
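The definition above can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the `CommitRecord` shape and the `findBf1Modules` name are assumptions, and the input is assumed to have merge and bot commits filtered out already.

```typescript
// Hypothetical input: one record per non-merge commit touching a module.
interface CommitRecord {
  module: string; // module or file path (granularity is an assumption)
  author: string; // author identity as recorded in the commit
}

interface Ownership {
  module: string;
  topAuthor: string;
  topShare: number; // fraction of commits made by the top author, 0..1
  isBf1: boolean;
}

// Flags a module as BF-1 when one author has strictly more than
// `threshold` (default 50%) of the commits touching it.
function findBf1Modules(commits: CommitRecord[], threshold = 0.5): Ownership[] {
  const perModule = new Map<string, Map<string, number>>();
  for (const c of commits) {
    let authors = perModule.get(c.module);
    if (!authors) {
      authors = new Map<string, number>();
      perModule.set(c.module, authors);
    }
    authors.set(c.author, (authors.get(c.author) || 0) + 1);
  }

  const results: Ownership[] = [];
  perModule.forEach((authors, module) => {
    let total = 0;
    let topAuthor = "";
    let topCount = 0;
    authors.forEach((count, author) => {
      total += count;
      if (count > topCount) {
        topCount = count;
        topAuthor = author;
      }
    });
    const topShare = topCount / total;
    results.push({ module, topAuthor, topShare, isBf1: topShare > threshold });
  });
  return results;
}
```

Note the strict inequality: under this reading, a module split exactly 50/50 between two authors is not BF-1.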

Curse score formula

Implemented in src/analyzers/curseScorer.ts:

curseScore = totalChanges × log₂(authors+1) × exp(−0.5 × ageInYears) × log₂(churn+2) × acceleration × 100

totalChanges = non-merge commits touching the file
authors = unique commit authors on that file
ageInYears = time since first commit (decay: old stable files score lower)
churn = change frequency derived from commit timeline
acceleration = ratio of recent change rate vs historical rate (worsening files rank higher)

Exponential decay means old chaos that has stabilized scores lower. Acceleration means recently worsening files rank higher than files with similar totals that have levelled off.

Worked example (VSCode extHost.protocol.ts): 1,762 changes × log₂(99) × exp(−0.5 × ~3.2) × log₂(churn+2) × acceleration × 100 ≈ 108,065. Values are rounded.
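As a sanity check on the worked example, the formula can be transcribed directly. This is a sketch written from the formula as stated in this report, not a copy of src/analyzers/curseScorer.ts. The `CurseInputs` type and the `knownFactors` helper are illustrative names, and the example's churn and acceleration values are not given, so only the stated factors are evaluated.

```typescript
// Inputs named after the terms in the formula above.
interface CurseInputs {
  totalChanges: number; // non-merge commits touching the file
  authors: number;      // unique commit authors
  ageInYears: number;   // time since first commit
  churn: number;        // change frequency from the commit timeline
  acceleration: number; // recent change rate / historical change rate
}

const log2 = (x: number): number => Math.log(x) / Math.LN2;

// Direct transcription of the stated formula.
function curseScore(f: CurseInputs): number {
  return (
    f.totalChanges *
    log2(f.authors + 1) *
    Math.exp(-0.5 * f.ageInYears) *
    log2(f.churn + 2) *
    f.acceleration *
    100
  );
}

// Product of only the factors the worked example states numerically
// (hypothetical helper, used for the sanity check below).
function knownFactors(totalChanges: number, authors: number, ageInYears: number): number {
  return totalChanges * log2(authors + 1) * Math.exp(-0.5 * ageInYears) * 100;
}

// Worked example inputs: 1,762 changes, 98 authors, roughly 3.2 years.
const partial = knownFactors(1762, 98, 3.2);
// What log2(churn + 2) * acceleration would have to be to reach 108,065.
const impliedRemainder = 108065 / partial;
console.log(Math.round(partial), impliedRemainder.toFixed(2));
```

With these rounded inputs the stated factors multiply to roughly 236,000, so the unstated product log₂(churn+2) × acceleration would need to be about 0.46 to land on 108,065. The age of ~3.2 years is itself approximate, so treat this as a consistency check rather than a reconstruction of the tool's inputs.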

What counts as a commit?

All non-merge commits. Merge commits are excluded to avoid double-counting changes from feature branches.

Are bots excluded?

Yes. Commits from accounts matching bot patterns — dependabot, renovate, github-actions, and similar — are excluded from authorship calculations.
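A bot filter of this kind might look like the sketch below. The three named patterns come from the answer above. The `[bot]` suffix pattern and all identifiers are assumptions added for illustration, and the tool's real pattern list may differ.

```typescript
// Patterns named in this report, plus an assumed "[bot]" suffix check.
const BOT_PATTERNS: RegExp[] = [
  /dependabot/i,
  /renovate/i,
  /github-actions/i,
  /\[bot\]/i, // assumption: GitHub app accounts usually carry this suffix
];

// True when an author name or email matches any bot pattern.
function isBotAuthor(nameOrEmail: string): boolean {
  return BOT_PATTERNS.some((pattern) => pattern.test(nameOrEmail));
}

// Drops bot commits before any authorship share is computed.
function excludeBots<T extends { author: string }>(commits: T[]): T[] {
  return commits.filter((c) => !isBotAuthor(c.author));
}
```

Filtering before counting matters: a bot that bumps dependencies weekly can otherwise dilute a human author's share and mask a BF-1 module.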

How are file renames handled?

Via git log --follow. A file's full authorship history is preserved across renames throughout the repository's lifetime.
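For reference, a per-file history query along these lines can be built from standard git options. The flags (`--follow`, `--no-merges`, `--format=%ae`) are real, but whether git-archaeologist issues exactly this command is an assumption, and the function names are illustrative. The arguments would be passed to something like Node's `child_process.execFileSync("git", args)`. That call is left out so the sketch stays free of side effects.

```typescript
// Builds argv for: git log --follow --no-merges --format=%ae -- <path>
// --follow tracks a single file across renames; --no-merges drops merge commits;
// %ae prints the author email, one line per commit.
function buildAuthorLogArgs(filePath: string): string[] {
  return ["log", "--follow", "--no-merges", "--format=%ae", "--", filePath];
}

// Counts commits per author email from the command's stdout.
function countAuthors(stdout: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of stdout.split("\n")) {
    const email = line.trim().toLowerCase();
    if (email === "") continue; // skip blank lines, including the trailing one
    counts.set(email, (counts.get(email) || 0) + 1);
  }
  return counts;
}
```

Because `--follow` only works on one path at a time, a query like this has to be run once per file rather than once per repository.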

What files are excluded?

Generated files, config, docs, lock files, vendored code, and build scripts. Only source files are ranked. The filter is best-effort — edge cases may exist.
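To make "best-effort" concrete, here is a hypothetical path filter covering the categories listed above. Every pattern and name below is an illustrative assumption. The tool's actual filter list is not reproduced in this report.

```typescript
// Illustrative exclusion patterns only, one group per category named above.
const EXCLUDED_PATHS: RegExp[] = [
  /(^|\/)(vendor|node_modules|third_party)\//, // vendored code
  /(^|\/)docs?\//, // documentation directories
  /\.(md|rst|txt)$/i, // documentation files
  /(^|\/)(package-lock\.json|yarn\.lock|Cargo\.lock|Gemfile\.lock)$/, // lock files
  /\.(json|ya?ml|toml|ini)$/i, // config
  /(^|\/)(Makefile|Rakefile|build\.gradle)$/, // build scripts
  /\.(min\.js|generated\.[a-z]+|pb\.go)$/i, // generated output
];

// True when a path survives every exclusion pattern and would be ranked.
function isRankedSource(path: string): boolean {
  return !EXCLUDED_PATHS.some((pattern) => pattern.test(path));
}
```

Name-based rules like these cannot see inside a file, which is how a partially generated file with an ordinary name, such as ssa/opGen.go in the cursed files table, can still be ranked.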

Causation or correlation?

Correlation only. This data shows ownership concentration co-exists with large contributor bases. It does not explain why, nor predict failures.

What was excluded entirely?

Rust (168k files — exceeded memory limits during analysis). All 26 other repositories completed successfully.


Limitations

What this data does and doesn't show

Documenting limitations is not a hedge — it is the difference between a credible analysis and a marketing document.

⚠ Most important

Analysis window varies by repo

Several repos show round commit counts (1,000; 2,000) — these reflect the tool's analysis window, not full repository history. React has ~20,000 real commits; TensorFlow ~170,000. BF-1 findings are valid within the window. Full history re-runs are planned for v2.0.

Authorship ≠ knowledge

Commit authorship is a proxy

BF-1 means one author owns the majority of recorded commits. It does not measure who has reviewed the code, read it, or could maintain it. Objective and reproducible, but imperfect.

Snapshot in time

Results reflect June 9, 2026

All 26 repos analyzed at specific commit SHAs (see research/README.md). Results will differ after significant contributor activity. Concentration is not fixed: Vite's single BF-1 module shows it can be low.

Identity

Authors identified by commit signature

One person using two email addresses counts as two authors, which inflates the author count and can understate concentration. Co-authored commits are attributed to the first listed author, which undercounts contributors who appear only as co-authors.
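Both effects could be reduced with identity normalization. The sketch below is not part of the analysis described here. It assumes a hand-maintained alias map (similar in spirit to git's .mailmap) and reads the `Co-authored-by:` commit trailer convention. All names are illustrative.

```typescript
// Maps any known alias email to one canonical identity.
// The alias map is a hypothetical, hand-maintained input.
function canonicalAuthor(email: string, aliases: Map<string, string>): string {
  const key = email.trim().toLowerCase();
  return aliases.get(key) || key;
}

// Extracts co-author emails from "Co-authored-by: Name <email>" trailers.
function coAuthorEmails(message: string): string[] {
  const trailer = /^Co-authored-by:\s*(.+?)\s*<([^>]+)>\s*$/gim;
  const emails: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = trailer.exec(message)) !== null) {
    emails.push(match[2].toLowerCase());
  }
  return emails;
}

// All distinct canonical identities credited on one commit.
function creditedAuthors(
  authorEmail: string,
  message: string,
  aliases: Map<string, string>,
): string[] {
  const all = [authorEmail].concat(coAuthorEmails(message));
  const seen = new Map<string, true>();
  all.forEach((e) => seen.set(canonicalAuthor(e, aliases), true));
  const result: string[] = [];
  seen.forEach((_value, identity) => result.push(identity));
  return result;
}
```

Crediting every listed co-author changes what "owns the majority of commits" means, so figures produced this way would not be directly comparable to the BF-1 counts in this report.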

Scope

Source files only

Generated, vendored, config, docs, and lock files are excluded from cursed file rankings and BF-1 counts where possible. The filter is best-effort — some edge cases may have slipped through.

Interpretation

Correlation, not causation

BF-1 modules co-exist with large contributor bases across every project analyzed. This data does not explain why, nor predict failures. A BF-1 module is a risk indicator, not a guarantee of problems.

On round commit numbers: repos showing exactly 1,000 or 2,000 commits were analyzed within git-archaeologist's default window. The full repository has more history. Bus factor and curse score findings are valid within the analyzed window, but the commit and author figures reflect the window, not project lifetime. Affected rows are marked in the data table.

Reproducibility

Raw data & reproduction

Per-repository JSON outputs are published alongside this report. Each file contains the full analysis: cursed files array, bus factor breakdown, file statistics, and the commit SHA at time of analysis.

npm install -g git-archaeologist
git-arch analyze /path/to/repo --json

Repository commit SHAs are in research/README.md. All 26 JSON files are available individually.

About git-archaeologist

An open-source CLI for analyzing code ownership and complexity patterns in git repositories. This report benchmarks the tool against 26 major OSS projects.