Kernel bugs hide for 2 years on average. Some hide for 20

2026-01-08 · pebblebed.com


There are bugs in your kernel right now that won't be found for years. I know because I analyzed 125,183 of them, every bug with a traceable Fixes: tag in the Linux kernel's 20-year git history.

The average kernel bug lives 2.1 years before discovery. But some subsystems are far worse: CAN bus drivers average 4.2 years, SCTP networking 4.0 years. The longest-lived bug in my dataset, a buffer overflow in ethtool, sat in the kernel for 20.7 years. The one I'll dissect in detail, a refcount leak in netfilter, lasted 19 years.

I built a tool that catches 92% of historical bugs in a held-out test set at commit time. Here's what I learned.

Key findings at a glance
125,183 Bug-fix pairs with traceable Fixes: tags
123,696 Valid records after filtering (0 < lifetime < 27 years)
2.1 years Average time a bug hides before discovery
20.7 years Longest-lived bug (ethtool buffer overflow)
0% → 69% Bugs found within 1 year (2010 vs 2022)
92.2% Recall of VulnBERT on held-out 2024 test set
1.2% False positive rate (vs 48% for vanilla CodeBERT)

The initial discovery

I started by mining the most recent 10,000 commits with Fixes: tags from the Linux kernel. After filtering out invalid references (commits that pointed to hashes outside the repo, malformed tags, or merge commits), I had 9,876 valid vulnerability records. For the lifetime analysis, I excluded 27 same-day fixes (bugs introduced and fixed within hours), leaving 9,849 bugs with meaningful lifetimes.

The results were striking:

Metric Value
Bugs analyzed 9,876
Average lifetime 2.8 years
Median lifetime 1.0 year
Maximum 20.7 years

Almost 20% of bugs had been hiding for 5+ years. The networking subsystem looked particularly bad at 5.1 years average. I found a refcount leak in netfilter that had been in the kernel for 19 years.

Initial bug lifetime distribution: half of bugs found within a year, but 20% hide for 5+ years.

But something nagged at me: my dataset only contained fixes from 2025. Was I seeing the full picture, or just the tip of the iceberg?

Going deeper: Mining the full history

I rewrote my miner to capture every Fixes: tag since Linux moved to git in 2005. Six hours later, I had 125,183 vulnerability records, roughly 12x my initial dataset.

The numbers changed significantly:

Metric 2025 Only Full History (2005-2025)
Bugs analyzed 9,876 125,183
Average lifetime 2.8 years 2.1 years
Median lifetime 1.0 year 0.7 years
5+ year bugs 19.4% 13.5%
10+ year bugs 6.6% 4.2%

Full dataset bug lifetime distribution: 57% of bugs found within a year. The long tail is smaller than it first appeared.

Why the difference? My initial 2025-only dataset was biased. Fixes in 2025 include:

  • New bugs introduced recently and caught quickly
  • Ancient bugs that finally got discovered after years of hiding

The ancient bugs skewed the average upward. When you include the full history with all the bugs that were introduced AND fixed within the same year, the average drops from 2.8 to 2.1 years.

The real story: We're getting faster (but it's complicated)

The most striking finding from the full dataset: bugs introduced in recent years appear to get fixed much faster.

Year Introduced Bugs Avg Lifetime % Found <1yr
2010 1,033 9.9 years 0%
2014 3,991 3.9 years 31%
2018 11,334 1.7 years 54%
2022 11,090 0.8 years 69%

Bugs introduced in 2010 took nearly 10 years on average to find; bugs introduced in 2024 are being found in about 5 months. At first glance that looks like a 20x improvement!

But here's the catch: this data is right-censored. Bugs introduced in 2022 can't have a 10-year lifetime yet since we're only in 2026. We might find more 2022 bugs in 2030 that bring the average up.

The fairer comparison is "% found within 1 year" and that IS improving: from 0% (2010) to 69% (2022). That's real progress, likely driven by:

  • Syzkaller (released 2015)
  • KASAN, KMSAN, KCSAN sanitizers
  • Better static analysis
  • More contributors reviewing code
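The right-censoring effect is easy to demonstrate with a toy simulation. This is a sketch under assumptions (exponentially distributed lifetimes with a constant 2-year mean in every cohort), not the real data:

```python
import random

random.seed(42)

def observed_mean_lifetime(intro_year, true_mean_years=2.0,
                           cutoff_year=2026.0, n=100_000):
    """Toy model: bugs introduced in `intro_year` have exponentially
    distributed true lifetimes, identical across cohorts. Only bugs
    already fixed by `cutoff_year` are observable, so recent cohorts
    look artificially short-lived."""
    observed = []
    for _ in range(n):
        lifetime = random.expovariate(1.0 / true_mean_years)
        if intro_year + lifetime < cutoff_year:  # fix already visible
            observed.append(lifetime)
    return sum(observed) / len(observed)

# Same true distribution, yet the observed average shrinks for
# recent cohorts, mirroring the shape of the table above:
for year in (2010, 2018, 2022):
    print(year, round(observed_mean_lifetime(year), 2))
```

Even with no real improvement at all, the 2022 cohort's observable average comes out noticeably below the 2010 cohort's, which is why "% found within 1 year" is the fairer metric.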

But there's a backlog. When I look at just the bugs fixed in 2024-2025:

  • 60% were introduced in the last 2 years (new bugs, caught quickly)
  • 18% were introduced 5-10 years ago
  • 6.5% were introduced 10+ years ago

We're simultaneously catching new bugs faster AND slowly working through ~5,400 ancient bugs that have been hiding for over 5 years.

The methodology

The kernel has a convention: when a commit fixes a bug, it includes a Fixes: tag pointing to the commit that introduced the bug.

commit de788b2e6227
Author: Florian Westphal <fw@strlen.de>
Date:   Fri Aug 1 17:25:08 2025 +0200

    netfilter: ctnetlink: fix refcount leak on table dump

    Fixes: d205dc40798d ("netfilter: ctnetlink: ...")

I wrote a miner that:

  1. Runs git log --grep="Fixes:" to find all fixing commits
  2. Extracts the referenced commit hash from the Fixes: tag
  3. Pulls dates from both commits
  4. Classifies subsystem from file paths (70+ patterns)
  5. Detects bug type from commit message keywords
  6. Calculates the lifetime
import re

# Pull the referenced (short or full) hash out of the Fixes: tag
fixes_pattern = r'Fixes:\s*([0-9a-f]{12,40})'
match = re.search(fixes_pattern, commit_message)
if match:
    introducing_hash = match.group(1)
    lifetime_days = (fixing_date - introducing_date).days
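Step 4, the subsystem classification, can be sketched roughly like this. The patterns below are illustrative stand-ins, not the miner's actual 70+ rules:

```python
import re

# Illustrative subset of path patterns; order matters
# (specific patterns must come before generic ones).
SUBSYSTEM_PATTERNS = [
    (r'^drivers/net/can/', 'drivers/can'),
    (r'^net/sctp/',        'networking/sctp'),
    (r'^net/netfilter/',   'netfilter'),
    (r'^net/',             'networking'),
    (r'^drivers/gpu/',     'gpu'),
    (r'^kernel/bpf/',      'bpf'),
    (r'^mm/',              'memory'),
]

def classify_subsystem(paths):
    """Return the subsystem of the first pattern that matches any
    file touched by the commit, or 'other'."""
    for pattern, subsystem in SUBSYSTEM_PATTERNS:
        if any(re.match(pattern, p) for p in paths):
            return subsystem
    return 'other'

classify_subsystem(['net/netfilter/nf_conntrack_netlink.c'])  # 'netfilter'
```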

Dataset details:

Parameter Value
Kernel version v6.19-rc3
Mining date January 6, 2026
Fixes mined since 2005-04-16 (git epoch)
Total records 125,183
Unique fixing commits 119,449
Unique bug-introducing authors 9,159
With CVE ID 158
With Cc: stable 27,875 (22%)

Coverage note: The kernel has ~448,000 commits mentioning "fix" in some form, but only ~124,000 (28%) use proper Fixes: tags. My dataset captures the well-documented bugs: the ones where maintainers traced the root cause.

It varies by subsystem

Some subsystems have bugs that persist far longer than others:

Subsystem Bug Count Avg Lifetime
drivers/can 446 4.2 years
networking/sctp 279 4.0 years
networking/ipv4 1,661 3.6 years
usb 2,505 3.5 years
tty 1,033 3.5 years
netfilter 1,181 2.9 years
networking 6,079 2.9 years
memory 2,459 1.8 years
gpu 5,212 1.4 years
bpf 959 1.1 years

Bug lifetime by subsystem: CAN bus and SCTP bugs persist longest; BPF and GPU bugs get caught fastest.

CAN bus drivers and SCTP networking have the longest-lived bugs, probably because both are niche protocols with less testing coverage. GPU (especially Intel i915) and BPF bugs get caught fastest, probably thanks to dedicated fuzzing infrastructure.

Interesting finding from comparing 2025-only vs full history:

Subsystem 2025-only Avg Full History Avg Difference
networking 5.2 years 2.9 years -2.3 years
filesystem 3.8 years 2.6 years -1.2 years
drivers/net 3.3 years 2.2 years -1.1 years
gpu 1.4 years 1.4 years 0 years

Networking looked terrible in the 2025-only data (5.2 years!) but is actually closer to average in the full history (2.9 years). The 2025 fixes were catching a backlog of ancient networking bugs. GPU looks the same either way, and those bugs get caught consistently fast.

Some bug types hide longer than others

Race conditions are the hardest to find, averaging 5.1 years to discovery:

Bug Type Count Avg Lifetime Median
race-condition 1,188 5.1 years 2.6 years
integer-overflow 298 3.9 years 2.2 years
use-after-free 2,963 3.2 years 1.4 years
memory-leak 2,846 3.1 years 1.4 years
buffer-overflow 399 3.1 years 1.5 years
refcount 2,209 2.8 years 1.3 years
null-deref 4,931 2.2 years 0.7 years
deadlock 1,683 2.2 years 0.8 years

Why do race conditions hide so long? They're non-deterministic and only trigger under specific timing conditions that might occur once per million executions. Even sanitizers like KCSAN can only flag races they observe.

30% of bugs are self-fixes where the same person who introduced the bug eventually fixed it. I guess code ownership matters.

Why some bugs hide longer

Less fuzzing coverage. Syzkaller excels at syscall fuzzing but struggles with stateful protocols. Fuzzing netfilter effectively requires generating valid packet sequences that traverse specific connection tracking states.

Harder to trigger. Many networking bugs require:

  • Specific packet sequences
  • Race conditions between concurrent flows
  • Memory pressure during table operations
  • Particular NUMA topologies

Older code with fewer eyes. Core networking infrastructure like nf_conntrack was written in the mid-2000s. It works, so nobody rewrites it. But "stable" means fewer developers actively reviewing.

Case study: 19 years in the kernel

One of the oldest networking bugs in my dataset was introduced in August 2006 and fixed in August 2025:

// ctnetlink_dump_table() - the buggy code path
if (res < 0) {
    nf_conntrack_get(&ct->ct_general);  // increments refcount
    cb->args[1] = (unsigned long)ct;
    break;
}

The irony: Commit d205dc40798d was itself a fix: "[NETFILTER]: ctnetlink: fix deadlock in table dumping". Patrick McHardy was fixing a deadlock by removing a _put() call. In doing so, he introduced a refcount leak that would persist for 19 years.

The bug: the code doesn't check if ct == last. If the current entry is the same as the one we already saved, we've now incremented its refcount twice but will only decrement it once. The object never gets freed.

// What should have been checked:
if (res < 0) {
    if (ct != last)  // <-- this check was missing for 19 years
        nf_conntrack_get(&ct->ct_general);
    cb->args[1] = (unsigned long)ct;
    break;
}
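The accounting can be illustrated with a toy reference-count ledger that follows the article's description of the bug (a deliberate simplification of the real dump path, not the kernel code): each pass that fails on the already-saved entry takes a second reference but drops only one.

```python
def extra_refcount_after_dump(repeat_failures, fixed):
    """Ledger of extra references held on one conntrack entry across a
    dump that keeps failing on the entry already saved as the resume
    cursor. Toy model of the accounting described above."""
    refs = 1                    # cursor reference taken at the first save
    for _ in range(repeat_failures):
        if not fixed:
            refs += 1           # buggy: get() again on the saved entry
        # fixed: the cursor's existing reference is reused, no get()
        refs -= 1               # the single put() on the old cursor
        refs += 1               # entry saved as the cursor again
    refs -= 1                   # final put() when the dump completes
    return refs                 # 0 means the entry can be freed

extra_refcount_after_dump(3, fixed=True)    # 0: balanced, entry freed
extra_refcount_after_dump(3, fixed=False)   # 3: one leaked ref per failure
```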

The consequence: Memory leaks accumulate. Eventually nf_conntrack_cleanup_net_list() waits forever for the refcount to hit zero. The netns teardown hangs. If you're using containers, this blocks container cleanup indefinitely.

Why it took 19 years: You had to run conntrack_resize.sh in a loop for ~20 minutes under memory pressure. The fix commit says: "This can be reproduced by running conntrack_resize.sh selftest in a loop. It takes ~20 minutes for me on a preemptible kernel." Nobody ran that specific test sequence for two decades.

Incomplete fixes are common

Here's a pattern I keep seeing: someone notices undefined behavior, ships a fix, but the fix doesn't fully close the hole.

Case study: netfilter set field validation

Date Commit What happened
Jan 2020 f3a2181e16f1 Stefano Brivio adds support for sets with multiple ranged fields. Introduces NFTA_SET_DESC_CONCAT for specifying field lengths.
Jan 2024 3ce67e3793f4 Pablo Neira notices the code doesn't validate that field lengths sum to the key length. Ships a fix. Commit message: "I did not manage to crash nft_set_pipapo with mismatch fields and set key length so far, but this is UB which must be disallowed."
Jan 2025 1b9335a8000f Security researcher finds a bypass. The 2024 fix was incomplete—there were still code paths that could mismatch. Real fix shipped.

The 2024 fix was an acknowledgment that something was wrong, but Pablo couldn't find a crash, so the fix was conservative. A year later, someone found the crash.

This pattern suggests a detection opportunity: commits that say things like "this is undefined behavior" or "I couldn't trigger this but..." are flags. The author knows something is wrong but hasn't fully characterized the bug. These deserve extra scrutiny.

The anatomy of a long-lived bug

Looking at the bugs that survive 10+ years, I see common patterns:

1. Reference counting errors

kref_get(&obj->ref);
// ... error path returns without kref_put()

These don't crash immediately. They leak memory slowly. In a long-running system, you might not notice until months later when OOM killer starts firing.

2. Missing NULL checks after dereference

struct foo *f = get_foo();
f->bar = 1;              // dereference happens first
if (!f) return -EINVAL;  // check comes too late

The compiler might optimize away the NULL check since you already dereferenced. These survive because the pointer is rarely NULL in practice.

3. Integer overflow in size calculations

size_t total = n_elements * element_size;  // can overflow
buf = kmalloc(total, GFP_KERNEL);
memcpy(buf, src, n_elements * element_size);  // copies more than allocated

If n_elements comes from userspace, an attacker can cause allocation of a small buffer followed by a large copy.
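The wraparound itself is easy to demonstrate. This sketch computes the product with 32-bit wraparound, as it would behave with a 32-bit size type (an assumption for illustration; the width of the vulnerable type varies by platform and code):

```python
U32 = 2 ** 32

def alloc_size_u32(n_elements, element_size):
    """The C product `n_elements * element_size` with 32-bit wraparound."""
    return (n_elements * element_size) % U32

# Attacker-controlled count chosen so the product wraps:
n_elements = 0x40000001        # just past 2^30 elements
element_size = 4               # e.g. sizeof(u32)

allocated = alloc_size_u32(n_elements, element_size)  # wraps to 4 bytes
intended = n_elements * element_size                  # ~4 GiB copy
```

A 4-byte allocation followed by a multi-gigabyte copy is the classic heap-overflow setup.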

4. Race conditions in state machines

spin_lock(&lock);
if (state == READY) {
    spin_unlock(&lock);
    // window here where another thread can change state
    do_operation();  // assumes state is still READY
}

These require precise timing to hit. They might manifest as rare crashes that nobody can reproduce.
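The window can be made visible deterministically. In this sketch a callback stands in for a second thread being scheduled while the lock is dropped (names like `Device` and `buggy_operation` are hypothetical, chosen to mirror the C pattern above):

```python
import threading

READY, BUSY = "READY", "BUSY"

class Device:
    def __init__(self):
        self.lock = threading.Lock()
        self.state = READY

    def buggy_operation(self, interfere=None):
        """Check-then-act with the lock dropped in between. The
        `interfere` callback stands in for another thread running
        inside the unlocked window."""
        with self.lock:
            ready = (self.state == READY)
        if ready:
            if interfere is not None:
                interfere()                    # the race window
            # do_operation() would run here, assuming state is READY:
            return self.state == READY
        return None

dev = Device()
ok = dev.buggy_operation()                     # True: nobody interfered

def racer():
    with dev.lock:
        dev.state = BUSY

broken = dev.buggy_operation(interfere=racer)  # False: the check went stale
```

In real code the interleaving depends on scheduler timing, which is exactly why these bugs average 5.1 years to discovery.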

Can we catch these bugs automatically?

Every day a bug lives in the kernel is another day millions of devices are vulnerable. Android phones, servers, embedded systems, cloud infrastructure, all running kernel code with bugs that won't be found for years.

I built VulnBERT, a model that predicts whether a commit introduces a vulnerability.

Model evolution:

Model Recall FPR F1 Notes
Random Forest 76.8% 15.9% 0.80 Hand-crafted features only
CodeBERT (fine-tuned) 89.2% 48.1% 0.65 High recall, unusable FPR
VulnBERT 92.2% 1.2% 0.95 Best of both approaches

The problem with vanilla CodeBERT: I first tried fine-tuning CodeBERT directly. Results: 89% recall but 48% false positive rate (measured on the same test set). Unusable, flagging half of all commits.

Why so bad? CodeBERT learns shortcuts: "big diff = dangerous", "lots of pointers = risky". These correlations exist in training data but don't generalize. The model pattern-matches on surface features, not actual bug patterns.

The VulnBERT approach: Combine neural pattern recognition with human domain expertise.

┌─────────────────────────────────────────────────────────────────────┐
│                            INPUT: Git Diff                          │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                ┌───────────────┴───────────────┐
                ▼                               ▼
┌───────────────────────────┐   ┌───────────────────────────────────┐
│   Chunked Diff Encoder    │   │   Handcrafted Feature Extractor   │
│   (CodeBERT + Attention)  │   │   (51 engineered features)        │
└─────────────┬─────────────┘   └─────────────────┬─────────────────┘
              │ [768-dim]                         │ [51-dim]
              └───────────────┬───────────────────┘
                              ▼
              ┌───────────────────────────────┐
              │     Cross-Attention Fusion    │
              │     "When code looks like X,  │
              │      feature Y matters more"  │
              └───────────────┬───────────────┘
                              ▼
              ┌───────────────────────────────┐
              │        Risk Classifier        │
              └───────────────────────────────┘

Three innovations that drove performance:

1. Chunked encoding for long diffs. CodeBERT's 512-token limit truncates most kernel diffs (often 2000+ tokens). I split the diff into chunks, encode each, then use learned attention to aggregate:

# Learnable attention over chunks (PyTorch; chunk_embeddings has
# shape [batch, n_chunks, hidden_size])
import torch.nn as nn
import torch.nn.functional as F

chunk_attention = nn.Sequential(
    nn.Linear(hidden_size, hidden_size // 4),
    nn.Tanh(),
    nn.Linear(hidden_size // 4, 1)
)
attention_weights = F.softmax(chunk_attention(chunk_embeddings), dim=1)
pooled = (attention_weights * chunk_embeddings).sum(dim=1)

The model learns which chunks matter: the one with spin_lock but no spin_unlock, not the boilerplate.

2. Feature fusion via cross-attention. Neural networks miss domain-specific patterns. I extract 51 handcrafted features using regex and AST-like analysis of the diff:

Category Features
Basic (4) lines_added, lines_removed, files_changed, hunks_count
Memory (3) has_kmalloc, has_kfree, has_alloc_no_free
Refcount (5) has_get, has_put, get_count, put_count, unbalanced_refcount
Locking (5) has_lock, has_unlock, lock_count, unlock_count, unbalanced_lock
Pointers (4) has_deref, deref_count, has_null_check, has_deref_no_null_check
Error handling (6) has_goto, goto_count, has_error_return, has_error_label, error_return_count, has_early_return
Semantic (13) var_after_loop, iterator_modified_in_loop, list_iteration, list_del_in_loop, has_container_of, has_cast, cast_count, sizeof_type, sizeof_ptr, has_arithmetic, has_shift, has_copy, copy_count
Structural (11) if_count, else_count, switch_count, case_count, loop_count, ternary_count, cyclomatic_complexity, max_nesting_depth, function_call_count, unique_functions_called, function_definitions

The key bug-pattern features:

'unbalanced_refcount': 1,     # kref_get without kref_put → leak
'unbalanced_lock': 1,         # spin_lock without spin_unlock → deadlock
'has_deref_no_null_check': 0, # *ptr without if (!ptr) → null deref
'has_alloc_no_free': 0,       # kmalloc without kfree → memory leak
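As a rough sketch of how such a feature could be computed from a diff (the post doesn't show the real extractor, so the regexes here are assumptions):

```python
import re

def refcount_features(diff):
    """Count *_get()/*_put() style calls on added lines and flag an
    imbalance. The real extractor computes 51 features; these regexes
    are illustrative."""
    added = [line[1:] for line in diff.splitlines()
             if line.startswith('+') and not line.startswith('+++')]
    text = '\n'.join(added)
    get_count = len(re.findall(r'\b\w*_get\s*\(', text))
    put_count = len(re.findall(r'\b\w*_put\s*\(', text))
    return {
        'has_get': int(get_count > 0),
        'has_put': int(put_count > 0),
        'get_count': get_count,
        'put_count': put_count,
        'unbalanced_refcount': int(get_count != put_count),
    }

diff = """\
+    nf_conntrack_get(&ct->ct_general);
+    cb->args[1] = (unsigned long)ct;
"""
refcount_features(diff)['unbalanced_refcount']  # 1: a get() with no put()
```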

Cross-attention learns conditional relationships. When CodeBERT sees locking patterns AND unbalanced_lock=1, that's HIGH risk. Neither signal alone is sufficient; it's the combination that matters.

# Feature fusion via cross-attention
feature_embedding = feature_projection(handcrafted_features)  # 51 → 768
attended, _ = cross_attention(
    query=code_embedding,      # What patterns does the code have?
    key=feature_embedding,     # What do the hand-crafted features say?
    value=feature_embedding
)
fused = fusion_layer(torch.cat([code_embedding, attended], dim=-1))

3. Focal loss for hard examples. The training data is imbalanced: most commits are safe. Standard cross-entropy wastes gradient updates on easy examples. Focal loss:

Standard loss when p=0.95 (easy):  0.05
Focal loss when p=0.95:            0.000125  (400x smaller)

The model focuses on ambiguous commits: the hard 5% that matter.
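The quoted numbers follow from the standard focal loss formula (Lin et al.) with γ = 2; a quick sketch (exact values depend on rounding, but the 400x ratio is simply 1/(1-p)²):

```python
import math

def cross_entropy(p):
    """Standard cross-entropy loss for a correct prediction with confidence p."""
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    """Focal loss: down-weight well-classified examples by (1 - p)^gamma."""
    return (1 - p) ** gamma * cross_entropy(p)

p = 0.95
ce, fl = cross_entropy(p), focal_loss(p)
round(ce / fl)  # 400: the down-weighting factor is 1 / (1 - p)^2
```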

Impact of each component (estimated from ablation experiments):

Component F1 Score
CodeBERT baseline ~76%
+ Focal loss ~80%
+ Feature fusion ~88%
+ Contrastive learning ~91%
Full VulnBERT 95.4%

Note: Individual component impacts are approximate; interactions between components make precise attribution difficult.

The key insight: neither neural networks nor hand-crafted rules alone achieve the best results. The combination does.

Results on temporal validation (train ≤2023, test 2024):

Metric Target Result
Recall ≥90% 92.2%
FPR <10% 1.2%
Precision n/a 98.7%
F1 n/a 95.4%
AUC n/a 98.4%

What these metrics mean:

  • Recall (92.2%): Of all actual bug-introducing commits, we catch 92.2%. Missing 7.8% of bugs.
  • False Positive Rate (1.2%): Of all safe commits, we incorrectly flag 1.2%. Low FPR = fewer false alarms.
  • Precision (98.7%): Of commits we flag as risky, 98.7% actually are. When we raise an alarm, we're almost always right.
  • F1 (95.4%): Harmonic mean of precision and recall. Single number summarizing overall performance.
  • AUC (98.4%): Area under ROC curve. Measures ranking quality—how well the model separates bugs from safe commits across all thresholds.
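As a quick consistency check, the reported F1 is indeed the harmonic mean of the reported precision and recall, up to rounding of the published figures:

```python
# Consistency check on the published numbers: F1 is the harmonic
# mean of precision and recall.
precision, recall = 0.987, 0.922
f1 = 2 * precision * recall / (precision + recall)
round(f1, 3)  # 0.953, i.e. the reported 95.4% up to input rounding
```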

The model correctly differentiates the same bug at different stages:

Commit Description Risk
acf44a2361b8 Fix for UAF in xe_vfio 12.4% LOW ✓
1f5556ec8b9e Introduced the UAF 83.8% HIGH ✓

What the model sees: The 19-year bug

When analyzing the bug-introducing commit d205dc40798d:

-    if (ct == last) {
-        nf_conntrack_put(&last->ct_general);  // removed!
-    }
+    if (ct == last) {
+        last = NULL;
         continue;
     }
     if (ctnetlink_fill_info(...) < 0) {
         nf_conntrack_get(&ct->ct_general);  // still here

Extracted features:

Feature Value Signal
get_count 1 nf_conntrack_get() present
put_count 0 nf_conntrack_put() was removed
unbalanced_refcount 1 Mismatch detected
has_lock 1 Uses read_lock_bh()
list_iteration 1 Uses list_for_each_prev()

Model prediction: 72% risk (HIGH)

The unbalanced_refcount feature fires because _put() was removed but _get() remains. Classic refcount leak pattern.

Limitations

Dataset limitations:

  • Only captures bugs with Fixes: tags (~28% of fix commits). Selection bias: well-documented bugs tend to be more serious.
  • Mainline only, doesn't include stable-branch-only fixes or vendor patches
  • Subsystem classification is heuristic-based (regex on file paths)
  • Bug type detection relies on keyword matching in commit messages, so many bugs end up with type "unknown"
  • Lifetime calculation uses author dates, not commit dates; rebasing can skew timestamps
  • Some "bugs" may be theoretical (comments like "fix possible race" without confirmed trigger)

Model limitations:

  • 92.2% recall is on a held-out 2024 test set, not a guarantee for future bugs
  • Can't catch semantic bugs (logic errors with no syntactic signal)
  • Cross-function blind spots (bug spans multiple files)
  • Training data bias (learns patterns from bugs that were found, novel patterns may be missed)
  • False positives on intentional patterns (init/cleanup in different commits)
  • Tested only on Linux kernel code, may not generalize to other codebases

Statistical limitations:

  • Survivorship bias in year-over-year comparisons (recent bugs can't have long lifetimes yet)
  • Correlation ≠ causation for subsystem/bug-type lifetime differences

What this means: VulnBERT is a triage tool, not a guarantee. It catches 92% of bugs with recognizable patterns. The remaining 8% and novel bug classes still need human review and fuzzing.

What's next

92.2% recall with 1.2% FPR is production-ready. But there's more to do:

  • RL-based exploration: Instead of static pattern matching, train an agent to explore code paths and find bugs autonomously. The current model predicts risk; an RL agent could generate triggering inputs.
  • Syzkaller integration: Use fuzzer coverage as a reward signal. If the model flags a commit and Syzkaller finds a crash in that code path, that's strong positive signal.
  • Subsystem-specific models: Networking bugs have different patterns than driver bugs. A model fine-tuned on netfilter might outperform the general model on netfilter commits.

The goal isn't to replace human reviewers but to point them at the 10% of commits most likely to be problematic, so they can focus attention where it matters.

Reproducing this

The dataset extraction uses the kernel's Fixes: tag convention. Here's the core logic:

import re
from typing import Optional

def extract_fixes_tag(commit_msg: str) -> Optional[str]:
    """Extract the commit ID from a Fixes: tag"""
    pattern = r'Fixes:\s*([a-f0-9]{12,40})'
    match = re.search(pattern, commit_msg, re.IGNORECASE)
    return match.group(1) if match else None

# Mine all Fixes: tags from git history
git log --since="2005-04-16" --grep="Fixes:" --format="%H"

# For each fixing commit:
#   - Extract introducing commit hash
#   - Get dates from both commits
#   - Calculate lifetime
#   - Classify subsystem from file paths

Full miner code and dataset: github.com/quguanni/kernel-vuln-data

TL;DR

  • 125,183 bugs analyzed from 20 years of Linux kernel git history (123,696 with valid lifetimes)
  • Average bug lifetime: 2.1 years (2.8 years in 2025-only data due to survivorship bias in recent fixes)
  • 0% → 69% of bugs found within 1 year (2010 vs 2022) (real improvement from better tooling)
  • 13.5% of bugs hide for 5+ years (these are the dangerous ones)
  • Race conditions hide longest (5.1 years average)
  • VulnBERT catches 92.2% of bugs on held-out 2024 test set with only 1.2% FPR (98.4% AUC)
  • Dataset: github.com/quguanni/kernel-vuln-data

If you're working on kernel security, vulnerability detection, or ML for code analysis, I'd love to talk: jenny@pebblebed.com



Comments

  • By Fiveplus 2026-01-08 3:47

    Before the "rewrite it in Rust" comments take over the thread:

    It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker. Rust is fantastic for memory safety, but it will not stop you from misunderstanding the spec of a network card or writing a race condition in unsafe logic that interacts with DMA.

    That said, if we eliminated the 70% of bugs that are memory safety issues, the signal-to-noise ratio for finding these deep logic bugs would improve dramatically. We spend so much time tracing segfaults that we miss the subtle corruption bugs.

    • By aw1621107 2026-01-08 4:15

      > It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions)

      While the bugs you describe are indeed things that aren't directly addressed by Rust's borrow checker, I think the article covers more ground than your comment implies.

      For example, a significant portion (most?) of the article is simply analyzing the gathered data, like grouping bugs by subsystem:

          Subsystem        Bug Count  Avg Lifetime
          drivers/can      446        4.2 years
          networking/sctp  279        4.0 years
          networking/ipv4  1,661      3.6 years
          usb              2,505      3.5 years
          tty              1,033      3.5 years
          netfilter        1,181      2.9 years
          networking       6,079      2.9 years
          memory           2,459      1.8 years
          gpu              5,212      1.4 years
          bpf              959        1.1 years
      
      
      Or by type:

          Bug Type         Count  Avg Lifetime  Median
          race-condition   1,188  5.1 years     2.6 years
          integer-overflow 298    3.9 years     2.2 years
          use-after-free   2,963  3.2 years     1.4 years
          memory-leak      2,846  3.1 years     1.4 years
          buffer-overflow  399    3.1 years     1.5 years
          refcount         2,209  2.8 years     1.3 years
          null-deref       4,931  2.2 years     0.7 years
          deadlock         1,683  2.2 years     0.8 years
      
      And the section describing common patterns for long-lived bugs (10+ years) lists the following:

      > 1. Reference counting errors

      > 2. Missing NULL checks after dereference

      > 3. Integer overflow in size calculations

      > 4. Race conditions in state machines

      All of which cover more ground than listed in your comment.

      Furthermore, the 19-year-old bug case study is a refcounting error not related to highly concurrent state machines or hardware assumptions.

      • By johncolanduoni 2026-01-08 4:25

        It depends what they mean by some of these: are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind of ones that Rust will catch (missing atomics/synchronization) or the ones it won’t (bad atomic orderings, etc.).

        It’s also worth noting that Rust doesn’t prevent integer overflow, and it doesn’t panic on it by default in release builds. Instead, the safety model assumes you’ll catch the overflowed number when you use it to index something (a constant source of bugs in unsafe code).

        I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.

        • By aw1621107 2026-01-08 4:59

          > are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind of ones that Rust will catch (missing atomics/synchronization) or the ones it won’t (bad atomic orderings, etc.).

          The example given looks like a generalized example:

              spin_lock(&lock);
              if (state == READY) {
                  spin_unlock(&lock);
                  // window here where another thread can change state
                  do_operation();  // assumes state is still READY
              }
          
          So I don't think you can draw strong conclusions from it.

          > I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.

          Sure, all I'm trying to say is that "the class of bugs described here" covers more than what was listed in the parentheses.

          • By jiggawatts 2026-01-08 12:47

            The default Mutex struct in Rust makes it impossible to modify the data it protects without holding the lock.

            "Each mutex has a type parameter which represents the data that it is protecting. The data can only be accessed through the RAII guards returned from lock and try_lock, which guarantees that the data is only ever accessed when the mutex is locked."

            Even if used with more complex operations, the RAII approach means that the example you provided is much less likely to happen.

          • By rjzzleep 2026-01-08 6:26

            I'd argue that, while null refs and those classes of bugs may decrease, logic errors will increase. Rust is not an extraordinarily readable language in my opinion, especially in the kernel, where the kernel has its own data structures. IMHO Apple did it right in their kernel stack: they have a restricted subset of C++ that you can write drivers with.

            Which is also why, in my opinion, Zig is much more suitable: it actually addresses the readability aspect without bringing huge complexity with it.

            • By aw1621107 2026-01-08 7:31

              > I'd argue, that while null ref and those classes of bugs may decrease, logic errors will increase.

              To some extent that argument is true almost by definition: if you can find a way to greatly reduce the incidence of non-logic bugs while not addressing other bugs, then of course logic bugs will make up a greater proportion of what remains.

              I think it's also worth considering the fact that while Rust doesn't guarantee that it'll catch all logic bugs, it (like other languages with more "advanced" type systems) gives you tools to construct systems that can catch certain kinds of logic bugs. For example, you can write lock types in a way that guarantees at compile time that you'll take locks in the correct order, avoiding deadlocks [0]. Another example is the typestate pattern [1], which can encode state machine transitions in the type system to ensure that invalid transitions and/or operations on invalid states are caught at compile time.

              These, in turn, can lead to higher-order benefits as offloading some checks to the compiler means you can devote more attention to things the compiler can't check (though to be fair this does seem to be more variable among different programmers).

              > Rust is not an extraordinary readable language in my opinion, especially in the kernel where the kernel has its own data structures.

              The above notwithstanding, I'd imagine it's possible to think up scenarios where Rust would make some logic bugs more visible and others less so; only time will tell which prevails in the Linux kernel, though based on what we know now I don't think there's strong support for the notion that logic bugs in Rust are a substantially more common than they have been in C, let alone because of readability issues.

              Of course there's the fact that readability is very much a personal thing and is a multidimensional metric to boot (e.g., a property that makes code readable in one context may simultaneously make code less readable in another). I don't think there would be a universal answer here.

              [0]: https://lwn.net/Articles/995814/

              [1]: https://cliffle.com/blog/rust-typestate/

            • By viraptor 2026-01-08 10:13

              Maybe increase as a ratio, but not absolute. There are various benefits of Rust that affect other classes of issues: fancy enums, better errors, ability to control overflow behaviour and others. But for actual experience, check out what the kernel code developer has to say: https://xcancel.com/linaasahi/status/1577667445719912450

            • By oguz-ismail2 2026-01-086:421 reply

              > Zig is much more suitable, because it actually addresses the readability aspect

              How? It doesn't look very different from Rust. In terms of readability, Swift does stand out among LLVM frontends; I don't know if it is or can be used for systems programming though.

              • By Someone 2026-01-0810:421 reply

                Apple claims Swift can be used for systems programming, and is (partly) eating its own dogfood by using it in FoundationDB (https://news.ycombinator.com/item?id=38444876) and by providing examples of embedded projects (https://www.swift.org/get-started/embedded/)

                I think they are right in that claim, but in making it so, at least some of the code loses some of the readability of Swift. For truly low-level code, you’ll want to give up on classes, may not want to have copy-on-write collections, and may need to add quite a few annotations.

                • By galangalalgol 2026-01-0813:481 reply

                  Swift is very slow relative to Rust or C though. You can also cause seg faults in Swift with a few lines. I don't find any of these languages particularly difficult to read, so I'm not sure why this is listed as a discriminator between them.

                  • By saagarjha 2026-01-0814:481 reply

                    But those segfaults will either be memory safe or your lines will contain “unsafe” or “unchecked” somewhere.

                    • By galangalalgol 2026-01-090:391 reply

                      You can make a fully safe segfault the same way you can in Go: by swapping a base reference between two child types. The data pointer and vft pointer aren't updated atomically, so a thread safety issue becomes a memory safety one.

                      • By saagarjha 2026-01-0912:131 reply

                        This is no longer allowed with strict concurrency

                        • By galangalalgol 2026-01-0912:28

                          When did that happen? Or is it something I have to turn on? I had Claude write a Swift version of the Go version a few months ago and it segfaulted.

                          Edit: Ah, the global variable I used had a warning, which I didn't notice, that it isn't concurrency safe. So you can compile it, but if you treat warnings as errors you'd be fine.

            • By bcrosby95 2026-01-087:351 reply

              I would argue logic errors would decrease because you aren't spending as much time worrying about and fixing null ref and other errors.

              • By Tarucho 2026-01-0819:49

                can you prove that?

            • By staticassertion 2026-01-0812:40

              Rust is a lot more explicit. I suspect logic bugs will be much less common. It's far easier to model complexity in Rust.

            • By rowanG077 2026-01-087:23

              I would expect the opposite. C requires you to deal with extreme design complexity in large systems because the language offers nothing to help.

        • By materielle 2026-01-0815:001 reply

          I don’t think that the parent comment is saying all of the bugs would have been prevented by using Rust.

          But in the listed categories, I’m equally skeptical that none of them would have benefited from Rust even a bit.

          • By johncolanduoni 2026-01-0817:00

            That’s not my point - just that “state machine races” is too broad a category to say much about how Rust would or wouldn’t help.

        • By yencabulator 2026-01-0923:05

          > It’s also worth noting that Rust doesn’t prevent integer overflow

          Add a single line to a single file and you get that enforced.

          https://rust-lang.github.io/rust-clippy/stable/index.html#ar...

      • By RealityVoid 2026-01-0816:47

        Why doesn't it surprise me that the CAN bus driver bugs have the longest average lifetime?

      • By apaprocki 2026-01-087:20

        > Furthermore, the 19-year-old bug case study is a refcounting error

        It always surprised me how the top-of-the-line analyzers, whether commercial or OSS, never really implemented C-style reference count checking. Maybe someone out there has written something that works well, but I haven’t seen it.

    • By johncolanduoni 2026-01-084:18

      This is I think an under-appreciated aspect, both for detractors and boosters. I take a lot more “risks” with Rust, in terms of not thinking deeply about “normal” memory safety and prioritizing structuring my code to make the logic more obviously correct. In C++, modeling things so that the memory safety is super-straightforward is paramount - you’ll almost never see me store a std::string_view anywhere for example. In Rust I just put &str wherever I please, if I make a mistake I’ll know when I compile.

    • By anon-3988 2026-01-083:551 reply

      > It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker. Rust is fantastic for memory safety, but it will not stop you from misunderstanding the spec of a network card or writing a race condition in unsafe logic that interacts with DMA.

      Rust is not just about memory safety. It also has algebraic data types, RAII, among other things, which greatly help in catching these kinds of silly logic bugs.

    • By the8472 2026-01-084:49

      The concurrent state machine example looks like a locking error? If the assumption is that the data shouldn't change in the meantime, doesn't that mean the lock should continue to be held? In that case Rust locks can help, because they can embed the data, which means you can't even touch it when the lock isn't held.

    • By kubb 2026-01-089:092 reply

      It’s hilarious that you feel the need to preemptively take control of the narrative in anticipation of the Rust people that you fear so much.

      Is this an irrational fear, I wonder? Reminds me of methods used in the political discourse.

      • By Bridged7756 2026-01-0815:06

        People who make that kind of remarks should be called out and shunned. The Rust community is tired of discrimination and being the butt of jokes. All the other inferior languages prey on its minority status, despite Rust being able to solve all their problems. I take offense to these remarks, I don't want my kids to grow up as Rustaceans in such a caustic society.

      • By irishcoffee 2026-01-0814:441 reply

        > It’s hilarious that you feel the need to preemptively take control of the narrative in anticipation of the Rust people that you fear so much.

        > Is this an irrational fear, I wonder? Reminds me of methods used in the political discourse.

        In a sad sort of way, I think it's hilarious that HN users have been so completely conditioned to expect Rust evangelism any time a topic like this comes up that they wanted to get ahead of it.

        Not sure who it says more about, but it sure does say a whole lot.

        • By kubb 2026-01-0815:391 reply

          I don’t think evangelism is necessary anymore. Rust adoption is now a matter of time.

          • By Ferret7446 2026-01-1023:03

            Rust feels a lot like Ruby (fancy/weird with a fanatical user base). Fil-C is a far more practical route to memory safety (a la Python in this analogy).

    • By john01dav 2026-01-0818:261 reply

      Rust has more features than just the borrow checker. For example, it has a more featured type system than C or C++, which a good developer can use to detect some logic mistakes at compile time. This doesn't eliminate bugs, but it can catch some very early.

      • By wordisside 2026-01-0818:321 reply

        [dead]

        • By aw1621107 2026-01-0819:111 reply

          > But unsafe Rust, which is generally more often used in low-level code, is more difficult than C and C++.

          I think "is" is a bit too strong. "Can be", sure, but I'm rather skeptical that all uses of unsafe Rust will be more difficult than writing equivalent C/C++ code.

    • By pjc50 2026-01-0811:56

      > race condition in unsafe logic that interacts with DMA

      It's worth noting that if you write memory safe code but mis-program a DMA transfer, or trigger a bug in a PCIe device, it's possible for the hardware to give you memory-safety problems by splatting invalid data over a region that's supposed to contain something else.

    • By mgaunard 2026-01-0810:176 reply

      I don't think 70% of bugs are memory safety issues.

      In my experience it's closer to 5%.

      • By cogman10 2026-01-0813:252 reply

        I believe this is where that fact comes from [1]

        Basically, 70% of high severity bugs are memory safety issues.

        [1] https://www.chromium.org/Home/chromium-security/memory-safet...

        • By saagarjha 2026-01-0814:50

          High severity security issues.

        • By mgaunard 2026-01-0814:42

          Right, which is a measure which is heavily biased towards memory safety bugs.

      • By IshKebab 2026-01-0815:31

        70% of security vulnerabilities are due to memory safety. Not all bugs.

      • By stonemetal12 2026-01-0820:07

        Using the data provided, memory safety issues (use-after-free, memory-leak, buffer-overflow, null-deref) account for 67% of their bugs. If we include refcount It is just over 80%.

      • By tester756 2026-01-0813:28

        That's the figure that Microsoft and Google found in their code bases.

      • By redeeman 2026-01-0812:331 reply

        Probably quite a bit less than 5%; however, they tend to be quite serious when they happen.

        • By mgaunard 2026-01-0814:411 reply

          Only serious if you care about protecting from malicious actors running code on the same host.

          • By redeeman 2026-01-0820:581 reply

            You don't? I would imagine people who run, for example, a browser would have quite an interest in that.

            • By mgaunard 2026-01-0914:431 reply

              Browsers are sandboxed, and working on the web browsers themselves is a very small niche, as is working on kernels.

              Software increasingly runs either on dedicated infrastructure or virtual ones; in those cases there isn't really a case where you need to worry about software running on the same host trying to access the data.

              Sure, it's useful to have some restrictions in place to track what needs access to what resource, but in practice they can always be circumvented for debugging or convenience of development.

              • By yencabulator 2026-01-0923:101 reply

                Browsers are sandboxed by the kernel, and we're talking about bugs in the kernel here...

                • By mgaunard 2026-01-109:58

                  Even if modern browsers lean more on kernel features now, initially the sandboxing in browsers was implemented through a managed runtime.

      • By nibman 2026-01-0811:52

        [dead]

    • By BobbyTables2 2026-01-093:56

      I’ve seen too many embedded drivers written by well known companies not use spinlocks for data shared with an ISR.

      At one point, I found serious bugs (crashing our product) that had existed for over 15 years. (And that was 10 years ago).

      Rust may not be perfect but it gives me hope that some classes of stupidity will either be avoided or made visible (like every function being unsafe because the author was a complete idiot).

    • By eru 2026-01-1010:06

      > It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker.

      You are right about that, but even just using sum types eliminates a lot of logic errors, too.

    • By keybored 2026-01-088:541 reply

      No other top-level comments have since mentioned Rust[1] and TFA mentions neither Rust nor topics like memory safety. It’s just plain bugs.

      The Rust phantom zealotry is unfortunately real.

      [1] Aha, but the chilling effect of dismissing RIR comments before they are even posted...

      • By staticassertion 2026-01-0812:42

        Yes, I saw this last night and was confused because only one comment mentioned Rust, and it was deleted I think. I nearly replied "you're about to prompt 1,000 rust replies with this" and here's what I woke up to lol

    • By paulddraper 2026-01-085:051 reply

      Rust would prevent a number of bugs, as it can model state machine guarantees as well.

      Rewriting it all in Rust is extremely expensive, so it won't be done (soon).

      • By wiz21c 2026-01-087:591 reply

        Expensive because of: 1/ a rewrite is never easy, 2/ Rust is specifically tough for kernel/close-to-kernel code (because it catches errors and forces you to think about them for real, and because it makes some constructs (linked lists) really hard to implement)?

        • By IshKebab 2026-01-0815:341 reply

          Both I'd say. Rust imposes more constraints on the structure of code than most languages. The borrow checker really likes ownership trees whereas most languages allow any ownership graph no matter how spaghetti it is.

          As far as I know that's why Microsoft rewrote Typescript in Go instead of Rust.

          • By wiz21c 2026-01-098:481 reply

            I've been using rust for several years now and I like the way you explain the essence of the issue: tree instead of spaghetti :-)

            However: https://www.reddit.com/r/typescript/comments/wbkfsh/which_pr...

            so looks like it's not written in go :-)

            • By IshKebab 2026-01-0912:101 reply

              > so looks like it's not written in go :-)

              That post is three years old, before the rewrite.

              • By wiz21c 2026-01-139:26

                I missed that. For the curious:

                https://www.reddit.com/r/golang/comments/1j8shzb/microsoft_r...

                When asked why go and not rust, they said: "The existing (javascript) code base makes certain assumptions -- specifically, it assumes that there is automatic garbage collection -- and that pretty much limited our choices. That heavily ruled out Rust. I mean, in Rust you have memory management, but it's not automatic; you can get reference counting or whatever you could, but then, in addition to that, there's the borrow checker and the rather stringent constraints it puts on you around ownership of data structures. In particular, it effectively outlaws cyclic data structures, and all of our data structures are heavily cyclic. "

                sharp!

    • By lynx97 2026-01-0816:44

      Thanks for raising this. It feels like evangelists paint a picture of Rust basically being magic which squashes all bugs. My personal experience is rather different. When I gave Rust a whirl a few years ago, I happened to play with mio for some reason I can't remember yet. Had some basic PoC code which didn't work as expected. So while not being a Rust expert, I am still too much a fan of the scratch-your-own-itch philosophy, so I started to read the mio source code. And after 5 minutes, I found the logic bug. Submitted a PR and moved on. But what stayed with me was this insight: if someone like me can casually find and fix a Rust library bug, propaganda is probably doing more work than expected. The Rust craze feels a bit like Java. Just because a language baby-sits the developer doesn't automatically mean better quality. At the end of the day, the dev needs to juggle the development process. Sure, tools are useful, but overstating safety is likely a route better avoided.

    • By IshKebab 2026-01-088:11

      Rust has other features that help prevent logic errors. It's not just C plus a borrow checker.

    • By ramon156 2026-01-089:38

      You're fighting air

    • By marcosdumay 2026-01-0820:14

      Eh... Removing concurrency bugs is one of the main selling points of Rust. And algebraic types are a real boost for situations where you have lots of assumptions.

    • By nibman 2026-01-0811:27

      [dead]

    • By DobarDabar 2026-01-0815:11

      [dead]

  • By gjfr 2026-01-0813:371 reply

    Interesting! We did a similar analysis on Content Security Policy bugs in Chrome and Firefox some time ago, where the average bug-to-report time was around 3 years and 1 year, respectively. https://www.usenix.org/conference/usenixsecurity23/presentat...

    Our bug dataset was way smaller, though, as we unfortunately had to pinpoint all the bug introductions ourselves. It's nice to see the Linux project uses proper "Fixes: " tags.

    • By staticassertion 2026-01-0820:54

      > It's nice to see the Linux project uses proper "Fixes: " tags.

      Sort of. They often don't.

  • By giamma 2026-01-0812:032 reply

    Is the intention of the author to use the number of years bugs stay "hidden" as a metric of the quality of the kernel codebase or of the performance of the maintainers? I am asking because at some point the article says "We're getting faster".

    IMHO the fact that a bug hides for years can also be an indication that such a bug had low severity/low priority and therefore that the overall quality is very good. Unless the time represents how long it takes to reproduce and resolve a known bug, but in that case I would not say that the bug "hides" in the kernel.

    • By cogman10 2026-01-0813:341 reply

      > IMHO a fact that a bug hides for years can also be indication that such bug had low severity/low priority

      Not really true. A lot of very severe bugs have lurked for years and even decades. Heartbleed comes to mind.

      These bugs often lurk for so long because they very often don't cause a panic, which is what makes them really tricky to find.

      For example, use after free bugs are really dangerous. However, in most code, it's a pretty safe bet that nothing dangerous happens when use after free is triggered. Especially if the pointer is used shortly after the free and dies shortly after it. In many cases, the erroneous read or write doesn't break something.

      The same is true of the race condition problems (which are some of the longest lived bugs). In a lot of cases, you won't know you have a race condition because in many cases the contention on the lock is low so the race isn't exposed. And even when it is, it can be very tricky to reproduce as the race isn't likely to be done the same way twice.

      • By turtletontine 2026-01-0815:151 reply

        > …lurked for years and even decades. Heartbleed comes to mind.

        I don’t know much about Heartbleed, but Wikipedia says:

        > Heartbleed is a security bug… It was introduced into the software in 2012 and publicly disclosed in April 2014.

        Two years doesn’t sound like “years or even decades” to me? But again, I don’t know much about Heartbleed so I may be missing something. It does say it was also patched in 2014, not just discovered then.

        • By cogman10 2026-01-0816:151 reply

          This may just be me misremembering, but as I recall, the cause of Heartbleed was ultimately a very complex macro system which supported multiple very old architectures. The bug, IIRC, was in the interaction between that old macro system and the new code, which is what made it hard to recognize as a bug.

          Part of the resolution to the problem, I believe, was that they ended up removing a fair number of unsupported platforms. It also ended up spawning alternatives to OpenSSL, like BoringSSL, which tried to remove as much as possible to guard against this very bug.

    • By staticassertion 2026-01-0812:431 reply

      > IMHO the fact that a bug hides for years can also be an indication that such a bug had low severity/low priority and therefore that the overall quality is very good.

      It doesn't seem to indicate that. It indicates the bug just isn't in tested code or isn't reached often. It could still be a very severe bug.

      The issue with longer lived bugs is that someone could have been leveraging it for longer.

      • By galangalalgol 2026-01-0813:551 reply

        Worst case is that it doesn't even cause correctness issues in normal use, only when misused in a way that is unlikely to happen unintentionally.

        • By staticassertion 2026-01-0814:201 reply

          I guess because I work in security the "unintentionally" doesn't matter much to me.

          • By SAI_Peregrinus 2026-01-0815:091 reply

            But it matters for detection time, because there's a lot more "normal" use of any given piece of code than intentional attempts to break it. If a bug can't be triggered unintentionally it'll never get detected through normal use, which can lead to it staying hidden for longer.

            • By staticassertion 2026-01-0816:49

              That's not really contested? The statement was that longer detection time indicates lower severity.

HackerNews