
There are bugs in your kernel right now that won't be found for years. I know because I analyzed 125,183 of them: every bug with a traceable Fixes: tag in the Linux kernel's 20-year git history.
The average kernel bug lives 2.1 years before discovery. But some subsystems are far worse: CAN bus drivers average 4.2 years, SCTP networking 4.0 years. The longest-lived bug in my dataset, a buffer overflow in ethtool, sat in the kernel for 20.7 years. The one I'll dissect in detail is a refcount leak in netfilter that survived for 19 years.
I built a tool that catches 92% of historical bugs in a held-out test set at commit time. Here's what I learned.
Key findings at a glance:

| Value | Finding |
|---|---|
| 125,183 | Bug-fix pairs with traceable Fixes: tags |
| 123,696 | Valid records after filtering (0 < lifetime < 27 years) |
| 2.1 years | Average time a bug hides before discovery |
| 20.7 years | Longest-lived bug (ethtool buffer overflow) |
| 0% → 69% | Bugs found within 1 year (2010 vs 2022) |
| 92.2% | Recall of VulnBERT on held-out 2024 test set |
| 1.2% | False positive rate (vs 48% for vanilla CodeBERT) |
I started by mining the most recent 10,000 commits with Fixes: tags from the Linux kernel. After filtering out invalid references (commits that pointed to hashes outside the repo, malformed tags, or merge commits), I had 9,876 valid vulnerability records. For the lifetime analysis, I excluded 27 same-day fixes (bugs introduced and fixed within hours), leaving 9,849 bugs with meaningful lifetimes.
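The filtering step can be sketched in a few lines. This assumes records are (fixing_date, introducing_date) pairs; the function name and record shape are my own, not the article's code:

```python
from datetime import datetime

# Mirrors the filtering rules described in the text: drop same-day fixes
# and anything with an implausible lifetime (negative, or beyond the repo's age).
def filter_lifetimes(records, max_years=27):
    valid = []
    for fixing_date, introducing_date in records:
        lifetime_days = (fixing_date - introducing_date).days
        # 0 < lifetime < 27 years excludes same-day fixes and bad-data artifacts
        if 0 < lifetime_days < max_years * 365:
            valid.append(lifetime_days)
    return valid

records = [
    (datetime(2025, 8, 1), datetime(2006, 8, 14)),  # the 19-year netfilter leak
    (datetime(2025, 8, 1), datetime(2025, 8, 1)),   # same-day fix: excluded
    (datetime(2020, 1, 1), datetime(2021, 1, 1)),   # negative lifetime: bad data
]
lifetimes = filter_lifetimes(records)
print(len(lifetimes), round(lifetimes[0] / 365.25, 1))  # 1 surviving record, ~19 years
```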
The results were striking:
| Metric | Value |
|---|---|
| Bugs analyzed | 9,876 |
| Average lifetime | 2.8 years |
| Median lifetime | 1.0 year |
| Maximum | 20.7 years |
Almost 20% of bugs had been hiding for 5+ years. The networking subsystem looked particularly bad at 5.1 years average. I found a refcount leak in netfilter that had been in the kernel for 19 years.
Initial findings: Half of bugs found within a year, but 20% hide for 5+ years.
But something nagged at me: my dataset only contained fixes from 2025. Was I seeing the full picture, or just the tip of the iceberg?
I rewrote my miner to capture every Fixes: tag since Linux moved to git in 2005. Six hours later, I had 125,183 vulnerability records, roughly 12x the size of my initial dataset.
The numbers changed significantly:
| Metric | 2025 Only | Full History (2005-2025) |
|---|---|---|
| Bugs analyzed | 9,876 | 125,183 |
| Average lifetime | 2.8 years | 2.1 years |
| Median lifetime | 1.0 year | 0.7 years |
| 5+ year bugs | 19.4% | 13.5% |
| 10+ year bugs | 6.6% | 4.2% |
Full history: 57% of bugs found within a year. The long tail is smaller than it first appeared.
Why the difference? My initial 2025-only dataset was biased: fixes shipped in 2025 include a backlog of ancient bugs finally being discovered, and those ancient bugs skewed the average upward. When you include the full history, with all the bugs that were introduced AND fixed within the same year, the average drops from 2.8 to 2.1 years.
The most striking finding from the full dataset: bugs introduced in recent years appear to get fixed much faster.
| Year Introduced | Bugs | Avg Lifetime | % Found <1yr |
|---|---|---|---|
| 2010 | 1,033 | 9.9 years | 0% |
| 2014 | 3,991 | 3.9 years | 31% |
| 2018 | 11,334 | 1.7 years | 54% |
| 2022 | 11,090 | 0.8 years | 69% |
Bugs introduced in 2010 took nearly 10 years to find; bugs introduced in 2024 are found in about 5 months. At first glance that looks like a 20x improvement!
But here's the catch: this data is right-censored. Bugs introduced in 2022 can't have a 10-year lifetime yet since we're only in 2026. We might find more 2022 bugs in 2030 that bring the average up.
The fairer comparison is "% found within 1 year", and that IS improving: from 0% (2010) to 69% (2022). That's real progress, likely driven by better fuzzing and sanitizer coverage.
But there's a backlog. Looking at just the bugs fixed in 2024-2025, we're simultaneously catching new bugs faster AND slowly working through ~5,400 ancient bugs that have been hiding for over 5 years.
The kernel has a convention: when a commit fixes a bug, it includes a Fixes: tag pointing to the commit that introduced the bug.
```
commit de788b2e6227
Author: Florian Westphal <fw@strlen.de>
Date:   Fri Aug 1 17:25:08 2025 +0200

    netfilter: ctnetlink: fix refcount leak on table dump

    Fixes: d205dc40798d ("netfilter: ctnetlink: ...")
```
I wrote a miner that:

1. Runs `git log --grep="Fixes:"` to find all fixing commits
2. Extracts the introducing commit hash from each `Fixes:` tag
3. Computes the bug's lifetime from the two commit dates

```python
fixes_pattern = r'Fixes:\s*([0-9a-f]{12,40})'
match = re.search(fixes_pattern, commit_message)
if match:
    introducing_hash = match.group(1)
    lifetime_days = (fixing_date - introducing_date).days
```
Dataset details:
| Parameter | Value |
|---|---|
| Kernel version | v6.19-rc3 |
| Mining date | January 6, 2026 |
| Fixes mined since | 2005-04-16 (git epoch) |
| Total records | 125,183 |
| Unique fixing commits | 119,449 |
| Unique bug-introducing authors | 9,159 |
| With CVE ID | 158 |
| With Cc: stable | 27,875 (22%) |
Coverage note: The kernel has ~448,000 commits mentioning "fix" in some form, but only ~124,000 (28%) use proper Fixes: tags. My dataset captures the well-documented bugs: the ones where maintainers traced the root cause.
Some subsystems have bugs that persist far longer than others:
| Subsystem | Bug Count | Avg Lifetime |
|---|---|---|
| drivers/can | 446 | 4.2 years |
| networking/sctp | 279 | 4.0 years |
| networking/ipv4 | 1,661 | 3.6 years |
| usb | 2,505 | 3.5 years |
| tty | 1,033 | 3.5 years |
| netfilter | 1,181 | 2.9 years |
| networking | 6,079 | 2.9 years |
| memory | 2,459 | 1.8 years |
| gpu | 5,212 | 1.4 years |
| bpf | 959 | 1.1 years |
CAN bus and SCTP bugs persist longest. BPF and GPU bugs get caught fastest.
CAN bus drivers and SCTP networking have bugs that persist longest probably because both are niche protocols with less testing coverage. GPU (especially Intel i915) and BPF bugs get caught fastest, probably thanks to dedicated fuzzing infrastructure.
Interesting finding from comparing 2025-only vs full history:
| Subsystem | 2025-only Avg | Full History Avg | Difference |
|---|---|---|---|
| networking | 5.2 years | 2.9 years | -2.3 years |
| filesystem | 3.8 years | 2.6 years | -1.2 years |
| drivers/net | 3.3 years | 2.2 years | -1.1 years |
| gpu | 1.4 years | 1.4 years | 0 years |
Networking looked terrible in the 2025-only data (5.2 years!) but is actually closer to average in the full history (2.9 years). The 2025 fixes were catching a backlog of ancient networking bugs. GPU looks the same either way, and those bugs get caught consistently fast.
Race conditions are the hardest to find, averaging 5.1 years to discovery:
| Bug Type | Count | Avg Lifetime | Median |
|---|---|---|---|
| race-condition | 1,188 | 5.1 years | 2.6 years |
| integer-overflow | 298 | 3.9 years | 2.2 years |
| use-after-free | 2,963 | 3.2 years | 1.4 years |
| memory-leak | 2,846 | 3.1 years | 1.4 years |
| buffer-overflow | 399 | 3.1 years | 1.5 years |
| refcount | 2,209 | 2.8 years | 1.3 years |
| null-deref | 4,931 | 2.2 years | 0.7 years |
| deadlock | 1,683 | 2.2 years | 0.8 years |
Why do race conditions hide so long? They're non-deterministic and only trigger under specific timing conditions that might occur once per million executions. Even sanitizers like KCSAN can only flag races they observe.
30% of bugs are self-fixes where the same person who introduced the bug eventually fixed it. I guess code ownership matters.
Less fuzzing coverage. Syzkaller excels at syscall fuzzing but struggles with stateful protocols. Fuzzing netfilter effectively requires generating valid packet sequences that traverse specific connection tracking states.
Harder to trigger. Many networking bugs require specific protocol states, packet sequences, or timing windows that ordinary testing rarely reproduces.
Older code with fewer eyes. Core networking infrastructure like nf_conntrack was written in the mid-2000s. It works, so nobody rewrites it. But "stable" means fewer developers actively reviewing.
One of the oldest networking bugs in my dataset was introduced in August 2006 and fixed in August 2025:
```c
// ctnetlink_dump_table() - the buggy code path
if (res < 0) {
    nf_conntrack_get(&ct->ct_general); // increments refcount
    cb->args[1] = (unsigned long)ct;
    break;
}
```
The irony: Commit d205dc40798d was itself a fix: "[NETFILTER]: ctnetlink: fix deadlock in table dumping". Patrick McHardy was fixing a deadlock by removing a _put() call. In doing so, he introduced a refcount leak that would persist for 19 years.
The bug: the code doesn't check if ct == last. If the current entry is the same as the one we already saved, we've now incremented its refcount twice but will only decrement it once. The object never gets freed.
```c
// What should have been checked:
if (res < 0) {
    if (ct != last) // <-- this check was missing for 19 years
        nf_conntrack_get(&ct->ct_general);
    cb->args[1] = (unsigned long)ct;
    break;
}
```
The consequence: Memory leaks accumulate. Eventually nf_conntrack_cleanup_net_list() waits forever for the refcount to hit zero. The netns teardown hangs. If you're using containers, this blocks container cleanup indefinitely.
Why it took 19 years: You had to run conntrack_resize.sh in a loop for ~20 minutes under memory pressure. The fix commit says: "This can be reproduced by running conntrack_resize.sh selftest in a loop. It takes ~20 minutes for me on a preemptible kernel." Nobody ran that specific test sequence for two decades.
Here's a pattern I keep seeing: someone notices undefined behavior, ships a fix, but the fix doesn't fully close the hole.
Case study: netfilter set field validation
| Date | Commit | What happened |
|---|---|---|
| Jan 2020 | `f3a2181e16f1` | Stefano Brivio adds support for sets with multiple ranged fields. Introduces NFTA_SET_DESC_CONCAT for specifying field lengths. |
| Jan 2024 | `3ce67e3793f4` | Pablo Neira notices the code doesn't validate that field lengths sum to the key length. Ships a fix. Commit message: "I did not manage to crash nft_set_pipapo with mismatch fields and set key length so far, but this is UB which must be disallowed." |
| Jan 2025 | `1b9335a8000f` | Security researcher finds a bypass. The 2024 fix was incomplete: there were still code paths that could mismatch. Real fix shipped. |
The 2024 fix was an acknowledgment that something was wrong, but Pablo couldn't find a crash, so the fix was conservative. A year later, someone found the crash.
This pattern suggests a detection opportunity: commits that say things like "this is undefined behavior" or "I couldn't trigger this but..." are flags. The author knows something is wrong but hasn't fully characterized the bug. These deserve extra scrutiny.
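As a sketch of that detection opportunity, a few regexes over commit messages can flag this hedged language. The phrase list and function name are illustrative, drawn from the case study above, not the article's tooling:

```python
import re

# Phrases where an author admits suspected-but-unconfirmed bugs.
HEDGE_PATTERNS = [
    r'undefined behaviou?r',
    r'\bUB\b',
    r"(?:couldn't|could not|did not manage to)\s+(?:trigger|crash|reproduce)",
    r'not sure (?:if|whether)',
]

def is_hedged(commit_msg: str) -> bool:
    """Flag commit messages that hedge about an uncharacterized bug."""
    return any(re.search(p, commit_msg, re.IGNORECASE) for p in HEDGE_PATTERNS)

msg = ("I did not manage to crash nft_set_pipapo with mismatch fields "
       "and set key length so far, but this is UB which must be disallowed.")
print(is_hedged(msg))  # True
```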
Looking at the bugs that survive 10+ years, I see common patterns:
1. Reference counting errors
```c
kref_get(&obj->ref);
// ... error path returns without kref_put()
```
These don't crash immediately. They leak memory slowly. In a long-running system, you might not notice until months later when OOM killer starts firing.
2. Missing NULL checks after dereference
```c
struct foo *f = get_foo();
f->bar = 1;             // dereference happens first
if (!f) return -EINVAL; // check comes too late
```
The compiler might optimize away the NULL check since you already dereferenced. These survive because the pointer is rarely NULL in practice.
3. Integer overflow in size calculations
```c
size_t total = n_elements * element_size;    // can overflow
buf = kmalloc(total, GFP_KERNEL);
memcpy(buf, src, n_elements * element_size); // copies more than allocated
```
If n_elements comes from userspace, an attacker can cause allocation of a small buffer followed by a large copy.
4. Race conditions in state machines
```c
spin_lock(&lock);
if (state == READY) {
    spin_unlock(&lock);
    // window here where another thread can change state
    do_operation(); // assumes state is still READY
}
```
These require precise timing to hit. They might manifest as rare crashes that nobody can reproduce.
Every day a bug lives in the kernel is another day millions of devices are vulnerable. Android phones, servers, embedded systems, cloud infrastructure, all running kernel code with bugs that won't be found for years.
I built VulnBERT, a model that predicts whether a commit introduces a vulnerability.
Model evolution:
| Model | Recall | FPR | F1 | Notes |
|---|---|---|---|---|
| Random Forest | 76.8% | 15.9% | 0.80 | Hand-crafted features only |
| CodeBERT (fine-tuned) | 89.2% | 48.1% | 0.65 | High recall, unusable FPR |
| VulnBERT | 92.2% | 1.2% | 0.95 | Best of both approaches |
The problem with vanilla CodeBERT: I first tried fine-tuning CodeBERT directly. Results: 89% recall but a 48% false positive rate (measured on the same test set). Unusable: it flagged nearly half of all commits.
Why so bad? CodeBERT learns shortcuts: "big diff = dangerous", "lots of pointers = risky". These correlations exist in training data but don't generalize. The model pattern-matches on surface features, not actual bug patterns.
The VulnBERT approach: Combine neural pattern recognition with human domain expertise.
```
┌─────────────────────────────────────────────────────────┐
│                     INPUT: Git Diff                     │
└────────────────────────────┬────────────────────────────┘
                             │
             ┌───────────────┴───────────────┐
             ▼                               ▼
┌───────────────────────────┐   ┌───────────────────────────────┐
│   Chunked Diff Encoder    │   │ Handcrafted Feature Extractor │
│   (CodeBERT + Attention)  │   │   (51 engineered features)    │
└─────────────┬─────────────┘   └───────────────┬───────────────┘
              │ [768-dim]                       │ [51-dim]
              └───────────────┬─────────────────┘
                              ▼
              ┌───────────────────────────────┐
              │    Cross-Attention Fusion     │
              │   "When code looks like X,    │
              │    feature Y matters more"    │
              └───────────────┬───────────────┘
                              ▼
              ┌───────────────────────────────┐
              │        Risk Classifier        │
              └───────────────────────────────┘
```
Three innovations that drove performance:
1. Chunked encoding for long diffs. CodeBERT's 512-token limit truncates most kernel diffs (often 2000+ tokens). I split into chunks, encode each, then use learned attention to aggregate:
```python
# Learnable attention over chunks
chunk_attention = nn.Sequential(
    nn.Linear(hidden_size, hidden_size // 4),
    nn.Tanh(),
    nn.Linear(hidden_size // 4, 1),
)
attention_weights = F.softmax(chunk_attention(chunk_embeddings), dim=1)
pooled = (attention_weights * chunk_embeddings).sum(dim=1)
```
The model learns which chunks matter: the one with `spin_lock` but no `spin_unlock`, not the boilerplate.
2. Feature fusion via cross-attention. Neural networks miss domain-specific patterns. I extract 51 handcrafted features using regex and AST-like analysis of the diff:
| Category | Features |
|---|---|
| Basic (4) | lines_added, lines_removed, files_changed, hunks_count |
| Memory (3) | has_kmalloc, has_kfree, has_alloc_no_free |
| Refcount (5) | has_get, has_put, get_count, put_count, unbalanced_refcount |
| Locking (5) | has_lock, has_unlock, lock_count, unlock_count, unbalanced_lock |
| Pointers (4) | has_deref, deref_count, has_null_check, has_deref_no_null_check |
| Error handling (6) | has_goto, goto_count, has_error_return, has_error_label, error_return_count, has_early_return |
| Semantic (13) | var_after_loop, iterator_modified_in_loop, list_iteration, list_del_in_loop, has_container_of, has_cast, cast_count, sizeof_type, sizeof_ptr, has_arithmetic, has_shift, has_copy, copy_count |
| Structural (11) | if_count, else_count, switch_count, case_count, loop_count, ternary_count, cyclomatic_complexity, max_nesting_depth, function_call_count, unique_functions_called, function_definitions |
The key bug-pattern features:
```python
'unbalanced_refcount': 1,      # kref_get without kref_put → leak
'unbalanced_lock': 1,          # spin_lock without spin_unlock → deadlock
'has_deref_no_null_check': 0,  # *ptr without if (!ptr) → null deref
'has_alloc_no_free': 0,        # kmalloc without kfree → memory leak
```
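A hedged sketch of how the refcount and locking balance features could be computed from a raw diff with regexes. The function and patterns here are illustrative, not the article's actual extractor:

```python
import re

def extract_balance_features(diff: str) -> dict:
    """Count get/put and lock/unlock calls in the added lines of a diff."""
    added = [l[1:] for l in diff.splitlines()
             if l.startswith('+') and not l.startswith('+++')]
    text = '\n'.join(added)
    get_count = len(re.findall(r'\b\w*_get\s*\(', text))
    put_count = len(re.findall(r'\b\w*_put\s*\(', text))
    lock_count = len(re.findall(r'\bspin_lock\w*\s*\(', text))
    unlock_count = len(re.findall(r'\bspin_unlock\w*\s*\(', text))
    return {
        'get_count': get_count,
        'put_count': put_count,
        'unbalanced_refcount': int(get_count != put_count),
        'unbalanced_lock': int(lock_count != unlock_count),
    }

# A get with no matching put should fire unbalanced_refcount.
diff = """\
+ nf_conntrack_get(&ct->ct_general);
+ cb->args[1] = (unsigned long)ct;
"""
print(extract_balance_features(diff))
```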
Cross-attention learns conditional relationships. When CodeBERT sees locking patterns AND unbalanced_lock=1, that's HIGH risk. Neither signal alone is sufficient, it's the combination.
```python
# Feature fusion via cross-attention
feature_embedding = feature_projection(handcrafted_features)  # 51 → 768
attended, _ = cross_attention(
    query=code_embedding,    # What patterns does the code have?
    key=feature_embedding,   # What do the hand-crafted features say?
    value=feature_embedding,
)
fused = fusion_layer(torch.cat([code_embedding, attended], dim=-1))
```
3. Focal loss for hard examples. The training data is imbalanced: most commits are safe. Standard cross-entropy wastes gradient updates on easy examples. Focal loss:
```
Standard CE loss at p = 0.95 (easy example): 0.05
Focal loss at p = 0.95:                      0.000125 (400x smaller)
```
The model focuses on ambiguous commits: the hard 5% that matter.
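The arithmetic checks out in a few lines. This is the standard focal-loss formula (with γ = 2, the common default), not the article's training code:

```python
import math

def cross_entropy(p: float) -> float:
    """Standard cross-entropy loss for a correct prediction with confidence p."""
    return -math.log(p)

def focal_loss(p: float, gamma: float = 2.0) -> float:
    """Focal loss down-weights easy examples by the factor (1 - p)^gamma."""
    return (1 - p) ** gamma * cross_entropy(p)

p = 0.95  # an "easy", confidently-correct example
ce, fl = cross_entropy(p), focal_loss(p)
print(f"CE: {ce:.4f}  focal: {fl:.6f}  ratio: {ce / fl:.0f}x")
# CE: 0.0513  focal: 0.000128  ratio: 400x
```

With γ = 2, the down-weighting factor at p = 0.95 is exactly (1 − 0.95)² = 1/400, which is where the "400x smaller" figure comes from.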
Impact of each component (estimated from ablation experiments):
| Component | F1 Score |
|---|---|
| CodeBERT baseline | ~76% |
| + Focal loss | ~80% |
| + Feature fusion | ~88% |
| + Contrastive learning | ~91% |
| Full VulnBERT | 95.4% |
Note: Individual component impacts are approximate; interactions between components make precise attribution difficult.
The key insight: neither neural networks nor hand-crafted rules alone achieve the best results. The combination does.
Results on temporal validation (train ≤2023, test 2024):
| Metric | Target | Result |
|---|---|---|
| Recall | 90% | 92.2% ✓ |
| FPR | <10% | 1.2% ✓ |
| Precision | — | 98.7% |
| F1 | — | 95.4% |
| AUC | — | 98.4% |
As a sanity check, the model correctly differentiates the same bug at different stages:
| Commit | Description | Risk |
|---|---|---|
| `acf44a2361b8` | Fix for UAF in xe_vfio | 12.4% LOW ✓ |
| `1f5556ec8b9e` | Introduced the UAF | 83.8% HIGH ✓ |
When analyzing the bug-introducing commit d205dc40798d:
```diff
- if (ct == last) {
-     nf_conntrack_put(&last->ct_general); // removed!
- }
+ if (ct == last) {
+     last = NULL;
      continue;
  }
  if (ctnetlink_fill_info(...) < 0) {
      nf_conntrack_get(&ct->ct_general); // still here
```
Extracted features:
| Feature | Value | Signal |
|---|---|---|
| `get_count` | 1 | nf_conntrack_get() present |
| `put_count` | 0 | nf_conntrack_put() was removed |
| `unbalanced_refcount` | 1 | Mismatch detected |
| `has_lock` | 1 | Uses read_lock_bh() |
| `list_iteration` | 1 | Uses list_for_each_prev() |
Model prediction: 72% risk (HIGH).
The unbalanced_refcount feature fires because _put() was removed but _get() remains. Classic refcount leak pattern.
Dataset limitations: only bugs with Fixes: tags are captured (~28% of fix commits), with selection bias toward well-documented, more serious bugs.

Model limitations:
Statistical limitations:
What this means: VulnBERT is a triage tool, not a guarantee. It catches 92% of bugs with recognizable patterns. The remaining 8% and novel bug classes still need human review and fuzzing.
92.2% recall with 1.2% FPR is production-ready, but there's more to do.
The goal isn't to replace human reviewers but to point them at the 10% of commits most likely to be problematic, so they can focus attention where it matters.
The dataset extraction uses the kernel's Fixes: tag convention. Here's the core logic:
```python
import re
from typing import Optional

def extract_fixes_tag(commit_msg: str) -> Optional[str]:
    """Extract the commit ID from a Fixes: tag."""
    pattern = r'Fixes:\s*([a-f0-9]{12,40})'
    match = re.search(pattern, commit_msg, re.IGNORECASE)
    return match.group(1) if match else None
```
```shell
# Mine all Fixes: tags from git history
git log --since="2005-04-16" --grep="Fixes:" --format="%H"

# For each fixing commit:
#  - Extract introducing commit hash
#  - Get dates from both commits
#  - Calculate lifetime
#  - Classify subsystem from file paths
```
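The last step, subsystem classification, can be sketched as a longest-prefix match over a commit's touched file paths. The mapping below is my own guess at the article's buckets, not the author's actual code:

```python
# Most specific prefixes first: 'net/sctp/' must be checked before 'net/'.
SUBSYSTEM_PREFIXES = [
    ('drivers/net/can/', 'drivers/can'),
    ('net/sctp/', 'networking/sctp'),
    ('net/netfilter/', 'netfilter'),
    ('net/', 'networking'),
    ('drivers/gpu/', 'gpu'),
    ('kernel/bpf/', 'bpf'),
]

def classify_subsystem(paths):
    """Label a commit by the first matching path prefix of any touched file."""
    for path in paths:
        for prefix, label in SUBSYSTEM_PREFIXES:
            if path.startswith(prefix):
                return label
    return 'other'

print(classify_subsystem(['net/netfilter/nf_conntrack_netlink.c']))  # netfilter
print(classify_subsystem(['net/sctp/socket.c']))  # networking/sctp
```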
Full miner code and dataset: github.com/quguanni/kernel-vuln-data
If you're working on kernel security, vulnerability detection, or ML for code analysis, I'd love to talk: jenny@pebblebed.com
Before the "rewrite it in Rust" comments take over the thread:
It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker. Rust is fantastic for memory safety, but it will not stop you from misunderstanding the spec of a network card or writing a race condition in unsafe logic that interacts with DMA.
That said, if we eliminated the 70% of bugs that are memory safety issues, the signal-to-noise ratio for finding these deep logic bugs would improve dramatically. We spend so much time tracing segfaults that we miss the subtle corruption bugs.
> It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions)
While the bugs you describe are indeed things that aren't directly addressed by Rust's borrow checker, I think the article covers more ground than your comment implies.
For example, a significant portion (most?) of the article is simply analyzing the gathered data, like grouping bugs by subsystem:
```
Subsystem        Bug Count   Avg Lifetime
drivers/can            446      4.2 years
networking/sctp        279      4.0 years
networking/ipv4      1,661      3.6 years
usb                  2,505      3.5 years
tty                  1,033      3.5 years
netfilter            1,181      2.9 years
networking           6,079      2.9 years
memory               2,459      1.8 years
gpu                  5,212      1.4 years
bpf                    959      1.1 years
```
Or by type:

```
Bug Type           Count   Avg Lifetime   Median
race-condition     1,188      5.1 years   2.6 years
integer-overflow     298      3.9 years   2.2 years
use-after-free     2,963      3.2 years   1.4 years
memory-leak        2,846      3.1 years   1.4 years
buffer-overflow      399      3.1 years   1.5 years
refcount           2,209      2.8 years   1.3 years
null-deref         4,931      2.2 years   0.7 years
deadlock           1,683      2.2 years   0.8 years
```
And the section describing common patterns for long-lived bugs (10+ years) lists the following:

> 1. Reference counting errors
> 2. Missing NULL checks after dereference
> 3. Integer overflow in size calculations
> 4. Race conditions in state machines
All of which cover more ground than listed in your comment.
Furthermore, the 19-year-old bug case study is a refcounting error not related to highly concurrent state machines or hardware assumptions.
It depends what they mean by some of these: are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind of ones that Rust will catch (missing atomics/synchronization) or the ones it won’t (bad atomic orderings, etc.).
It’s also worth noting that Rust doesn’t prevent integer overflow, and it doesn’t panic on it by default in release builds. Instead, the safety model assumes you’ll catch the overflowed number when you use it to index something (a constant source of bugs in unsafe code).
I’m bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.
> are the state machine race conditions logic races (which Rust won’t trivially solve) or data races? If they are data races, are they the kind of ones that Rust will catch (missing atomics/synchronization) or the ones it won’t (bad atomic orderings, etc.).
The example given looks like a generalized example:
```c
spin_lock(&lock);
if (state == READY) {
    spin_unlock(&lock);
    // window here where another thread can change state
    do_operation(); // assumes state is still READY
}
```
So I don't think you can draw strong conclusions from it.

> I'm bullish about Rust in the kernel, but it will not solve all of the kinds of race conditions you see in that kind of context.
Sure, all I'm trying to say is that "the class of bugs described here" covers more than what was listed in the parentheses.
The default Mutex struct in Rust makes it impossible to modify the data it protects without holding the lock.
"Each mutex has a type parameter which represents the data that it is protecting. The data can only be accessed through the RAII guards returned from lock and try_lock, which guarantees that the data is only ever accessed when the mutex is locked."
Even if used with more complex operations, the RAII approach means that the example you provided is much less likely to happen.
I'd argue that while null ref and those classes of bugs may decrease, logic errors will increase. Rust is not an extraordinarily readable language in my opinion, especially in the kernel, which has its own data structures. IMHO Apple did it right in their kernel stack: they have a restricted subset of C++ that you can write drivers with.

Which is also why in my opinion Zig is much more suitable, because it actually addresses the readability aspect without bringing huge complexity with it.
> I'd argue that while null ref and those classes of bugs may decrease, logic errors will increase.
To some extent that argument only makes sense; if you can find a way to greatly reduce the incidence of non-logic bugs while not addressing other bugs then of course logic bugs would make up a greater proportion of what remains.
I think it's also worth considering the fact that while Rust doesn't guarantee that it'll catch all logic bugs, it (like other languages with more "advanced" type systems) gives you tools to construct systems that can catch certain kinds of logic bugs. For example, you can write lock types in a way that guarantees at compile time that you'll take locks in the correct order, avoiding deadlocks [0]. Another example is the typestate pattern [1], which can encode state machine transitions in the type system to ensure that invalid transitions and/or operations on invalid states are caught at compile time.
These, in turn, can lead to higher-order benefits as offloading some checks to the compiler means you can devote more attention to things the compiler can't check (though to be fair this does seem to be more variable among different programmers).
> Rust is not an extraordinary readable language in my opinion, especially in the kernel where the kernel has its own data structures.
The above notwithstanding, I'd imagine it's possible to think up scenarios where Rust would make some logic bugs more visible and others less so; only time will tell which prevails in the Linux kernel, though based on what we know now I don't think there's strong support for the notion that logic bugs in Rust are substantially more common than they have been in C, let alone because of readability issues.
Of course there's the fact that readability is very much a personal thing and is a multidimensional metric to boot (e.g., a property that makes code readable in one context may simultaneously make code less readable in another). I don't think there would be a universal answer here.
Maybe increase as a ratio, but not absolute. There are various benefits of Rust that affect other classes of issues: fancy enums, better errors, ability to control overflow behaviour and others. But for actual experience, check out what the kernel code developer has to say: https://xcancel.com/linaasahi/status/1577667445719912450
> Zig is much more suitable, because it actually addresses the readability aspect
How? It doesn't look very different from Rust. In terms of readability Swift does stand out among LLVM frontends, don't know if it is or can be used for systems programming though.
Apple claims Swift can be used for systems programming, and is (partly) eating its own dogfood by using it in FoundationDB (https://news.ycombinator.com/item?id=38444876) and by providing examples of embedded projects (https://www.swift.org/get-started/embedded/)
I think they are right in that claim, but in making it so, at least some of the code loses some of the readability of Swift. For truly low-level code, you'll want to give up on classes, may not want to have copy-on-write collections, and may need to add quite a few annotations.
Swift is very slow relative to Rust or C, though. You can also cause segfaults in Swift with a few lines. I don't find any of these languages particularly difficult to read, so I'm not sure why this is listed as a discriminator between them.
But those segfaults will either be memory safe or your lines will contain "unsafe" or "unchecked" somewhere.
You can make a fully safe segfault the same way you can in go. Swapping a base reference between two child types. The data pointer and vft pointer aren't updated atomically, so a thread safety issue becomes a memory safety one.
When did that happen? Or is it something I have to turn on? I had Claude write a swift version of the go version a few months ago and it segfaulted.
Edit: Ah, the global variable I used had a warning that it isn't concurrency safe I didn't notice. So you can compile it, but if you treat warnings as errors you'd be fine.
I would argue logic errors would decrease because you aren't spending as much time worrying about and fixing null ref and other errors.
can you prove that?
Rust is a lot more explicit. I suspect logic bugs will be much less common. It's far easier to model complexity in Rust.
I would expect the opposite. C requires you to deal with extreme design complexity in large systems because the language offers nothing to help.
I don’t think that the parent comment is saying all of the bugs would have been prevented by using Rust.
But in the listed categories, I’m equally skeptical that none of them would have benefited from Rust even a bit.
That’s not my point - just that “state machine races” is a too-broad category to say much about how Rust would or wouldn’t help.
> It’s also worth noting that Rust doesn’t prevent integer overflow
Add a single line to a single file and you get that enforced.
https://rust-lang.github.io/rust-clippy/stable/index.html#ar...
Why doesn't it surprise me that the CAN bus driver bugs have the longest average lifetime?
> Furthermore, the 19-year-old bug case study is a refcounting error
It always surprised me how the top-of-the-line analyzers, whether commercial or OSS, never really implemented C-style reference count checking. Maybe someone out there has written something that works well, but I haven't seen it.
This is I think an under-appreciated aspect, both for detractors and boosters. I take a lot more “risks” with Rust, in terms of not thinking deeply about “normal” memory safety and prioritizing structuring my code to make the logic more obviously correct. In C++, modeling things so that the memory safety is super-straightforward is paramount - you’ll almost never see me store a std::string_view anywhere for example. In Rust I just put &str wherever I please, if I make a mistake I’ll know when I compile.
> It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker. Rust is fantastic for memory safety, but it will not stop you from misunderstanding the spec of a network card or writing a race condition in unsafe logic that interacts with DMA.
Rust is not just about memory safety. It also has algebraic data types, RAII, among other things, which will greatly help in catching these kinds of silly logic bugs.
Yeah, Rust gives you much better tools to write highly concurrent state machines than C does, and most of those tools are in the type system and not the borrow checker per se. This is exactly what the Typestate pattern (https://docs.rust-embedded.org/book/static-guarantees/typest...) is good at modeling.
The concurrent state machine example looks like a locking error? If the assumption is that it shouldn't change in the meantime, doesn't it mean the lock should continue to be held? In that case rust locks can help, because they can embed the data, which means you can't even touch it if it's not held.
It’s hilarious that you feel the need to preemptively take control of the narrative in anticipation of the Rust people that you fear so much.
Is this an irrational fear, I wonder? Reminds me of methods used in the political discourse.
People who make that kind of remarks should be called out and shunned. The Rust community is tired of discrimination and being the butt of jokes. All the other inferior languages prey on its minority status, despite Rust being able to solve all their problems. I take offense to these remarks, I don't want my kids to grow up as Rustaceans in such a caustic society.
> It’s hilarious that you feel the need to preemptively take control of the narrative in anticipation of the Rust people that you fear so much.
> Is this an irrational fear, I wonder? Reminds me of methods used in the political discourse.
In a sad sort of way, I think it's hilarious that HN users have been so completely conditioned to expect Rust evangelism any time a topic like this comes up that they wanted to get ahead of it.
Not sure who it says more about, but it sure does say a whole lot.
I don’t think evangelism is necessary anymore. Rust adoption is now a matter of time.
Rust feels a lot like Ruby (fancy/weird with a fanatical user base). Fil-C is a far more practical route to memory safety (a la Python in this analogy).
Rust has more features than just the borrow checker. For example, it has a richer type system than C or C++, which a good developer can use to detect some logic mistakes at compile time. This doesn't eliminate bugs, but it can catch some very early.
[dead]
> But unsafe Rust, which is generally more often used in low-level code, is more difficult than C and C++.
I think "is" is a bit too strong. "Can be", sure, but I'm rather skeptical that all uses of unsafe Rust will be more difficult than writing equivalent C/C++ code.
[flagged]
> race condition in unsafe logic that interacts with DMA
It's worth noting that if you write memory safe code but mis-program a DMA transfer, or trigger a bug in a PCIe device, it's possible for the hardware to give you memory-safety problems by splatting invalid data over a region that's supposed to contain something else.
I don't think 70% of bugs are memory safety issues.
In my experience it's closer to 5%.
I believe this is where that fact comes from [1]
Basically, 70% of high severity bugs are memory safety.
[1] https://www.chromium.org/Home/chromium-security/memory-safet...
70% of security vulnerabilities are due to memory safety. Not all bugs.
Using the data provided, memory safety issues (use-after-free, memory leak, buffer overflow, null dereference) account for 67% of their bugs. If we include refcount bugs, it is just over 80%.
That's the figure that Microsoft and Google found in their code bases.
Probably quite a bit less than 5%; however, they tend to be quite serious when they happen.
Only serious if you care about protecting from malicious actors running code on the same host.
You don't? I would imagine people who run, for example, a browser would have quite an interest in that.
Browsers are sandboxed, and working on the web browsers themselves is a very small niche, as is working on kernels.
Software increasingly runs either on dedicated infrastructure or virtual ones; in those cases there isn't really a case where you need to worry about software running on the same host trying to access the data.
Sure, it's useful to have some restrictions in place to track what needs access to what resource, but in practice they can always be circumvented for debugging or convenience of development.
Browsers are sandboxed by the kernel, and we're talking about bugs in the kernel here...
Even if modern browsers lean more on kernel features, sandboxing in browsers was initially implemented through a managed runtime.
[dead]
I've seen too many embedded drivers written by well-known companies fail to use spinlocks for data shared with an ISR.
At one point, I found serious bugs (crashing our product) that had existed for over 15 years. (And that was 10 years ago).
Rust may not be perfect, but it gives me hope that some classes of stupidity will either be avoided or made visible (like every function being unsafe because the author was a complete idiot).
> It is worth noting that the class of bugs described here (logic errors in highly concurrent state machines, incorrect hardware assumptions) wouldn't necessarily be caught by the borrow checker.
You are right about that, but even just using sum types eliminates a lot of logic errors, too.
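For instance, modeling mutually exclusive states as one enum means the compiler rejects a `match` that forgets a case. A minimal sketch (the `LinkState` enum and its variants are made up for illustration):

```rust
// Sum-type sketch: the states can't overlap, can't be "none of the
// above", and every consumer must handle all of them.
enum LinkState {
    Down,
    Negotiating { attempts: u8 },
    Up { speed_mbps: u32 },
}

fn describe(s: &LinkState) -> String {
    // Omitting any arm here is a compile error, so adding a new state
    // later forces every call site to handle it.
    match s {
        LinkState::Down => "down".to_string(),
        LinkState::Negotiating { attempts } => format!("negotiating ({attempts})"),
        LinkState::Up { speed_mbps } => format!("up at {speed_mbps} Mb/s"),
    }
}

fn main() {
    assert_eq!(describe(&LinkState::Up { speed_mbps: 1000 }), "up at 1000 Mb/s");
    println!("{}", describe(&LinkState::Down));
}
```

Compare this with the C idiom of an `enum` tag plus a union or a bag of nullable fields, where nothing stops you from reading the wrong field for the current state.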
No other top-level comments have since mentioned Rust[1] and TFA mentions neither Rust nor topics like memory safety. It’s just plain bugs.
The Rust phantom zealotry is unfortunately real.
[1] Aha, but the chilling effect of dismissing RIR comments before they are even posted...
Yes, I saw this last night and was confused because only one comment mentioned Rust, and it was deleted I think. I nearly replied "you're about to prompt 1,000 rust replies with this" and here's what I woke up to lol
Rust would prevent a number of bugs, as it can model state machine guarantees as well.
Rewriting it all in Rust is extremely expensive, so it won't be done (soon).
Expensive because of: 1/ a rewrite is never easy, or 2/ Rust is specifically tough for kernel/close-to-kernel code (because it catches errors and forces you to think about them for real, and because it makes some constructs (linked lists) really hard to implement)?
Both I'd say. Rust imposes more constraints on the structure of code than most languages. The borrow checker really likes ownership trees whereas most languages allow any ownership graph no matter how spaghetti it is.
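Here's roughly what that constraint looks like in code: a child node that points back at its parent can't use a plain reference or a strong `Rc` (that would either fail borrow checking or leak via a reference cycle), so the back-edge has to be a `Weak` pointer. A minimal sketch with made-up names:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// An ownership tree with explicit, non-owning back-edges. In most GC
// languages you'd just store a plain parent pointer.
struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let root = Rc::new(Node {
        name: "root".into(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });
    let leaf = Rc::new(Node {
        name: "leaf".into(),
        parent: RefCell::new(Rc::downgrade(&root)),
        children: RefCell::new(vec![]),
    });
    root.children.borrow_mut().push(Rc::clone(&leaf));

    // The back-edge works, but only through upgrade(), and only while
    // the parent is still alive.
    let parent_name = leaf.parent.borrow().upgrade().map(|p| p.name.clone());
    assert_eq!(parent_name.as_deref(), Some("root"));
    println!("{parent_name:?}");
}
```

Every cyclic edge has to be declared as weak up front, which is exactly the "structure your ownership as a tree" pressure being described — and why heavily cyclic codebases resist a direct port.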
As far as I know that's why Microsoft rewrote Typescript in Go instead of Rust.
I've been using Rust for several years now and I like the way you explain the essence of the issue: tree instead of spaghetti :-)
However: https://www.reddit.com/r/typescript/comments/wbkfsh/which_pr...
so looks like it's not written in go :-)
> so looks like it's not written in go :-)
That post is three years old, before the rewrite.
I missed that. For the curious:
https://www.reddit.com/r/golang/comments/1j8shzb/microsoft_r...
When asked why go and not rust, they said: "The existing (javascript) code base makes certain assumptions -- specifically, it assumes that there is automatic garbage collection -- and that pretty much limited our choices. That heavily ruled out Rust. I mean, in Rust you have memory management, but it's not automatic; you can get reference counting or whatever you could, but then, in addition to that, there's the borrow checker and the rather stringent constraints it puts on you around ownership of data structures. In particular, it effectively outlaws cyclic data structures, and all of our data structures are heavily cyclic. "
sharp!
Thanks for raising this. It feels like evangelists paint a picture of Rust basically being magic which squashes all bugs. My personal experience is rather different. When I gave Rust a whirl a few years ago, I happened to play with mio for some reason I can't remember. Had some basic PoC code which didn't work as expected. So while not being a Rust expert, I am still too much of a fan of the scratch-your-own-itch philosophy, so I started to read the mio source code. And after 5 minutes, I found the logic bug. Submitted a PR and moved on. But what stayed with me was this insight: if someone like me can casually find and fix a Rust library bug, propaganda is probably doing more work than expected. The Rust craze feels a bit like Java. Just because a language baby-sits the developer doesn't automatically mean better quality. At the end of the day, the dev needs to juggle the development process. Sure, tools are useful, but overstating safety is likely a route better avoided.
Rust has other features that help prevent logic errors. It's not just C plus a borrow checker.
You're fighting air
Eh... Removing concurrency bugs is one of the main selling points of Rust. And algebraic types are a real boost for situations where you have lots of assumptions.
[dead]
[dead]
Interesting! We did a similar analysis on Content Security Policy bugs in Chrome and Firefox some time ago, where the average bug-to-report time was around 3 years and 1 year, respectively. https://www.usenix.org/conference/usenixsecurity23/presentat...
Our bug dataset was way smaller, though, as we unfortunately had to pinpoint all the bug introductions ourselves. It's nice to see the Linux project uses proper "Fixes: " tags.
> It's nice to see the Linux project uses proper "Fixes: " tags.
Sort of. They often don't.
Is the intention of the author to use the number of years bugs stay "hidden" as a metric of the quality of the kernel codebase, or of the performance of the maintainers? I am asking because at some point the article says "We're getting faster".
IMHO the fact that a bug hides for years can also be an indication that the bug had low severity/low priority, and therefore that the overall quality is very good. Unless the time represents how long it takes to reproduce and resolve a known bug, but in that case I would not say that the "bug hides" in the kernel.
> IMHO a fact that a bug hides for years can also be indication that such bug had low severity/low priority
Not really true. A lot of very severe bugs have lurked for years and even decades. Heartbleed comes to mind.
The reason these bugs often lurk for so long is that they very often don't cause a panic, which is why they can be really tricky to find.
For example, use after free bugs are really dangerous. However, in most code, it's a pretty safe bet that nothing dangerous happens when use after free is triggered. Especially if the pointer is used shortly after the free and dies shortly after it. In many cases, the erroneous read or write doesn't break something.
The same is true of the race condition problems (which are some of the longest lived bugs). In a lot of cases, you won't know you have a race condition because in many cases the contention on the lock is low so the race isn't exposed. And even when it is, it can be very tricky to reproduce as the race isn't likely to be done the same way twice.
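A toy illustration of how such a race stays benign-looking: the lost-update pattern below compiles fine in safe Rust (each load and store is individually atomic, but the read-modify-write as a whole is not), and under low contention it almost always produces the "right" answer. The constants and names are made up for the sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// The load and the store are each atomic, but another thread can slip
// in between them, so increments can be silently lost. Under low
// contention this almost never happens, which is exactly why such
// bugs can lurk for years.
static COUNTER: AtomicUsize = AtomicUsize::new(0);

fn main() {
    const THREADS: usize = 4;
    const ITERS: usize = 10_000;
    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..ITERS {
                    let v = COUNTER.load(Ordering::Relaxed);
                    COUNTER.store(v + 1, Ordering::Relaxed); // lost update possible here
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = COUNTER.load(Ordering::Relaxed);
    // Usually equals THREADS * ITERS, occasionally less; never more.
    assert!(total <= THREADS * ITERS);
    println!("{total}");
}
```

The fix is a single `fetch_add(1, Ordering::Relaxed)`, but nothing forces you to write it that way; this is a logic race, not a memory-safety violation, so it is invisible to the borrow checker.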
> …lurked for years and even decades. Heartbleed comes to mind.
I don’t know much about Heartbleed, but Wikipedia says:
> Heartbleed is a security bug… It was introduced into the software in 2012 and publicly disclosed in April 2014.
Two years doesn’t sound like “years or even decades” to me? But again, I don’t know much about Heartbleed so I may be missing something. It does say it was also patched in 2014, not just discovered then.
This may just be me misremembering, but as I recall, the cause of Heartbleed was ultimately a very complex macro system which supported multiple very old architectures. The bug, IIRC, was the interaction between that old macro system and the new code, which is what made it hard to recognize as a bug.
Part of the resolution to the problem, I believe, was that they ended up removing a fair number of unsupported platforms. It also ended up spawning alternatives to OpenSSL, like BoringSSL, which tried to remove as much as possible to guard against this very bug.
Maybe you are thinking of ShellShock
https://en.wikipedia.org/wiki/Shellshock_(software_bug)
The bug was introduced into the code in 1989, and only found and exploited in 2014.
> IMHO a fact that a bug hides for years can also be indication that such bug had low severity/low priority and therefore that the overall quality is very good.
It doesn't seem to indicate that. It indicates the bug just isn't in tested code or isn't reached often. It could still be a very severe bug.
The issue with longer lived bugs is that someone could have been leveraging it for longer.
Worst case is that it doesn't even cause correctness issues in normal use, only when misused in a way that is unlikely to happen unintentionally.
I guess because I work in security the "unintentionally" doesn't matter much to me.
But it matters for detection time, because there's a lot more "normal" use of any given piece of code than intentional attempts to break it. If a bug can't be triggered unintentionally it'll never get detected through normal use, which can lead to it staying hidden for longer.
That's not really contested? The statement was that longer detection time indicates lower severity.