Giving C a superpower: custom header file (safe_c.h)

2025-11-17 10:40 · hwisnu.bearblog.dev


The story of how I wrote a leak-free, thread-safe grep in C23 without shooting myself in the foot, and how you can too!

Introduction

Let's be honest: most people have a love-hate relationship with C. We love its raw speed, its direct connection to the metal, and its elegant simplicity. But we hate its footguns, its dragons, its untamed beasts. The segfaults that appear from nowhere, the memory leaks that slowly drain the life from our applications, and the endless goto cleanup; chains that make our code look like a plate of spaghetti.

This is the classic C curse: power without guardrails... at least, that's the fear-mongering mantra repeated again and again. But is it still relevant in today's world, with all the tools available to C devs, like static analyzers and dynamic sanitizers? I've written about this here and here.

What if, with the help of modern tools and a custom header file (~600 LOC), you could tame those beasts? What if you could keep C's power but wrap it in a suit of modern armor? That's what the custom header file safe_c.h is for. It's designed to give C some safety and convenience features from C++ and Rust, and I'm using it to build a high-performance grep clone called cgrep as my test case.

By the end of this article, I hope you'll come away with the sense that C is super flexible and extensible ~ a "do whatever you want with it" kind of language. This is why C (and its close cousin, Zig) remains my favorite language to write programs in; it's the language of freedom!

safe_c.h

safe_c.h is a custom C header file that takes features mainly from C++ and Rust and implements them in plain C ~ [write C code, get C++ and Rust features!]

It starts by bridging the gap between old and new C. C23 gave us standard [[...]] attribute syntax (through which GCC and Clang expose their cleanup attribute as [[gnu::cleanup]]), but in the real world, you need code that also compiles on GCC 11 or Clang 18 in older language modes. safe_c.h detects your compiler and gives you the same RAII semantics everywhere. No more #ifdef soup.

// The magic behind CLEANUP: zero overhead, maximum safety
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define CLEANUP(func) [[gnu::cleanup(func)]]
#else
#define CLEANUP(func) __attribute__((cleanup(func)))
#endif

// Branch prediction that actually matters in hot paths
#ifdef __GNUC__
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

Your cleanup code runs even if you return early, goto out, or panic. It's finally, but for C.
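
For instance, here's what that looks like for a FILE* (a minimal sketch; the file_cleanup helper and count_lines function are illustrative, not excerpts from safe_c.h):

#include <stdio.h>

// Illustrative helper: fclose() a FILE* when it goes out of scope
static void file_cleanup(FILE** fp) {
    if (fp && *fp) fclose(*fp);
}

int count_lines(const char* path) {
    FILE* f CLEANUP(file_cleanup) = fopen(path, "r");
    if (!f) return -1; // cleanup still runs, but skips the NULL handle

    int lines = 0, c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n') lines++;
    return lines; // fclose(f) runs automatically on every exit path
}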

The Memory Management Beast: Slain with Smart Pointers (C++ feature)

The oldest, fiercest, and most feared beast of them all: manual memory management.

Before: the highway to leaks.
Forgetting a single free() is a disaster. In cgrep, parsing command-line options the old way is a breeding ground for the whole CVE bestiary. You have to remember to free the memory on every single exit path ~ difficult for the undisciplined.

// The Old Way (don't do this)
char* include_pattern = NULL;
if (optarg) {
    include_pattern = strdup(optarg);
}
// ...200 lines later...
if (some_error) {
    if (include_pattern) free(include_pattern); // Did I free it? Did I??
    return 1;
}
// And remember to free it at *every* return path...

After: memory that automatically cleans itself up.
UniquePtr is a "smart pointer" that owns a resource. When the UniquePtr variable goes out of scope, its resource is automatically freed. It's impossible to forget.

Here's the machinery inside safe_c.h:

// The UniquePtr machinery: a struct + automatic cleanup
typedef struct {
    void* ptr;
    void (*deleter)(void*);
} UniquePtr;

#define AUTO_UNIQUE_PTR(name, ptr, deleter) \
    UniquePtr name CLEANUP(unique_ptr_cleanup) = UNIQUE_PTR_INIT(ptr, deleter)

static inline void unique_ptr_cleanup(UniquePtr* uptr) {
    if (uptr && uptr->ptr && uptr->deleter) {
        uptr->deleter(uptr->ptr);
        uptr->ptr = NULL;
    }
}
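
The macro also leans on UNIQUE_PTR_INIT and a manual-release helper that the excerpt elides; plausible definitions (a sketch, not the verbatim safe_c.h source) look like this:

// Sketch: brace initializer matching the UniquePtr struct above
#define UNIQUE_PTR_INIT(p, d) {.ptr = (p), .deleter = (d)}

// Release the held resource early, before scope exit (same body as the
// cleanup handler, callable by hand)
static inline void unique_ptr_delete(UniquePtr* uptr) {
    if (uptr && uptr->ptr && uptr->deleter) {
        uptr->deleter(uptr->ptr);
        uptr->ptr = NULL; // safe to reassign, or to let cleanup run later
    }
}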

And here's how cgrep uses it. The cleanup is automatic, even if errors happen:

// In cgrep, we use this for command-line arguments
AUTO_UNIQUE_PTR(include_pattern_ptr, NULL, options_string_deleter);

// When we get a new pattern, the old one is automatically freed!
unique_ptr_delete(&include_pattern_ptr);
include_pattern_ptr.ptr = strdup(optarg);
// No leaks, even if an error happens later!

Sharing Safely with SharedPtr

Before: manual, bug-prone reference counting.
You'd have to implement reference counting by hand, creating a complex and fragile system where a single mistake leads to a leak or a use-after-free bug.

// The old way of manual reference counting
typedef struct {
    MatchStore* store;
    int ref_count;
    pthread_mutex_t mutex;
} SharedStore;

void release_store(SharedStore* s) {
    pthread_mutex_lock(&s->mutex);
    s->ref_count--;
    bool is_last = (s->ref_count == 0);
    pthread_mutex_unlock(&s->mutex);

    if (is_last) {
        match_store_deleter(s->store);
        free(s);
    }
}

After: automated reference counting.
SharedPtr automates this entire process. The last thread to finish using the object automatically triggers its destruction. The machinery:

// The SharedPtr machinery: reference counting without the boilerplate
typedef struct {
    void* ptr;
    void (*deleter)(void*);
    size_t* ref_count;
} SharedPtr;

#define AUTO_SHARED_PTR(name) \
    SharedPtr name CLEANUP(shared_ptr_cleanup) = {.ptr = NULL, .deleter = NULL, .ref_count = NULL}

static inline void shared_ptr_cleanup(SharedPtr* sptr) {
    shared_ptr_delete(sptr); // Decrement and free if last reference
}
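
The init/copy/delete trio isn't shown in the excerpt. One plausible implementation keeps the count in a heap-allocated counter updated with C11 atomics (a sketch under that assumption ~ it may not match what safe_c.h actually ships):

#include <stdatomic.h>
#include <stdlib.h>

// Sketch only: assumes the struct's size_t* counter can be treated as an
// atomic_size_t (same size/alignment on mainstream ABIs); allocation error
// handling elided
static inline void shared_ptr_init(SharedPtr* sptr, void* ptr,
                                   void (*deleter)(void*)) {
    sptr->ptr = ptr;
    sptr->deleter = deleter;
    sptr->ref_count = malloc(sizeof(atomic_size_t));
    atomic_store((atomic_size_t*)sptr->ref_count, 1);
}

static inline SharedPtr shared_ptr_copy(SharedPtr* sptr) {
    atomic_fetch_add((atomic_size_t*)sptr->ref_count, 1); // one more owner
    return *sptr;
}

static inline void shared_ptr_delete(SharedPtr* sptr) {
    if (!sptr || !sptr->ref_count) return;
    if (atomic_fetch_sub((atomic_size_t*)sptr->ref_count, 1) == 1) {
        if (sptr->deleter) sptr->deleter(sptr->ptr); // last owner frees
        free(sptr->ref_count);
    }
    sptr->ptr = NULL;
    sptr->ref_count = NULL;
}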

The usage is clean and safe. No more manual counting.

// In our thread worker context, multiple threads access the same results store
typedef struct {
    // ...
    SharedPtr store;       // No more worrying about who frees this!
    SharedPtr file_counts;
    // ...
} FileWorkerContext;

// In main(), we create it once and share it safely
// SharedPtr: reference-counted stores for thread-safe sharing
SharedPtr store_shared = {0};
shared_ptr_init(&store_shared, store_ptr.ptr, match_store_deleter);
// Pass to threads: ctx->store = shared_ptr_copy(&store_shared);
// ref-count increments automatically; last thread out frees it.

The Buffer Overflow Beast: Contained with Vectors and Views (C++ feature)

Growing arrays dynamically in C is a horror show.

Before: the realloc dance routine.
You have to manually track capacity and size, and every realloc risks fragmenting memory or failing, requiring careful error handling for every single element you add.

// The old way: manual realloc is inefficient and complex
MatchEntry** matches = NULL;
size_t matches_count = 0;
size_t matches_capacity = 0;

for (/*...each match...*/) {
    if (matches_count >= matches_capacity) {
        matches_capacity = (matches_capacity == 0) ? 8 : matches_capacity * 2;
        MatchEntry** new_matches = realloc(matches, matches_capacity * sizeof(MatchEntry*));
        if (!new_matches) {
            free(matches); // Don't leak!
            /* handle error */
        }
        matches = new_matches;
    }
    matches[matches_count++] = current_match;
}

After: a type-safe, auto-growing vector.
safe_c.h generates an entire type-safe vector for you. It handles allocation, growth, and cleanup automatically. The magic that generates the vector:

// The magic that generates a complete vector type from a single line
#define DEFINE_VECTOR_TYPE(name, type) \
    typedef struct { \
        Vector base; \
        type* data; \
    } name##Vector; \
    \
    static inline bool name##_vector_push_back(name##Vector* vec, type value) { \
        bool result = vector_push_back(&vec->base, &value); \
        vec->data = (type*)vec->base.data; /* Sync pointer after potential realloc */ \
        return result; \
    } \
    \
    static inline bool name##_vector_reserve(name##Vector* vec, size_t new_capacity) { \
        bool result = vector_reserve(&vec->base, new_capacity); \
        vec->data = (type*)vec->base.data; /* Sync pointer after potential realloc */ \
        return result; \
    }

    /* more helper functions not outlined here */

// And the underlying generic Vector implementation
typedef struct {
    size_t size;
    size_t capacity;
    void* data;
    size_t element_size;
} Vector;
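
The generic vector_push_back that the typed wrappers delegate to could look roughly like this (a sketch; the real helper set also covers at, pop_back, free, and friends):

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Sketch of the generic push_back behind the typed wrappers
static inline bool vector_push_back(Vector* vec, const void* element) {
    if (vec->size == vec->capacity) {
        size_t new_cap = vec->capacity ? vec->capacity * 2 : 8;
        void* new_data = realloc(vec->data, new_cap * vec->element_size);
        if (!new_data) return false; // old buffer stays valid: no leak
        vec->data = new_data;
        vec->capacity = new_cap;
    }
    memcpy((char*)vec->data + vec->size * vec->element_size,
           element, vec->element_size);
    vec->size++;
    return true;
}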

Using it in cgrep is simple and safe. The vector cleans itself up when it goes out of scope.

// Type-safe vector for collecting matches
DEFINE_VECTOR_TYPE(MatchEntryPtr, MatchEntry*)

AUTO_TYPED_VECTOR(MatchEntryPtr, all_matches_vec);
MatchEntryPtr_vector_reserve(&all_matches_vec, store->total_matches);

// Pushing elements is safe and simple
for (MatchEntry* entry = store->buckets[i]; entry; entry = entry->next) {
    MatchEntryPtr_vector_push_back(&all_matches_vec, entry);
}

Views: Look, Don't Touch (or malloc) - C++ feature

Before: needless allocations.
To handle a substring or a slice of an array, you'd often malloc a new buffer and copy the data into it, which is incredibly slow in a tight loop.

// The old way: allocating a new string just to get a substring
const char* line = "this is a long line of text";
char* pattern = "long line";
// To pass just the pattern to a function, you might do this:
char* sub = malloc(strlen(pattern) + 1);
strncpy(sub, pattern, strlen(pattern) + 1);
// ... use sub ...
free(sub); // And hope you remember this free call

After: zero-cost, non-owning views.
A StringView or a Span is just a pointer and a length. It's a non-owning reference that lets you work with slices of data without any allocation. The definitions are pure and simple:

// The StringView and Span definitions: pure, simple, zero-cost
typedef struct {
    const char* data;
    size_t size;
} StringView;

typedef struct {
    void* data;
    size_t size;
    size_t element_size;
} Span;
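
The constructors are one-liners. Here's a sketch of string_view_init (used below) plus an illustrative slicing helper, just to show there's no allocation anywhere:

#include <string.h>

// Constructing a view is just pointer + length bookkeeping
static inline StringView string_view_init(const char* s) {
    return (StringView){.data = s, .size = s ? strlen(s) : 0};
}

// Illustrative helper: slice out a sub-view ~ no malloc, no copy
static inline StringView string_view_substr(StringView sv, size_t pos, size_t len) {
    if (pos > sv.size) pos = sv.size;
    if (len > sv.size - pos) len = sv.size - pos;
    return (StringView){.data = sv.data + pos, .size = len};
}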

In cgrep, the search pattern becomes a StringView, avoiding allocation entirely.

// Our options struct holds a StringView, not a char*
typedef struct {
    StringView pattern; // Clean, simple, and safe
    // ...
} GrepOptions;

// Initializing it is a piece of cake
options.pattern = string_view_init(argv[optind]);

For safe array access, Span provides a bounds-checked window into existing data.

// safe_c.h
#define DEFINE_SPAN_TYPE(name, type) \
    typedef struct { \
        type* data; \
        size_t size; \
    } name##Span; \
    \
    static inline name##Span name##_span_init(type* data, size_t size) { \
        return (name##Span){.data = data, .size = size}; \
    }

    /* other helper functions not outlined here */

// Span: Type-safe array slices for chunk processing
DEFINE_SPAN_TYPE(LineBuffer, char)

LineBufferSpan input_span = LineBuffer_span_init((char*)start, len);

for (size_t i = 0; i < LineBuffer_span_size(&input_span); i++) {
    char* line = LineBuffer_span_at(&input_span, i); // asserts i < span.size
}

The Error-Handling goto Beast: Replaced with Results (Rust feature) and RAII (C++ feature)

C's error handling is notoriously messy.

Before: goto cleanup spaghetti carbonara.
Functions return special values like -1 or NULL, and you have to check errno. This leads to deeply nested if statements and a single goto cleanup; label that has to handle every possible failure case.

// The old way: goto cleanup
int do_something(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1; // Error
    }

    void* mem = malloc(1024);
    if (!mem) {
        close(fd); // Manual cleanup
        return -1;
    }

    // ... do more work ...

    free(mem);
    close(fd);
    return 0; // Success
}

After: explicit, type-safe result.
Inspired by Rust, Result forces you to handle errors explicitly by returning a type that is either a success value or an error value. The Result machinery:

// The Result type machinery: tagged unions for success/failure
typedef enum { RESULT_OK, RESULT_ERROR } ResultStatus;

#define DEFINE_RESULT_TYPE(name, value_type, error_type) \
    typedef struct { \
        ResultStatus status; \
        union { \
            value_type value; \
            error_type error; \
        }; \
    } Result##name;
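
The companion macros used below aren't shown in the excerpt; plausible definitions (a sketch) are compound-literal one-liners:

// Sketch: the function-like RESULT_OK(name, val) coexists safely with the
// RESULT_OK enum constant, because a macro name isn't re-expanded inside
// its own replacement list
#define RESULT_OK(name, val)    ((Result##name){.status = RESULT_OK, .value = (val)})
#define RESULT_ERROR(name, err) ((Result##name){.status = RESULT_ERROR, .error = (err)})
#define RESULT_IS_OK(r)         ((r).status == RESULT_OK)
#define RESULT_UNWRAP_ERROR(r)  ((r).error)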

Handling errors becomes easy. You can't accidentally use an error as a valid value.

// Define a Result for file operations
DEFINE_RESULT_TYPE(FileOp, i32, const char*)

// Our function now returns a clear Result
static ResultFileOp submit_stat_request_safe(...) {
    // ...
    if (!sqe) {
        return RESULT_ERROR(FileOp, "Could not get SQE for stat");
    }
    return RESULT_OK(FileOp, 0);
}

// And handling it is clean
ResultFileOp result = submit_stat_request_safe(path, &ring, &pending_ops);
if (!RESULT_IS_OK(result)) {
    fprintf(stderr, "Error: %s\n", RESULT_UNWRAP_ERROR(result));
}

This is powered by RAII. The CLEANUP attribute ensures resources are freed no matter how a function exits.

#define AUTO_MEMORY(name, size) \
    void* name CLEANUP(memory_cleanup) = malloc(size)

// DIR pointers are automatically closed, even on an early return.
DIR* dir CLEANUP(dir_cleanup) = opendir(req->path);
if (!dir) {
    return RESULT_ERROR(FileOp, "Failed to open dir"); // dir_cleanup runs, but skips the NULL handle
}
if (some_condition) {
    return RESULT_OK(FileOp, 0); // closedir() is called automatically HERE!
}

The Assumption Beast: Challenged with Contracts and Safe Strings

Before: assert() and pray.
A standard assert(ptr != NULL) is good, but when it fails, the message is generic. You know the condition failed, but not the context or why it was important.

After: self-documenting contracts.
requires() and ensures() make function contracts explicit. The failure messages tell you exactly what went wrong. The contract macros:

#define requires(cond) assert_msg(cond, "Precondition failed")
#define ensures(cond) assert_msg(cond, "Postcondition failed")

#define assert_msg(cond, msg) /* ... full implementation ... */
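
A fuller assert_msg could stringify the condition and bake in the source location, which is what makes the failure messages self-explanatory (a sketch, assuming an abort-on-failure policy):

#include <stdio.h>
#include <stdlib.h>

// Sketch: report the broken contract, the condition text, and where it
// happened, then abort
#define assert_msg(cond, msg)                                \
    do {                                                     \
        if (!(cond)) {                                       \
            fprintf(stderr, "%s:%d: %s: %s\n",               \
                    __FILE__, __LINE__, (msg), #cond);       \
            abort();                                         \
        }                                                    \
    } while (0)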

This turns assertions into executable documentation:

// Preconditions that document and enforce contracts
static inline bool arena_create(Arena* arena, size_t size)
{
    requires(arena != NULL); // Precondition: arena must not be null
    requires(size > 0);      // Precondition: size must be positive

    // ... implementation ...

    ensures(arena->buffer != NULL); // Postcondition: buffer is allocated
    ensures(arena->size == size);   // Postcondition: size is set correctly

    return true;
}

strcpy() is a Security Vulnerability

Before: buffer overflows.
strcpy has no bounds checking. It's the source of countless security holes. strncpy is little better, as it might not null-terminate the destination string.

// The old, dangerous way
char dest[20];
const char* src = "This is a very long string that will overflow the buffer";
strcpy(dest, src); // Undefined behavior! Stack corruption!

After: safe, bounds-checked operations.
safe_c.h provides alternatives that check bounds and return a success/failure status. No surprises. The safe implementation:

// The safe string operations: bounds checking that can't be ignored
static inline bool safe_strcpy(char* dest, size_t dest_size, const char* src) {
    if (!dest || dest_size == 0 || !src) return false;
    size_t src_len = strlen(src);
    if (src_len >= dest_size) return false;
    memcpy(dest, src, src_len + 1);
    return true;
}
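
A sibling safe_strcat would follow the same recipe (a sketch, assuming it mirrors safe_strcpy's contract; the real one isn't shown here):

// Sketch: bounded concatenation in the same spirit as safe_strcpy
static inline bool safe_strcat(char* dest, size_t dest_size, const char* src) {
    if (!dest || dest_size == 0 || !src) return false;
    size_t dest_len = strlen(dest);
    size_t src_len = strlen(src);
    if (dest_len >= dest_size) return false;           // dest not terminated in bounds
    if (src_len >= dest_size - dest_len) return false; // would overflow
    memcpy(dest + dest_len, src, src_len + 1);
    return true;
}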

In cgrep, this prevents path buffer overflows cleanly:

// Returns bool, not silent truncation
if (!safe_strcpy(req->path, PATH_MAX, path)) {
    free(req);
    return RESULT_ERROR(FileOp, "Path is too long");
}

Concurrency: Mutexes That Unlock Themselves (Rust feature)

Before: leaked locks and deadlocks.
Forgetting to unlock a mutex, especially on an error path, is a catastrophic bug that causes your program to deadlock.

// The Buggy Way
pthread_mutex_lock(&mutex);
if (some_error) {
    return; // Oops, mutex is still locked! Program will deadlock.
}
pthread_mutex_unlock(&mutex);

After: RAII-based locks.
Using the same CLEANUP attribute, we can ensure a mutex is always unlocked when the scope is exited. This bug becomes impossible to write.

// With a cleanup function, unlocking is automatic.
void mutex_unlock_cleanup(pthread_mutex_t** lock) {
    if (lock && *lock) pthread_mutex_unlock(*lock);
}

// RAII lock guard via cleanup attribute
pthread_mutex_t my_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t* lock_ptr CLEANUP(mutex_unlock_cleanup) = &my_lock;
pthread_mutex_lock(lock_ptr);

if (some_error) {
    return; // Mutex is automatically unlocked here!
}
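
You can go one step further and fold the lock-plus-guard pair into a single macro so they can never be separated (a sketch; the LOCK_GUARD name is illustrative, not necessarily what safe_c.h calls it):

// Sketch: acquire a mutex and register the automatic unlock in one line
#define LOCK_GUARD(name, mutex_ptr)                                    \
    pthread_mutex_t* name CLEANUP(mutex_unlock_cleanup) = (mutex_ptr); \
    pthread_mutex_lock(name)

// Usage: every return after this line unlocks automatically
// LOCK_GUARD(guard, &my_lock);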

Simple wrappers also clean up the boilerplate of managing threads:

// The concurrency macros: spawn and join without boilerplate
#define SPAWN_THREAD(name, func, arg) \
    thrd_t name; \
    thrd_create(&name, (func), (arg))

#define JOIN_THREAD(name) \
    thrd_join(name, NULL)

And in cgrep:

// Thread pool spawn without boilerplate
SPAWN_THREAD(workers[i], file_processing_worker, &contexts[i]);
JOIN_THREAD(workers[i]); // No manual thrd_join() bookkeeping

Performance: Safety at -O2, Not -O0

Safety doesn't mean slow. The UNLIKELY() macro tells the compiler which branches are cold, adding zero overhead in hot paths.

#ifdef __GNUC__
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

The real win is in the fast paths:

// In hot allocation path: branch prediction
if (UNLIKELY(store->local_buffer_sizes[thread_id] >= LOCAL_BUFFER_CAPACITY)) {
    match_store_flush_buffer(store, thread_id); // Rarely taken
}

// In match checking: likely path first
if (!options->case_insensitive && options->fixed_string) {
    // Most common case: fast path with no branches
    const char* result = strstr(line, options->pattern.data);
    return result != NULL;
}

The effect is similar to what PGO (Profile-Guided Optimization) would give you automatically, minus the profiling step.

This is what main() looks like when you stop fighting the language:

int main(int argc, char* argv[]) {
    initialize_simd();
    output_buffer_init(); // Auto-cleanup on exit

    GrepOptions options = {0};
    AUTO_UNIQUE_PTR(include_pattern_ptr, NULL, options_string_deleter);

    // ... parse args with getopt_long ...

    AUTO_UNIQUE_PTR(store_ptr, NULL, match_store_deleter);
    SharedPtr store_shared = {0};
    if (need_match_store) {
        store_ptr.ptr = malloc(sizeof(ConcurrentMatchStore));
        if (!store_ptr.ptr || !match_store_create(store_ptr.ptr, hash_capacity, 1000)) {
            return 1; // All allocations cleaned up automatically
        }
        shared_ptr_init(&store_shared, store_ptr.ptr, match_store_deleter);
    }

    // Process files with thread pool...

cleanup: // Single cleanup label needed -- RAII handles the rest
    output_buffer_destroy(); // Flushes and destroys
    return 0;
}

Conclusion

In the end, cgrep is 2,300 lines of C. Without safe_c.h, it would have required over 50 manual free() calls ~ a recipe for leaks and segfaults. With the custom header file, it's 2,300 lines that compile to the same assembly, run just as fast, and are fundamentally safer.

This proves that the best abstraction is the one you don't pay for and can't forget to use. It enables a clear and powerful development pattern: validate inputs at the boundary, then unleash C's raw speed on the core logic. You get all the power of C without the infamous self-inflicted footgun wounds.

C's simplicity makes writing programs fun, but there are ways to make it both fun and safe... just like using condoms, you know?

This post has gotten too long for comfort, but I have one final bit of food for thought for you, the readers: after all these guardrails, what do you think cgrep's performance looks like? Check the screenshots below:

  • grep bench on recursive directories [screenshot: recursive-dirs]

  • grep bench on a single large file [screenshot: large-file] ~ NOTE: make sure you check the memory usage comparison between cgrep and ripgrep

In the next article, I will discuss how I built cgrep, the design I chose for it, why and how cgrep managed to be a couple of times faster than ripgrep (more than 2x faster in the recursive directory bench) while being super efficient with resource usage (20x smaller memory footprint in the single large file bench).

It's gonna be a lot of fun! Cheers!

If you enjoyed this post, click the little up arrow chevron on the bottom left of the page to help it rank in Bear's Discovery feed and if you got any questions or anything, please use the comments section.



Comments

  • By woodruffw, 2025-11-17 13:30

    Intentionally or not, this post demonstrates one of the things that makes safer abstractions in C less desirable: the shared pointer implementation uses a POSIX mutex, which means it’s (1) not cross platform, and (2) pays the mutex overhead even in provably single-threaded contexts. In other words, it’s not a zero-cost abstraction.

    C++’s shared pointer has the same problem; Rust avoids it by having two types (Rc and Arc) that the developer can select from (and which the compiler will prevent you from using unsafely).

    • By kouteiheika, 2025-11-17 13:42

      > the shared pointer implementation uses a POSIX mutex [...] C++’s shared pointer has the same problem

      It doesn't. C++'s shared pointers use atomics, just like Rust's Arc does. There's no good reason (unless you have some very exotic requirements, which I won't get into here) to implement shared pointers with mutexes. The implementation in the blog post here is just suboptimal.

      (But it's true that C++ doesn't have Rust's equivalent of Rc, which means that if you just need a reference counted pointer then using std::shared_ptr is not a zero cost abstraction.)

      • By woodruffw, 2025-11-17 13:45

        To be clear, the “same problem” is that it’s not a zero-cost abstraction, not that it uses the same specific suboptimal approach as this blog post.

        • By kouteiheika, 2025-11-17 13:51

          I think that's an orthogonal issue. It's not that C++'s shared pointer is not a zero cost abstraction (it's as much a zero cost abstraction as in Rust), but that it only provides one type of shared pointer.

          But I suppose we're wasting time on useless nitpicking. So, fair enough.

          • By woodruffw, 2025-11-17 13:59

            I think they’re one and the same: C++ doesn’t have program-level thread safety by construction, so primitives like shared pointers need to be defensive by default instead of letting the user pick the right properties for their use case.

            Edit: in other words C++ could provide an equivalent of Rc, but we’d see no end of people complaining when they shoot themselves in the foot with it.

            (This is what “zero cost abstraction” means: it doesn’t mean no cost, just that the abstraction’s cost is no greater than the semantically equivalent version written by the user. So both Arc and shared_ptr are zero-cost in a MT setting, but only Rust has a zero-cost abstraction in a single-threaded setting.)

            • By kouteiheika, 2025-11-17 14:16

              I can't say I agree with this? If C++ had an Rc equivalent (or if you'd write one yourself) it would be just as zero cost as it is in Rust, both in a single-threaded setting and in a multithreaded setting. "Zero cost abstraction" doesn't mean that it cannot be misused or that it doesn't have any cognitive overhead to use correctly, just that it matches whatever you'd write without the abstraction in place. Plenty of "zero cost" features in C++ still need you to pay attention so you don't accidentally blow your leg off.

              Simply put, just as a `unique_ptr` (`Box`) is an entirely different abstraction than `shared_ptr` (`Arc`), an `Rc` is also an entirely different abstraction than `Arc`, and C++ simply happens to completely lack `Rc` (at least in the standard; Boost of course has one). But if it had one you could use it with exactly the same cost as in Rust, you'd just have to manually make sure to not use it across threads (which indeed is easier said than done, which is why it's not in the standard), exactly the same as if you'd manually maintain the reference count without the nice(er) abstraction. Hence "zero cost abstraction".

              • By woodruffw, 2025-11-17 14:28

                Sorry, I realized I’m mixing two things in a confusing way: you’re right that C++ could easily have a standard zero-cost Rc equivalent; I’m saying that it can’t have a safe one. I think this is relevant given the weight OP gives to both performance and safety.

            • By SR2Z, 2025-11-17 19:41

              Isn't the point of using atomics that there is virtually no performance penalty in single threaded contexts?

              IMO "zero cost abstraction" just means "I have a slightly less vague idea of what this will compile to."

              • By SkiFire13, 2025-11-17 19:46

                No, atomics do have a performance penalty compared to the equivalent single threaded code, due to having to fetch/flush the impacted cache lines in the eventuality that another thread is trying to atomically read/write the same memory location at the same time.

                • By CyberDildonics, 2025-11-17 22:07

                  Atomics have almost no impact when reading, which is what would happen in a shared pointer the vast majority of the time.

                  • By woodruffw, 2025-11-17 22:20

                    > which is what would happen in a shared pointer the vast majority of the time.

                    This seems workload dependent; I would expect a lot of workloads to be write-heavy or at least mixed, since copies imply writes to the shared_ptr's control block.

                  • By oconnor663, 2025-11-17 23:17

                    I think it's pretty rare to do a straight up atomic load of a refcount. (That would be the `use_count` method in C++ or the `strong_count` method in Rust.) More of the time you're doing either a fetch-add to copy the pointer or a fetch-sub to destroy your copy, both of which involve stores. Last I heard the fetch-add can use the "relaxed" atomic ordering, which should make it very cheap, but the fetch-sub needs to use the "release" ordering, which is where the cost comes in.

            • By groundzeros2015, 2025-11-18 2:03

              > C++ doesn’t have program-level thread safety by construction

              It does. It’s called a process.

              Everyone chose convenience and micro-benchmarks by choosing threads instead.

              • By woodruffw, 2025-11-18 3:13

                "Thread truther" is not one of the arguments I had on the bingo card for this conversation.

                • By groundzeros2015, 2025-11-18 3:59

                  I guessed as much. I’m not alone - there is a whole chapter on this topic in “The art of UNIX programming”.

      • By cogman10, 2025-11-17 13:47

        > very exotic requirements

        I'd be interested to know what you are thinking.

        The primary exotic thing I can imagine is an architecture lacking the ability to do atomic operations. But even in that case, C11 has atomic operations [1] built in. So worst case, the C library for the target architecture would likely boil down to mutex operations.

        [1] https://en.cppreference.com/w/c/atomic.html

        • By kouteiheika, 2025-11-17 14:39

          Well, basically, yeah, if your platform lacks support for atomics, or if you'd need some extra functionality around the shared pointer like e.g. logging the shared pointer refcounts while enforcing consistent ordering of logs (which can be useful if you're unfortunate enough to have to debug a race condition where you need to pay attention to refcounts, assuming the extra mutex won't make your heisenbug disappear), or synchronizing something else along with the refcount (basically a "fat", custom shared pointer that does more than just shared-pointering).

          • By colonwqbang, 2025-11-17 15:59

            Does there exist any platform which has multithreading but not atomics? Such a platform would be quite impractical as you can't really implement locks or any other threading primitive without atomics.

            • By addaon, 2025-11-17 17:36

              > Does there exist any platform which has multithreading but not atomics?

              Yes. Also, almost every platform I know that supports multi threading and atomics doesn’t support atomics between /all/ possible masters. Consider a microcontroller with, say, two Arm cores (multithreaded, atomic-supporting) and a DMA engine.

              • By lpribis, 2025-11-17 22:31

                Yes but "atomic" operations with the DMA engine are accomplished through interrupts (atomic) or memory mapped IO configuration (atomic).

            • By cogman10, 2025-11-17 16:18

              Certainly such systems can pretty readily exist. You merely need atomic reads/writes in order to implement locks.

              You can't create userspace locks which is a bummer, but the OS has the capability of enforcing locks. That's basically how early locking worked.

              The main thing needed to make a correct lock is interrupt protection. Something every OS has.

              To go fast, you need atomic operations. It especially becomes important if you are dealing with multiple cores. However, for a single core system atomics aren't needed for the OS to create locks.

              • By SkiFire13, 2025-11-17 19:51

                > You merely need atomic reads/writes in order to implement locks.

                Nit: while it's possible to implement one with just atomic reads and writes, it's generally not trivial/efficient/ergonomic to do so without an atomic composite read-write operation, like a compare-and-swap.

              • By colonwqbang, 2025-11-17 17:13

                I wrote "multithreaded" but I really meant "multicore". If two cores are contending for a lock I don't see how irq protection help. As long as there is only one core, I agree.

                • By cogman10, 2025-11-17 18:05

                  On most multicore systems you can pin the IRQ handling to a single core. Pinning locking interrupts to a single core would be how you handle this.

                  • By colonwqbang, 2025-11-19 12:29

                    True, but locks are not only needed inside IRQ handler routines.

            • By oconnor663, 2025-11-17 23:22

              The boring answer is that standard atomics didn't exist until C++11, so any compiler older than that didn't support them. I think most platforms (certainly the popular desktop/server platforms) had ways to accomplish the same thing, but that was up to the vendor, and it might not've been well documented or stable. Infamously, `volatile` used to be (ab)used for this a lot before we had proper standards. (I think it still has some atomic-ish properties in MSVC?)

        • By goalieca, 2025-11-17 15:08

          Which platforms might those be? Even MIPS has atomics (at least pointer-sized, last I checked).

          • By cogman10, 2025-11-17 16:03

            AFAIK (I'm no MIPS expert) it doesn't have the ability to add a value directly to a memory address. You have to do something like

                // Not real MIPS, just what I've gleaned from a brief look at some docs
                LOAD addr, register
                ADD 1, register
                STORE register, addr
            
            The LOAD and STORE are atomic, but the `ADD` happens out of band.

            That's a problem if any sort of interrupt happens (a real possibility if you are multi-threading). If it happens at the load, then a separate thread can update "addr", which means the later STORE will stomp on what's there.

            x86 and ARM can do

                ADD 1, addr
            
            as well as other instructions like "compare and swap"

                LOAD addr, register
                MOV register, register2
                ADD 1, register2
                COMPARE_AND_SWAP addr, register, register2
                if (cas_failed) { try again }

            • By unnah, 2025-11-17 17:02

              On MIPS you can simulate atomics with a load-linked/store-conditional (LL/SC) loop. If another processor has changed the same address between the LL and SC instructions, the SC fails to store the result and you have to retry. The underlying idea is that the processors would have to communicate memory accesses to each other via the cache coherence protocol anyway, so they can easily detect conflicting writes between the LL and SC instructions. It gets more complicated with out-of-order execution...

                  loop: LL r2, (r1)
                        ADD r3, r2, 1
                        SC r3, (r1)
                        BEQ r3, 0, loop
                        NOP

    • By accelbred, 2025-11-17 16:49

      Unfortunately, for C++, that's not true. At least with glibc and libstdc++, if you do not link with pthreads, then shared pointers are not thread-safe. At runtime it will do a symbol lookup for a pthreads symbol, and based off the result, the shared pointer code will either take the atomic or non-atomic path.

      I'd much rather it didn't try to be zero-cost and always used atomics...

      • By TuxSH, 2025-11-17 18:42

        True, but that's a fault of the implementation, which assumes POSIX is the only thing in town & makes questionable optimization choices, rather than of the language itself

        (for reference, the person above is referring to what's described here: https://snf.github.io/2019/02/13/shared-ptr-optimization/)

        • By wyldfire, 2025-11-17 19:46

          > the language itself

          The "language" is conventionally thought of as the sum of the effects given by the { compiler + runtime libraries }. The "language" often specifies features that are implemented exclusively in target libraries, for example. You're correct to say that they're not "language features" but the two domains share a single label like "C++20" / "C11" - so unless you're designing the toolchain it's not as significant a difference.

          We're down to ~three compilers: gcc, clang, MSVC and three corresponding C++ libraries.

          • By TuxSH, 2025-11-19 11:02

            I agree with what you said, however neither libc++ nor MS-STL have this "optimization" to my knowledge

      • By woodruffw, 2025-11-17 17:08

        This is, impressively, significantly worse than I realized!

      • By eddd-ddde, 2025-11-17 19:11

        Why use atomics if you don't need them? There really should just be two different shared pointer types.

        • By accelbred, 2025-11-17 22:56

          I wouldn't mind two types. I mind shared pointers not using atomics if I statically link pthreads and dlopen a shared lib with them, or if I'm doing clone3 stuff. I've had multiple situations in which the detection method would turn off atomic use when it actually needed to be atomic.

    • By spacedcowboy, 2025-11-17 13:46

      The number of times I might want to write something in C and have it less likely to crash absolutely dwarfs the number of times I care about that code being cross-platform.

      Sure, cross-platform is desirable, if there's no cost involved, and mandatory if you actually need it, but it's a "nice to have" most of the time, not a "needs this".

      As for mutex overheads, yep, that's annoying, but really, how annoying ? Modern CPUs are fast. Very very fast. Personally I'm far more likely to use an os_unfair_lock_t than a pthread_mutex_t (see the previous point) which minimizes the locking to a memory barrier, but even if locking were slow, I think I'd prefer safe.

      Rust is, I'm sure, great. It's not something I'm personally interested in getting involved with, but it's not necessary for C (or even this extra header) to do everything that Rust can do, for it to be an improvement on what is available.

      There's simply too much out there written in C to say "just use Rust, or Swift, or ..." - too many libraries, too many resources, too many tutorials, etc. You pays your money and takes your choice.

      • By woodruffw, 2025-11-17 14:02

        That’s all reasonable, but here’s one of the primary motivations from the post:

        > We love its raw speed, its direct connection to the metal

        If this is a strong motivating factor (versus, say, refactoring risk), then C’s lack of safe zero-cost abstractions is a valid concern.

      • By lelanthran, 2025-11-17 16:18

        > As for mutex overheads, yep, that's annoying, but really, how annoying ?

        For this use-case, you might not notice. ISTR, when examining the pthreads source code for some platform, that mutexes only do a context switch as a fallback, if the lock cannot be acquired.

        So, for most use-cases of this header, you should not see any performance impact. You'll see some bloat, to be sure.

      • By lmm, 2025-11-18 1:33

        > There's simply too much out there written in C to say "just use Rust, or Swift, or ..." - too many libraries, too many resources, too many tutorials, etc.

        There really isn't. Speaking as someone who works in JVM-land, you really can avoid C all the time if you're willing to actually try.

        • By spacedcowboy, 2025-11-19 9:12

          shrug horses for courses. I’m at that wonderful stage of life where I only code what I want to, I don’t have people telling me what to do. I’m not going to throw away decades of code investment for some principle that I don’t really care about - if I did care more, I’d probably be more invested in rust after all.

          Plus, a lot of what I do is on microcontrollers with tens of kilobytes of RAM, not big-iron massively parallel servers where Java is commonly used. The vendor platform libraries are universally provided in C, so unless you want to reimplement the SPI or USB handler code, and probably write the darn rust implementation/Java virtual machine, and somehow squeeze it all in, then no, you can’t really avoid C.

          Or assembler for that matter, interrupt routines often need assembly language to get latency down, and memory management (use this RAM address range because it’s “TCM” 1-clock latency, otherwise it’s 5 or 6 clocks and everything breaks…)

    • By lelanthran, 2025-11-17 16:14

      > Intentionally or not, this post demonstrates one of the things that makes safer abstractions in C less desirable: the shared pointer implementation uses a POSIX mutex, which means it’s (1) not cross platform, and (2) pays the mutex overhead even in provably single-threaded contexts. In other words, it’s not a zero-cost abstraction.

      It's an implementation detail. They could have used atomic load/store (since c11) to implement the increment/decrement.

      TBH I'm not sure what a mutex buys you in this situation (reference counting)

    • By saurik, 2025-11-17 14:08

      I'd think a POSIX mutex--a standard API that I not only could implement anywhere, but which has already been implemented all over the place--is way more "cross platform" than use of atomics.

      • By woodruffw, 2025-11-17 14:32

        To lift things up a level: I think a language’s abstractions have failed if we even need to have a conversation around what “cross platform” really means :-)

      • By wat10000, 2025-11-17 19:32

        If you're targeting a vaguely modern C standard, atomics win by being part of the language. C11 has atomics and it's straightforward to use them to implement thread-safe reference counting.

    • By aidenn0, 2025-11-17 16:43

      > the shared pointer implementation uses a POSIX mutex

      Do you have a source for this? I couldn't find the implementation in TFA nor a link to safe_c.h

    • By layer8, 2025-11-17 17:32

      The shared-pointer implementation isn’t actually shown (i.e. shared_ptr_copy), and the SharedPtr type doesn’t use a pthread_mutex_t.

    • By kev009, 2025-11-17 17:13

      C11 has a mutex API (threads.h), so why would it rely on POSIX? Are you sure it's not a runtime detail on one platform? https://devblogs.microsoft.com/cppblog/c11-threads-in-visual...

      • By loeg, 2025-11-17 22:34

        The article has an excerpt using POSIX mutexes specifically. But you're right that C11 code can just portably use standard mutexes.

          // The old way of manual reference counting
          typedef struct {
              MatchStore* store;
              int ref_count;
              pthread_mutex_t mutex;
          } SharedStore;

    • By nurettin, 2025-11-19 3:54

      Rust pays the cumbersome lifetime syntax tax even in provably single threaded contexts. When will Rust develop ergonomics with better defaults and less boilerplate in such contexts?

    • By kazinator, 2025-11-18 2:09

      ISO C has had mutexes since C11 I think.

      In any case, you could use the provided primitives to wrap the C11 mutex, or any other mutex.

      With some clever #ifdef, you can probably have a single or multithreaded build switch at compile time which makes all the mutex stuff do nothing.

    • By loeg, 2025-11-17 22:24

      Technically, the mutex refcounting example is shown as the "before" state, prior to the header the author is talking about. We don't know what they've chosen to implement shared_ptr with.

    • By up2isomorphism, 2025-11-18 7:05

      It is quite obvious which one is easier: typing a bunch of ifdefs vs learning a new language.

      BTW don’t fight C for portability, it is unlikely you will win.

    • By cryptonector, 2025-11-17 22:53

      Meh, it could easily use atomics instead, no lock needed.

  • By cachius, 2025-11-17 12:56

    A recent superpower was added by Fil aka the pizlonator, who made C more Fil-C with FUGC: a garbage collector that, with minimal adjustments to existing code, turns it into a memory-safe implementation of the C and C++ programming languages you already know and love.

    https://news.ycombinator.com/item?id=45133938

    https://fil-c.org/

    • By mk89, 2025-11-17 15:24

      Thank you so much for sharing this. I missed the HN post.

      This is beautiful!

    • By 762236, 2025-11-17 13:30

      Why would I want to run a garbage collector and deal with its performance penalties?

      • By jerf, 2025-11-17 13:59

        Because about 99% of the time the garbage collector is a negligible portion of your runtime, in exchange for a huge dollop of safety.

        People really need to stop acting like a garbage collector is some sort of cosmic horror that automatically takes you back to 1980s performance or something. The cases where they are unsuitable are a minority, and a rather small one at that. If you happen to live in that minority, great, but it'd be helpful if those of you in that minority would speak as if you are in the small minority and not propagate the crazy idea that garbage collection comes with massive "performance penalties" unconditionally. They come with conditions, and rather tight conditions nowadays.

        • By hypeatei, 2025-11-17 14:10

          I think these threads attract people that write code for performance-critical use cases which explains the "cosmic horror" over pretty benign things. I agree though: most programs aren't going to be brought to their knees over some GC sweeps every so often.

          • By KerrAvon, 2025-11-17 15:58

            Outside of hobbyist things, performance-critical code is the only responsible use case for a non-memory safe language like C in 2025, so of course it does. (Even that window is rapidly closing, though; languages like Rust and Swift can be better than C for perf-critical things because of the immutability guarantees.)

            • By jstimpfle, 2025-11-17 21:00

              Productivity, portability, stability, mind-share, direct access to OS APIs... there's a lot of reasons to still use C.

              • By pjmlp, 2025-11-17 21:33

                Only if the OS is written in C, and has its APIs exposed as C APIs to userspace.

                Quite a few OSes don't fit that rule.

                • By mbac32768, 2025-11-18 2:53

                  Could you name two of these that are important to you?

                  • By pjmlp, 2025-11-18 6:02

                    Android, userspace is Java, and what is exposed on the NDK is a tiny portion, as it is only meant for games and implementing native methods for better performance beyond what JIT/AOT do, or bindings to existing libraries.

                    About 80% of the OS APIs are behind JNI calls, when using the NDK.

                    iOS, iPadOS, watchOS, the large majority of userspace APIs is based on Objective-C, or Swift, bare bones C is only available for the POSIX leftovers.

                    You need to call the Objective-C runtime APIs for anything useful as an app that Apple would approve.

                    For the Plan 9 geeks, Inferno, OS APIs are exposed via Limbo.

                    For folks that still find mainframes and micros cool, IBM i, IBM z/OS, Unisys ClearPath MCP, Unisys OS 2200.

                    For retrogaming folks, most 8 and 16 bit home computers.

                  • By cryptonector, 2025-11-18 3:40

                    Or even one. I know there are operating systems in use that are not written in C, but the major ones are written in C. And anyways, it's not just the OS. There's a pile of C code. Fil-C is a fantastic idea. I think Fil is going to make it good enough to use in production, and I badly want to use it in production.

            • By sramsay, 2025-11-17 19:01

              I keep hearing this, but I fail to see why "the massive, well-maintained set of critical libraries upon which UNIX is based" is not a good reason to use C in 2025.

              I have never seen a language with a better ffi into C than C.

              • By lmm, 2025-11-18 1:37

                > the massive, well-maintained set of critical libraries upon which UNIX is based

                What massive, maintained set is that? Base Unix is tiny, and any serious programming ecosystem has good alternatives for all of it.

            • By lelanthran, 2025-11-17 18:45

              > Outside of hobbyist things, performance-critical code is the only responsible use case for a non-memory safe language like C in 2025, so of course it does.

              Maybe; I sometimes write non-hobbyist non-performance-critical code in C.

              I'm actually planning a new product for 2026 that might be done in C (the current iteration of that product line is in Go, the previous iteration was in Python).

              I've few qualms about writing the server in C.

              • By josephg, 2025-11-17 23:10

                > I've few qualms about writing the server in C.

                Bad Unicode support. Lack of cross platform system libraries. Needing to deal with CMake / autotools / whatever. Poor error handling. No built in string, list or map types. No generics. Nullability. No sum types. No option, tuples or multi returns. Generally worse IDE support than a lot of languages. No good 3rd party package ecosystem. The modern idiocy of header files. Memory bugs. Debugging memory corruption bugs. …

                I mean, yeah other than all those problems, C is a great little language.

                • By lelanthran, 2025-11-18 6:00

                  > Bad Unicode support. Lack of cross platform system libraries. Needing to deal with CMake / autotools / whatever. Poor error handling. No built in string, list or map types. No generics. Nullability. No sum types. No option, tuples or multi returns. Generally worse IDE support than a lot of languages. No good 3rd party package ecosystem. The modern idiocy of header files. Memory bugs. Debugging memory corruption bugs. …

                  You make some good, if oft-repeated, points; but for my product:

                  1. Bad Unicode support - I'm not sure what I will use this for; glyphs won't be handled by a server program and storage/search of UTF8/codepoints will be handled by the data store (PostgreSQL, if you must know).

                  2. CMake/autotools/etc - low list of 3rd party dependencies, so a plain Makefile works.

                  3. Worse IDE support than a lot of languages - not sure what you mean by this. C has LSP support, like every other language. I haven't noticed C support in editors to be worse than other languages.

                  4. No 3rd party package ecosystem - That's fine, I'm not pulling in many 3rd party packages, so those that are pulled in can be handled with the Makefile and manual updates.

                  5. The modern idiocy of header files - this confuses me; there is still no good alternative to header files to support exporting to a common ABI. Functions, written in C, will be callable from any other language because header files are automatically handled by swig for FFI.[1]

                  6. Memory bugs + debugging them - thankfully, using valgrind, then sanitisers in my build/test step makes this a very low priority for me. Not that bugs don't slip through, but single-exit error handling using goto's and cleanups make these kinds of bugs rare. Not impossible, but rare. Having the test steps include valgrind, then various sanitisers reduces the odds even more.

                  For the rest, yeah, nice to have "No built in string, list or map types. No generics. Nullability. No sum types. No option, tuples or multi returns. ", but those are optional to getting a product out. If C had them I'd use them, but I'm not exactly helpless without them.

                  The downside of writing a product in C, in 2025, isn't in your list above.

                  ========================================

                  [1] One of my two main reasons for switching to C is because the product was so useful to paying clients that they'd like more functionality, which includes "use their language of choice to interact with the product.". Thus far I've hacked in solutions depending on which client wanted what, but there's limits to the hacked-in solutions.

                  IOW, "easily extendable by clients using their language of choice" is a hard product requirement. If it wasn't a hard requirement they can continue using the existing product.

                  • By josephg, 2025-11-18 7:35

                    > You make some good, if oft-repeated, points

                    They're oft repeated because they're real problems.

                    > a plain Makefile works.

                    > C has LSP support, like every other language. I haven't noticed C support in editors to be worse than other languages.

                    Makefiles aren't supported well by CLion or Visual Studio. LSP requires a compile-commands list to be able to work - which is a PITA to export from Makefiles. Xcode and Visual Studio both require their own build systems. Etc etc. It's a mess.

                    Even if you set up LSP properly, debugging can still be a PITA. Most of the time, it doesn't "just work" like in many other languages.

                    In comparison, all Go projects look the same and all tooling understands them. Same for C#, Rust, Typescript, Zig, and many others.

                    > 5. The modern idiocy of header files - this confuses me; there is still no good alternative to header files to support exporting to a common ABI.

                    Other languages don't need header files, and yet they manage to export public interfaces just fine. Header files only exist because computers had tiny amounts of RAM in the 70s, and they couldn't keep everything in memory while compiling. The fact we keep them around in 2025 boggles my mind.

                    Header files create 2 problems:

                    1. You have to write them and keep them up to date as your function signatures change, which is pure overhead.

                    2. They slow down compilation, because the compiler has to re-parse your headers for every codegen unit. Yes, PCH exists - but it's platform-specific and complicated to set up yourself. You can use unity builds instead, but that's fiddly and it can cause other headaches.

                    > The downside of writing a product in C, in 2025, isn't in your list above.

                    What would you say the downsides of writing a product in C in 2025 are?

                    > One of my two main reasons for switching to C is because the product was so useful to paying clients that they'd like more functionality, which includes "use their language of choice to interact with the product."

                    Yeah; I agree that this is one area where C shines. I really wish we had better ways to do FFI than C ABI compatibility everywhere. Rust, Swift, Zig, C++ and others can of course all compile to static libraries that look indistinguishable from C object files. But if you're using those languages, writing a C API is another step. If you're already working in C, I agree that its much easier to write these APIs and much easier to keep them up to date as your code changes.

                    • By lelanthran, 2025-11-18 8:17

                      > Makefiles aren't supported well by clion or visual studio.

                      I dunno what IDE support I might need - once the Makefile is written I'm not going to be constantly adding and removing packages on a frequent basis.

                      As for IDE's, I am not using Clion, Visual Studio or XCode. Vim, Emacs and VSCode work fine with C projects, even when debugging interactively.

                      > What would you say the downsides of writing a product in C in 2025 are?

                      Slower initial development compared to HLL like Go, Python, etc. Well, it's slow if you want to avoid the major classes of bugs, anyway. You can go fast in C, but:

                      a) It's still not going to be as high-initial-velocity as (for example) Python, Java or C#

                      and

                      b) You're probably going to have a lot more bugs.

                      My standard approach to avoiding many of the pitfalls in using C (pitfalls which are also applicable to C++) is to use a convention that makes it easier to avoid most logic bugs (which, in the process avoids most memory bugs too). This convention does require a little more upfront design, but a lot more code.

                      So, yeah, I'll go slightly slower; this is not a significant enough factor to make anyone consider switching languages.

                      > Other languages don't need header files, and yet they manage to export public interfaces just fine.

                      Only for the isolated ecosystem of that language. C header files can be used to automatically perform the FFI for every single mainstream language.

                      So, sure, other languages can an export an interface to a file, but that interface probably can't be used for FFI, and in the rare cases where it can, it can't be automatically used.

                      C headers can be, and are, used to automatically generate bindings for other languages.

              • By lmm, 2025-11-18 1:38

                > I've few qualms about writing the server in C.

                Why are you not worried about becoming the next Cloudbleed? Do you believe you have superhuman programming abilities?

                • By lelanthran, 2025-11-18 5:09

                  > Why are you not worried about becoming the next Cloudbleed?

                  The odds are just too low.

                  > Do you believe you have superhuman programming abilities?

                  I do not believe I have superhuman abilities.

          • By jvanderbot, 2025-11-17 18:51

            I think these threads attract people like that, but also people that want to be like that. I've seen a lot of people do "rigor theater", where things like reproducible builds, garbage collection, or, frankly, memory safety are just thought-terminating cliches.

        • By Phil_Latio, 2025-11-17 14:17

          > Because about 99% of the time the garbage collect is a negligible portion of your runtime

          In a system programming language?

          • By jerf, 2025-11-17 16:47

            Whether or not GC is a negligible portion of your runtime is a characteristic of your program, not your implementation language. For 99% of programs, probably more, yes.

            I have been working in GC languages for the last 25 years. The GC has been a performance problem for me... once. The modal experience for developers is probably zero. Once or twice is not that uncommon. But you shouldn't bend your entire implementation stack choice over "once or twice a career" outcomes.

            This is not the only experience for developers, and there are those whose careers are concentrated in the places where it matters... databases, 100%-utilization network code, hardware drivers. But for 99% of the programs out there, whatever language they are implemented in, GC is not an important performance consideration. For the vast bulk of those programs, there is a much larger performance consideration in it that could be turned up in 5 minutes with a profiler and nobody has even bothered to do that and squeeze out the accidentally quadratic code because even that doesn't matter to them, let alone GC delays.

            This is the "system programmer's" equivalent of the web dev's "I need a web framework that can push 2,000,000 requests per second" and then choosing the framework that can push 2,001,000 rps over the one that can push 2,000,000 because fast... when the code they are actually writing for the work they are actually doing can barely push 100 rps. Even game engines nowadays have rather a lot of GC in them. Even in a system programming language, and even in a program that is going to experience a great deal of load, you are going to have to budget some non-trivial optimization time for your own code before GC is your biggest problem, because the odds that you wrote something slower than the GC without realizing it are pretty high.

            • By Phil_Latio 2025-11-17 17:59

              > Whether or not GC is a negligible portion of your runtime is a characteristic of your program, not your implementation language.

              Of course, but how many developers choose C _because_ it does not have a GC, versus developers who choose C# but then work around it with manual memory management and unsafe pointers? It's > 1000 to 1.

              There are even new languages like C3, Odin, Zig or Jai that have a no-GC mindset in their design. So why do you people insist that deliberately unsafe languages suddenly need a GC? There are other new languages designed WITH a GC in mind, like Go. Or pick Rust: no GC but still memory safe. So what's the problem again? Just pick the language you think fits best for a project.

          • By Snarwin 2025-11-17 14:35, 1 reply

            There's plenty of application-level C and C++ code out there that isn't performance-critical, and would benefit from the safety a garbage collector provides.

            • By jvanderbot 2025-11-17 18:52

              Right, does `sudo` net benefit from the removal of heap-corruption, out-of-bounds, use-after-free, etc. errors that GC plus a few other "safeties" might provide? I think so!

          • By pjmlp 2025-11-17 14:30, 2 replies

            Yes, plenty has been done already, ever since Lisp Machines, Smalltalk, Interlisp-D, Cedar, Oberon, Sing#, Modula-2+, Modula-3, D, Swift, ....

            It is a matter of having an open mindset.

            Eventually, systems languages with manual memory management will be history in agentic-driven OSes.

        • By jesse__ 2025-11-17 22:49, 1 reply

          > Because about 99% of the time the garbage collect is a negligible portion of your runtime

          lol .. reality disagrees with you.

          https://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf#:~:te...

          On page 3 they broadly conclude that if you use FIVE TIMES as much memory as your program would if managed manually, you get a 9% performance hit. If you only use DOUBLE, you get as much as a 70% hit.

          Further on, there are comprehensive details on the tradeoffs between style of GC vs memory consumption vs performance.

          ---

          Moving a value from DRAM into a CPU register is an expensive operation, both in terms of latency and power consumption. Much of the code out in the "real world" is now written in garbage collected languages. Our datacenters are extremely power hungry (as much as 2% of total power in the US is consumed by datacenters), and becoming more so every day. The conclusion here is that garbage collection is fucking expensive, in real-world terms, and we need to stop perpetuating the idea that it's not.

          • By atherton94027 2025-11-18 1:26, 1 reply

            Methodology seems kind of dubious:

            > We introduce a novel experimental methodology that lets us quantify the performance of precise garbage collection versus explicit memory management. Our system allows us to treat unaltered Java programs as if they used explicit memory management by relying on oracles to insert calls to free. These oracles are generated from profile information gathered in earlier application runs.

            • By jesse__ 2025-11-18 2:39, 1 reply

              What specifically seems dubious about that? I thought it was quite a clever idea.

              • By atherton94027 2025-11-18 3:48

                If you dig into the paper, on page 3 they find that their null oracle approach (i.e. without actually freeing the memory) increases run times erratically, by 12 to 33%. They then mention that their simulated approach should handle that case, but it seems unlikely to me that their stats aren't affected. Also, they disable multi-threading – again for repeatability – but that will obviously have a performance impact.

        • By 762236 2025-11-17 14:13, 1 reply

          For new projects, I just use Rust: there is zero reason to deal with a garbage collector today. If I'm in C, it's because I care about predictable performance, which is also why I'm not using Java for that particular project.

        • By mbac32768 2025-11-18 2:48

          The Java stop-the-world garbage collector circa the late 90s/early 2000s traumatized so many people about automated garbage collection.

      • By sesm 2025-11-17 14:42, 5 replies

        IDK about Fil-C, but in Java the garbage collector actually speeds up memory management compared to C++ if you measure throughput. The cost of this is increased worst-case latency.

        A CLI tool (which most POSIX tools are) would pick throughput over latency any time.

        • By CyberDildonics 2025-11-17 15:58

          I see this claim all the time without evidence, but it's also apples and oranges. In C++ you can avoid heap allocations so that they are rare and large. In Java you end up with non-stop small heap allocations, which is exactly what you try to avoid when you want a program to be fast.

          Basically, Java's GC is a solution to a problem that shouldn't exist.
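
          (For concreteness, a minimal sketch of that "rare and large" allocation style in C; the arena here is hypothetical and skips the alignment handling a real one would need.)

          #include <stdlib.h>
          #include <stddef.h>

          /* One large upfront allocation; per-item "allocations" are just
             pointer bumps, and a single free() releases everything. */
          typedef struct { char *base; size_t used, cap; } Arena;

          static void *arena_alloc(Arena *a, size_t n) {
              if (a->used + n > a->cap) return NULL;  /* out of space */
              void *p = a->base + a->used;
              a->used += n;                           /* bump; no per-item free */
              return p;
          }

          int main(void) {
              Arena a = { malloc(1 << 20), 0, 1 << 20 };    /* one 1 MiB block */
              if (!a.base) return 1;
              int *xs = arena_alloc(&a, 1000 * sizeof *xs); /* no malloc here */
              (void)xs;
              free(a.base);   /* one free for the whole program phase */
              return 0;
          }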

        • By dataflow 2025-11-17 15:12, 1 reply

          > in Java garbage collector actually speeds up memory management compared to C++ if you measure the throughput

          If I had a dollar for every time somebody repeated this without real-world benchmarks to back it up...

          • By jesse__ 2025-11-17 23:03

            I wish I could upvote this 100 times

        • By wavemode 2025-11-17 16:10, 1 reply

          Java (w/ the JIT warmed up) could possibly be faster than C++, if the C++ program were to allocate every single value on the heap.

          But you're never going to encounter a C++ program that does that, since it makes no sense.

          • By jesse__ 2025-11-17 23:03

            Entertaining anecdote:

            I once worked on a Python program that was transpiled to C++, and literally every variable was heap-allocated (because that's what Python does). It was still on the order of 200x faster than Python, IIRC.

        • By zozbot234 2025-11-17 15:06, 1 reply

          You also pay for the increased throughput with significant memory overhead, in addition to worst-case latency.

          • By KerrAvon 2025-11-17 15:54

            This. The memory overhead kills you in large systems/OS-level GC. Reducing the working set size really matters in a complex system to keep things performant, and GC vastly expands the working set.

            In the best cases, you’re losing a huge amount of performance vs. an equivalent non-GC system. In the worst, it affects interactive UI performance with multi-second stalls (a suitably modern GC shouldn’t do this, though).

        • By milch 2025-11-17 21:42

          Depending on the CLI tool, you could even forgo memory management completely and just rely on the OS to clean up. If your program reads arbitrary files completely into memory, it's probably not the best idea, but otherwise it can be a valid option. This is likely at least partly what happens when you run a benchmark like this: the C++ tool cleans everything up nicely if you use smart pointers or manual memory management, while the Java tool doesn't even get to run the GC at all, or if it does, it only cleans up a percentage of the objects instead of all of them.
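
          (A minimal sketch of that "let the OS clean up" style for a short-lived tool; the program below is hypothetical and leaks on purpose.)

          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          /* Short-lived CLI: allocate, do the work, exit. The OS reclaims
             the whole address space at exit, so the missing free() below
             costs nothing in practice for a process this short-lived. */
          int main(int argc, char **argv) {
              if (argc < 2) return 1;
              char *copy = strdup(argv[1]);   /* never freed, on purpose */
              if (!copy) return 1;
              printf("%zu\n", strlen(copy));
              return 0;                       /* OS cleans up everything */
          }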

      • By cryptonector 2025-11-18 3:38

        Because C is very unsafe, but there are still many billions of lines of C in use, so making C safer is a great idea.

      • By palata 2025-11-17 13:40, 1 reply

        Easy: because in your specific use-case, it's worth trading some performance for the added safety.

        • By 762236 2025-11-17 14:11, 1 reply

          If I'm in C, I'm using JNI to work around the garbage collector of Java.

          • By palata 2025-11-17 14:39

            Have you ever measured the performance impact of JNI? :-)

  • By purplesyringa 2025-11-17 15:55, 2 replies

    This feels like a misrepresentation of features that actually matter for memory safety. Automatically freeing locals and bounds checking is unquestionably good, but it's only the very beginning.

    The real problems start when you need to manage memory lifetimes across the whole program, not locally. Can you return `UniquePtr` from a function? Can you store a copy of `SharedPtr` somewhere without accidentally forgetting to increment the refcount? Who is responsible for managing the lifetimes of elements in intrusive linked lists? How do you know whether a method consumes a pointer argument or stores a copy to it somewhere?

    I appreciate trying to write safer software, but we've always told people `#define xfree(p) do { free(p); p = NULL; } while (0)` is a bad pattern, and this post really feels like more of the same thing.

    • By cryptonector 2025-11-18 3:53

      > Can you return `UniquePtr` from a function?

      Yes: you can return structures by value in C (and also pass them by value).

      > Can you store a copy of `SharedPtr` somewhere without accidentally forgetting to increment the refcount?

      No, this you can't do.
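
      (A minimal sketch of the by-value pattern, assuming a `UniquePtr`-style owning struct like the article's; the field and function names here are hypothetical, and error handling is elided.)

      #include <stdlib.h>
      #include <string.h>

      typedef struct { char *ptr; } UniquePtr;   /* owning wrapper */

      /* Returning the struct by value hands ownership to the caller. */
      UniquePtr make_owned_copy(const char *s) {
          UniquePtr u = { strdup(s) };
          return u;   /* copied out by value; nothing dangles */
      }

      int main(void) {
          UniquePtr u = make_owned_copy("hello");
          /* ... use u.ptr ... */
          free(u.ptr);   /* or let a CLEANUP handler do it, as in safe_c.h */
          return 0;
      }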

    • By teo_zero 2025-11-18 7:02

      > we've always told people `#define xfree(p) do { free(p); p = NULL; } while (0)` is a bad pattern

      Have we? Why?
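
      (For context, the objection usually given, not an answer from the thread: nulling `p` only clears that one variable, not any other copies of the pointer, so it can hide double-frees without actually preventing use-after-free. A minimal illustration:)

      #include <stdlib.h>

      #define xfree(p) do { free(p); p = NULL; } while (0)

      int main(void) {
          char *a = malloc(16);
          char *b = a;     /* a second alias to the same allocation */
          xfree(a);        /* a is now NULL... */
          /* ...but b still points at freed memory; dereferencing or
             freeing b is exactly the bug the NULLing was meant to stop. */
          (void)b;
          return 0;
      }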

HackerNews