The hidden compile-time cost of C++26 reflection

2026-03-06 (vittorioromeo.com)


I am very excited about C++26 reflection.

I am also obsessed with having my code compile as quickly as possible. Fast compilation times are extremely valuable: they keep iteration times low, productivity and motivation high, and let you quickly see the impact of your changes. 1

With time and experience, I’ve realized that C++ can be an extremely fast-to-compile language. Language features like templates are not the issue – the Standard Library is.

My fork of SFML uses almost no Standard Library at all, and I can recompile the entire thing from scratch in ~4.3s. That’s around ~900 translation units (TUs), including external dependencies, tests, and examples.2 Incremental builds are, for all intents and purposes, instantaneous. I love it.

I would love to live in a world where C++26 reflection is purely a lightweight language feature; however, that ship has sailed (thank you, Jonathan Müller, for trying).

In this article, I’ll try to provide some early expectations about the compile-time impact of C++26 reflection.

let’s measure!

I found a nice Docker image containing GCC 16, the first version that supports reflection, and got to work. To get reasonably stable measurements, I used the hyperfine command-line benchmarking tool.
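As a concrete sketch, each scenario can be timed with an invocation along these lines. The warmup count and output path are my own assumptions; the article only names the tool and the compiler flags:

```shell
# Sketch of a per-scenario benchmark run (warmup count is an assumption).
# hyperfine repeats the command and reports mean/stddev compile time.
hyperfine --warmup 3 \
  "g++ -std=c++26 -freflection -c main.cpp -o /dev/null"
```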

These are my specs:

  • CPU: 13th Gen Intel Core i9-13900K
  • RAM: 32GB (2x16GB) DDR5-6400 CL32
  • OS: Debian 13 Slim (on Docker, sourcemation/gcc-16)
  • Compiler: GCC 16.0.1 20260227 (experimental)
  • Flags: -std=c++26 -freflection

My test scenarios were as follows:

  1. Baseline test. Just an int main() { }.

  2. Header inclusion test. Same as above, but with #include <meta>.

  3. Basic reflection over a struct’s fields:

    template <typename T>
    void reflect_struct(const T& obj) {
        template for (constexpr std::meta::info field :
                      std::define_static_array(std::meta::nonstatic_data_members_of(
                          ^^T, std::meta::access_context::current()))) {
            use(std::meta::identifier_of(field));
            use(obj.[:field:]);
        }
    }
    
    struct User {
        std::string_view name;
        int age;
        bool active;
    };
    
    int main() {
        reflect_struct(User{.name = "Alice", .age = 30, .active = true});
    }
  4. Barry Revzin’s AoS to SoA transformation example, from his blog post.3

I’ve also tested using precompiled headers (PCHs) for <meta> and other large dependencies.
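For reference, a GCC precompiled header for <meta> can be produced and consumed roughly like this. The file names are hypothetical; the article does not show its exact PCH setup:

```shell
# Hypothetical PCH setup: compile the header once; GCC then picks up
# pch.hpp.gch automatically whenever pch.hpp is included (or forced
# in via -include) before anything else.
echo '#include <meta>' > pch.hpp
g++ -std=c++26 -freflection -x c++-header pch.hpp -o pch.hpp.gch
g++ -std=c++26 -freflection -include pch.hpp -c main.cpp
```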

⚠️ DISCLAIMER: please take these benchmark results with a grain of salt. ⚠️

  • My measurements are not especially rigorous, and the compiler I used is still a work in progress.
  • Also note that my specs are quite beefy – YMMV.
  • Finally, remember that these measurements are for a single translation unit – in a real project, you’d have to multiply the compile time overhead by the number of affected TUs.

benchmark results

  # | Scenario                           | Code                          | Precompiled Header (PCH) | Compile Time (Mean)
 ---|------------------------------------|-------------------------------|--------------------------|--------------------
  1 | Baseline (No Reflection Flag)      | int main() only               | None                     | 43.9 ms
  2 | Baseline + -freflection            | int main() only               | None                     | 43.1 ms
  3 | <meta> Header Inclusion            | int main() + #include <meta>  | None                     | 310.4 ms
  4 | Basic Struct Reflection (1 type)   | reflect_struct with User      | None                     | 331.2 ms
  5 | Basic Struct Reflection (10 types) | reflect_struct with User<N>   | None                     | 388.6 ms
  6 | Basic Struct Reflection (20 types) | reflect_struct with User<N>   | None                     | 410.9 ms
  7 | AoS to SoA (Original)              | Barry Revzin’s unedited code  | None                     | 1,622.0 ms
  8 | AoS to SoA (No Print)              | Removed <print>               | None                     | 540.1 ms
  9 | AoS to SoA (No Print/Ranges)       | Removed <print>, <ranges>     | None                     | 391.4 ms
 10 | AoS to SoA (Original + PCH)        | Barry Revzin’s unedited code  | <meta>, <ranges>, <print>| 1,265.0 ms
 11 | AoS to SoA (No Print + PCH)        | Removed <print>               | <meta>, <ranges>         | 229.7 ms
 12 | AoS to SoA (No Print/Ranges + PCH) | Removed <print>, <ranges>     | <meta>                   | 181.9 ms

A few clarifications:

  • The number of types in the “Basic Struct Reflection” example was adjusted by instantiating unique “clones” of the test struct type:

    template <int>
    struct User {
        std::string_view name;
        int age;
        bool active;
    };
    
    reflect_struct(User<0>{.name = "Alice", .age = 30, .active = true});
    reflect_struct(User<1>{.name = "Alice", .age = 30, .active = true});
    reflect_struct(User<2>{.name = "Alice", .age = 30, .active = true});
    // ...
  • “Removed <print>” means removing all of the formatting/printing code from Barry’s example.

  • “Removed <ranges>” means rewriting range-based code like std::views::transform or std::views::iota as good old boomer loops™.

insights

  1. The reflection feature flag itself is free.
    • Simply turning on -freflection adds no measurable overhead (43.1 ms vs. 43.9 ms).
  2. Basic reflection costs can scale up quickly.
    • Base cost of reflecting 1 struct: 331.2 ms (but ~310 ms of this is just including <meta>).
    • Cost to reflect 9 extra types: +57.4 ms (~6.3 ms per type).
    • Cost to reflect 10 more types (20 total): +22.3 ms (~2.2 ms per type).
    • While this seems cheap on the surface, remember that my example is extremely basic, and that a large project can have hundreds (if not thousands) of types that will be reflected.
  3. Standard Library Headers are a big bottleneck.
    • The massive compile times in modern C++ don’t come from your metaprogramming logic, but from parsing standard library headers.
    • Pulling in <meta> adds ~266 ms of pure parsing time (scenario 3 vs. 1).
    • Pulling in <ranges> adds ~149 ms (scenario 8 vs. 9).
    • Pulling in <print> adds an astronomical ~1,082 ms (scenario 7 vs. 8).
  4. Precompiled Headers (PCH) are mandatory for scaling.
    • Caching <meta> and avoiding heavy dependencies cuts compile time down to 181.9 ms (scenario 12).
    • Caching <meta> + <ranges> drops the time from 540ms to 229ms (scenario 8 vs 11).
    • Interestingly, while caching <print> helps a bit, it still leaves the compile time uncomfortably high.
    • Perhaps modules could eventually help here, but I have still not been able to use them in practice successfully.
      • Notably, <meta> is not part of import std yet, and even with import std Barry’s example took a whopping 1.346s to compile.

what about modules?

I ran some more measurements using import std; with a properly-built std module that includes reflection.

Firstly, I created the module via:

g++ -std=c++26 -fmodules -freflection -fsearch-include-path -fmodule-only -c bits/std.cc

And then benchmarked with:

hyperfine "g++ -std=c++26 -fmodules -freflection ./main.cpp"

No #include was used – only import std.

These are the results (mean compilation time):

  • Basic struct reflection (1 type): ~352.8 ms
  • Barry’s AoS to SoA example: ~1.077 s

Compare that with PCH:

  • Basic struct reflection (1 type): ~208.7 ms
  • Barry’s AoS to SoA example: ~1.261 s

So… PCH actually wins compared to modules for just <meta>, and modules are not that much better than PCH for the larger example. Quite disappointing.

conclusion

Reflection is going to bring a lot of power to C++26. New libraries that heavily rely on reflection are going to become widespread. Every single TU including one of those libraries will virally include <meta> and any other used dependencies.

Assuming the usage of <meta> + <ranges> becomes widespread, we’re looking at a bare minimum of ~540ms compilation overhead per TU. Using PCHs (or modules) will become pretty much mandatory, especially in large projects.

I really, really wish that Jonathan Müller’s paper (P3429: <meta> should minimize standard library dependencies) was given more thought and support.

I also really wish that a game-changing feature such as reflection wasn’t so closely tied to the Standard Library. The less often I use the Standard Library, the more enjoyable and productive I find C++ as a language – insanely fast compilation times are a large part of that.

Hopefully, as reflection implementations are relatively new, things will only get better from here.



Comments

  • By ralferoo, 2026-03-10 10:59 (3 replies)

    "I can recompile the entire thing from scratch in ~4.3s. That’s around ~900 TUs, including external dependencies, tests, and examples"

    In 30 years of using C++ this is the first time I've ever come across "translation unit" being abbreviated to TU and it took a bit of effort to figure out what the author was trying to say. Not sure why they felt the need to abbreviate this when they explain PCH for instance, which is a far more commonly used term.

    Thought I'd add the context here to help anyone else out.

    • By pjmlp, 2026-03-10 11:08

      It is quite common when being around WG21 stuff, just as info.

    • By dataflow, 2026-03-10 14:51

      > Not sure why they felt the need to abbreviate this

      It's super common terminology for people around those spaces; they probably didn't even think about whether they should abbreviate it.

    • By SuperV1234, 2026-03-10 18:00

      I've updated the article to say "translation unit" the first time "TU" is introduced. The data was also incorrect due to an oversight on my part, and it's now much more accurate and ~50% faster across the board.

  • By SuperV1234, 2026-03-10 17:59

    Update & Apology:

    I've fully updated the article with new benchmarks.

    A reader pointed out that the GCC 16 Docker container I originally used was built with internal compiler assertions enabled, skewing the data and unfairly penalizing GCC.

    I've re-measured everything on a proper release build (Fedora 44), and the compile times are ~50% faster across the board.

    The article now reflects the accurate numbers, and I've added an appendix showing the exact cost of the debug assertions.

    I sincerely apologize for the oversight.

  • By leni536, 2026-03-10 10:07 (3 replies)

    libstdc++'s <print> is very heavy, reflection or not. AFAIK there is no inherent reason for it to be that heavy, fmtlib compiles faster.

    <meta> is another question, it depends on string_view, vector, and possibly other parts. Maybe it's possible to make it leaner with more selective internal deps.

    • By craftit, 2026-03-10 10:39

      I don't know the exact details, but I have heard (on C++ Weekly, I believe) that it offers some advantages when linking code compiled with different compiler versions. That said, I normally avoid it and use fmtlib to avoid the extra compile time. So it isn't clear if it is a win to me. Header-only libraries are great on small projects, but on large codebases with 1000's of files, it really hits you.

    • By surajrmal, 2026-03-10 13:36

      It also bloats binary size if you statically link libc++ because of localization, regardless if you care for it. This wasn't true for fmtlib because it doesn't support localization. stringstream has this same problem, but it's one of many reasons embedded has stuck with printf.

    • By SuperV1234, 2026-03-10 18:01

      The data was incorrect due to an oversight on my part (the Docker image I had used compiled GCC with internal assertions enabled).

      Including <print> is still very heavy, but not as bad as before (from ~840ms to ~508ms)

HackerNews