Rust’s Incremental Compiler Architecture

2024-12-13 (lwn.net)


The traditional structure of a compiler forms a pipeline — parsing, type-checking, optimization, and code-generation, usually in that order. But modern programming languages have requirements that are ill-suited to such a design. Increasingly, compilers are moving toward other designs in order to support incremental compilation and low-latency responses for uses like integration into IDEs. Rust has, for the last eight years, been pursuing a particularly unusual design; in that time compile times have substantially improved, but there's still more work to be done.

In a traditional compiler, each step of the pipeline normally needs to complete before the next is started. For example, optimization must complete before the compiler can begin emitting machine code. There are ways to improve this — working on translation units in parallel, caching some kinds of analysis, etc. — but, at a certain point, the architecture of the compiler itself makes it difficult to support incremental compilation.

Language design can influence how much of a problem this is in practice. For example, while the GCC developers did look into making the compiler incremental, the fact that C defines each file as a stand-alone translation unit makes it easier to add incrementalism at the level of the build system. Many modern languages don't observe the same separation between files, however. In Rust, an entire crate is a single translation unit; cargo, Rust's build system, handles only recompiling changed crates, but adding incrementalism within a crate requires special support from the compiler.

So, in response to wishes for faster compile times, Rust has been exploring an alternative to the traditional structure of a compiler by switching to a query-based model: instead of having a fixed pipeline, the compiler has a set of queries for different properties of the program. Each query can either retrieve the answer from a persistent compilation database, or, if the answer has not yet been cached, calculate it from scratch by making other queries and combining their results. For example, the compiler uses the type_of() query to infer the types of local bindings. The query takes the ID of a local definition, and then calls other queries to determine whether it is an impl Trait type, what the abstract syntax tree of the definition is, what the types of the variables it refers to are, and so on.
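
To make the shape of this concrete, here is a toy memoized query in the spirit of that design. This is a minimal sketch, not rustc's real API: QueryCtxt, ast_of(), the u32 definition IDs, and the string-typed results are all illustrative stand-ins.

    use std::collections::HashMap;

    // A toy query context: query results are memoized per definition ID.
    #[derive(Default)]
    struct QueryCtxt {
        type_of_cache: HashMap<u32, String>,
    }

    impl QueryCtxt {
        // Stub for the parsed form of a definition; in a query-based
        // compiler this lookup would itself be a query.
        fn ast_of(&self, def_id: u32) -> String {
            format!("const C{def_id} = 1 + 2;")
        }

        // A memoized query: return the cached answer if present, and
        // otherwise compute it by invoking sub-queries, then cache it.
        fn type_of(&mut self, def_id: u32) -> String {
            if let Some(ty) = self.type_of_cache.get(&def_id) {
                return ty.clone();
            }
            let _ast = self.ast_of(def_id); // a sub-query this result depends on
            let ty = String::from("i32");   // stand-in for real type inference
            self.type_of_cache.insert(def_id, ty.clone());
            ty
        }
    }

    fn main() {
        let mut tcx = QueryCtxt::default();
        assert_eq!(tcx.type_of(0), "i32"); // computed on the first call
        assert_eq!(tcx.type_of(0), "i32"); // served from the cache
    }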

Suppose that the programmer changes the type of a constant in their crate. When they run the compiler, it will evaluate a top-level query to produce an output binary. That query will in turn query for the final versions of each function in the crate (after type-checking and optimization). Most of those will not have changed, so the answers are provided from the on-disk compilation database. The queries for functions that depend on the changed constant, however, will get re-run — and will, themselves, call other queries until the whole process ends up at the query corresponding to the parsed representation of the single changed definition. Only the minimum amount of analysis and compilation needed to accommodate that change is performed.

This approach is reminiscent of a build system — and that's deliberate. Build systems, while not always perfect, usually do a good job of tracking the dependencies between components and correctly caching partial results. A compiler that has the architecture of a build system can take advantage of the research that has been done on build systems to do the same.

Rust's approach

The Rust project began the process of rewriting its compiler to support incremental compilation in 2016. While that redesign is not yet fully complete, large portions of the compiler now use a query-based approach. LWN recently covered Sparrow Li's talk that discussed what remains to be done.

The Rust compiler organizes queries using a structure called TyCtxt. When the compiler starts up, the various modules all register functions called "providers" that implement each query. Then, depending on how the compiler was invoked, it parses the code (because the parser has not yet been integrated into the query system) and runs a set of top-level queries to compile an artifact, format the code, produce warnings, or whatever else was requested.
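
The registration step amounts to filling in a table of function pointers, one per query. Below is a heavily simplified sketch of that pattern; the real compiler's Providers struct has one field per query, and the TyCtxt and ID types here are minimal stand-ins.

    // Minimal stand-ins for the compiler's real types.
    struct TyCtxt;

    struct Providers {
        // One function pointer per query; the real table is much larger.
        type_of: fn(&TyCtxt, u32) -> &'static str,
    }

    // The module that implements type-checking registers its provider.
    fn type_of(_tcx: &TyCtxt, _def_id: u32) -> &'static str {
        "i32" // stub implementation
    }

    fn provide(providers: &mut Providers) {
        providers.type_of = type_of;
    }

    fn main() {
        // Start from a placeholder, then let the module fill in its entry.
        let mut providers = Providers { type_of: |_, _| unreachable!() };
        provide(&mut providers);
        println!("{}", (providers.type_of)(&TyCtxt, 0)); // prints "i32"
    }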

Providers need to be pure functions: the compiler may satisfy a query from the cache rather than running its provider, so nothing can depend on a provider's side effects. This also has the effect of forcing those parts of the compiler that have been converted to the query system to avoid global state. The results of queries could in theory be any type, but in practice the compiler requires that they be values that can be cheaply copied, in order to simplify the ownership of cached values. Not every query is saved to disk, but most queries return types that can be serialized and hashed, so that they can be.
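
One common way to satisfy the cheap-copy requirement is interning: a query returns a reference into long-lived storage, so copying the result means copying a pointer. The sketch below uses Box::leak() where rustc uses arenas tied to the compilation session, and all of the names are illustrative.

    use std::collections::HashMap;

    struct TyData {
        name: String,
    }

    // Hands out &'static references so that "copying" a type is just
    // copying a pointer, and equal types are pointer-equal.
    #[derive(Default)]
    struct Interner {
        map: HashMap<String, &'static TyData>,
    }

    impl Interner {
        fn intern(&mut self, name: &str) -> &'static TyData {
            if let Some(&ty) = self.map.get(name) {
                return ty;
            }
            // Leaking stands in for allocation in a session-long arena.
            let ty: &'static TyData =
                Box::leak(Box::new(TyData { name: name.to_string() }));
            self.map.insert(name.to_string(), ty);
            ty
        }
    }

    fn main() {
        let mut interner = Interner::default();
        let a = interner.intern("i32");
        let b = interner.intern("i32");
        assert!(std::ptr::eq(a, b)); // the same interned value both times
        println!("{}", a.name);
    }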

There are some important differences in structure between Rust's queries and static build systems like Make, not just in the form that their results take. In Rust's implementation, the relationship between queries is implicit. Queries are always invoked via a TyCtxt; that structure can track the dependencies between queries by seeing which queries are invoked while another is executing. Also, unlike some build systems, the compiler tracks the order in which dependencies are invoked. This is important because the results of one query can change which other queries get executed — therefore, the order impacts the calculation of which queries need to be rerun when doing incremental compilation. For example, this query depends on different things depending on the result of subquery1():

    fn example_query<'tcx>(tcx: TyCtxt<'tcx>) -> SomeType {
        // The result of subquery1() decides which other query runs, so
        // the set and order of dependencies recorded for example_query()
        // can differ from one execution to the next.
        if tcx.subquery1() {
            tcx.subquery2()
        } else {
            tcx.subquery3()
        }
    }

During the initial run of the compiler, when there are no saved results, the program collects the graph of which queries refer to each other. When the compiler is rerun, it checks that graph to see which queries need to be rerun because of changed inputs. As an additional optimization, if the inputs to a query have changed, but the result of the query has not, the compiler can detect that and cut off the process of recalculating queries early.
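
One way to picture that early cutoff: before rerunning a downstream query, compare a hash of its dependency's freshly computed result against the hash recorded on the previous run. This sketch uses the standard library's DefaultHasher where the compiler uses stable fingerprints.

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    fn hash_of<T: Hash>(value: &T) -> u64 {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        hasher.finish()
    }

    fn main() {
        // Previous run: some query evaluated to 3, and we stored its hash.
        let old_result_hash = hash_of(&3_i64);

        // This run: the query's input changed, so it was rerun...
        let new_result = 2_i64 + 1;

        // ...but the result hash matches, so queries that depend only on
        // this result do not need to be rerun.
        if hash_of(&new_result) == old_result_hash {
            println!("early cutoff: dependent queries are still valid");
        } else {
            println!("result changed: rerun dependent queries");
        }
    }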

The process of figuring out whether an intermediate result has changed is complicated by the fact that any information about the incremental build needs to be shared between runs of the compiler by saving it to disk. The compiler uses a lot of internal ID numbers to organize different components that differ from run to run, and serializing every intermediate result in the compiler would be expensive. So actually saving the results of cached queries is more complex.

In order to work around the problem of changing internal IDs, structures that are saved to disk are given "stable" forms of each ID. For example, function and type definitions are identified by path (e.g. std::collections::HashMap) instead of by internal ID. In order to avoid the overhead of deserializing intermediate results to detect changes, the compiler calculates and stores hashes for each item. Hashes are much more compact, and don't require complex deserialization, so this is a substantial performance improvement. The downside is that computing so many hashes actually causes significant overhead. The compiler's internal documentation calls it "the main reason why incremental compilation can be slower than non-incremental compilation."
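
Put together, the on-disk state can be imagined as a map from stable keys (paths) to result fingerprints. This sketch shows why change detection stays cheap; the names and layout are illustrative, and rustc's actual incremental cache is considerably more involved.

    use std::collections::HashMap;

    struct OnDiskCache {
        // Stable path -> hash of the result from the previous run.
        fingerprints: HashMap<String, u64>,
    }

    impl OnDiskCache {
        // Comparing two u64 hashes; nothing needs to be deserialized.
        fn is_unchanged(&self, stable_path: &str, current_fingerprint: u64) -> bool {
            self.fingerprints.get(stable_path) == Some(&current_fingerprint)
        }
    }

    fn main() {
        let cache = OnDiskCache {
            fingerprints: HashMap::from([("crate::foo::bar".to_string(), 0xABCD)]),
        };
        assert!(cache.is_unchanged("crate::foo::bar", 0xABCD));
        assert!(!cache.is_unchanged("crate::foo::bar", 0x1234));
    }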

These performance improvements introduce their own complications, of course. For example, if a result is never loaded from disk, the compiler won't necessarily be able to include it in the incremental-compilation database for the next run of the program. So, after generating results, the compiler does another pass through the query graph to fetch and update the results that it skipped earlier, in order to make sure that the size of the cache doesn't unnecessarily shrink. This increases the compiler's overall runtime — but what users usually care about is the time it takes to produce warnings and errors, not how long it takes the compiler to do cleanup before exiting. There are various other optimizations and tweaks applied to the largest queries to make the overall system more performant. The Rust Compiler Development Guide's section on the query system has more details.

Overall, the query-based approach introduces a lot of complexity compared to a simple pipeline-based approach. But any large, performance-sensitive codebase accumulates complexity over time — and having a single, well-understood system for incremental compilation may well win out over gradually introducing ad-hoc improvements to a pipeline-based system. In any case, Rust's incremental compilation has been stable enough to become the default for development builds. Release builds still avoid incremental compilation by default, however, out of an abundance of caution. Whether the query-based compiler design will ultimately be adopted by other languages remains to be seen.




Comments

  • By sjrd 2024-12-13 7:00

    The Scala.js compiler toolchain has been using similar techniques since 2015 by default. It features parallel and incremental whole-program optimizations, so it is also used for release builds. In fact it's so fast we run most advanced optimizations in development builds as well.

    For those interested in more details, we published a paper about it in OOPSLA'16 [1].

    [1] https://dl.acm.org/doi/10.1145/2983990.2984013

  • By fulafel 2024-12-13 6:01 (3 replies)

    It seems that in https://perf.rust-lang.org/dashboard.html the improvements have essentially plateaued; I wonder how they decide how much work to invest there.

    (Although as the language complexity continues to grow maybe that question is missing the opposite force of compiler slowdowns from added work?)

    • By epage 2024-12-13 12:54

      For some insight into what more can be done, see

      - https://internals.rust-lang.org/t/slow-incremental-compilati...

      - https://internals.rust-lang.org/t/feedback-request-performan... (there is more context to that being published today in a blog post)

      There are several more ideas that are in even earlier stages of discussion.

    • By lifthrasiir 2024-12-13 8:49

      The log-scale plot hasn't plateaued yet, so there is still room for improvement. (Remember that rustc was quite slow at the very beginning.)

    • By geewee 2024-12-13 6:25 (2 replies)

      I wonder if these graphs capture the new multi-threaded frontend that's on nightly.

      • By nindalf 2024-12-13 9:58

        These graphs only include stable releases, not nightly ones. Since the multi-threaded frontend hasn't landed on stable yet, we can't see its effect on these graphs.

      • By n42 2024-12-13 8:11 (2 replies)

        genuine question: how much can the frontend really impact compile times at the end of the day? I would guess most of the time spent compiling is on the backend, but IANA compiler engineer

        • By nindalf 2024-12-13 10:04

          > How much can the frontend really impact compile times?

          Your question is addressed in this blog post: Faster compilation with the parallel front-end in nightly (https://blog.rust-lang.org/2023/11/09/parallel-rustc.html). In the example shown there:

          - Serial took 10.2 seconds in the frontend and 6.2 seconds in the backend.

          - Parallel took 5.9 seconds in the frontend (-42%) and 5.3 seconds in the backend (-14.5%).

          So to answer your question, parallelising the frontend can have a substantial impact.

          You are right that the frontend is only part of the total compilation time; backend and linking time matter too. Happily, those are being worked on as well!

          There is an alternate backend called cranelift (https://bjorn3.github.io/2024/11/14/progress-report-nov-2024...) that compiles faster than LLVM, at the cost of producing slower binaries. That's a good trade-off for debug builds.

          And there's work on changing the default linker to lld (https://blog.rust-lang.org/2024/05/17/enabling-rust-lld-on-l...).

          We may see all 3 efforts land in the next year :)

        • By saagarjha 2024-12-13 8:24 (2 replies)

          One obvious example of this would be C++, where a smarter frontend that doesn't do textual inclusion of a million lines would significantly improve compile times.

          • By menaerus 2024-12-13 9:05 (1 reply)

            But that's a low-hanging-fruit "optimization", no? Once you get around it, and this has been known forever, the bottleneck is in backend code generation. So if Rust has already solved this, the area where they can improve most is backend code generation?

            Most C++ builds I have worked with, all of them MMLoC codebases, were actually bottlenecked by high memory pressure, in part due to heavy templates and symbol visibility.

            • By fulafel 2024-12-15 10:15 (1 reply)

              I think the history of C++ implementations shows that it's not low hanging fruit, it's a huge effort to implement and the payoffs aren't game changing.

              • By menaerus 2024-12-15 11:29

                Modules? Yeah, I don't expect them to be a game changer either. Exactly because of the reasons I mentioned.

          • By n42 2024-12-13 8:26

            that makes sense. are Rust's macros handled in the frontend? if so, I could see how multithreading could have a dramatic impact

  • By rbalicki 2024-12-13 12:56

    This [1] is a (somewhat outdated!) design doc for how we're approaching incremental compilation in the Isograph compiler! If anyone is interested in helping build an incremental compiler, the project is being actively developed and open to new contributors! See also my recent talk at GraphQL conf [2].

    In particular, we're actively implementing a simpler Salsa [3] and transforming the currently batch-mode compiler into an incremental, pull-based compiler.

    [1] https://isograph.dev/docs/design-docs/incremental-compilatio...

    [2] https://www.youtube.com/watch?v=sf8ac2NtwPY

    [3] https://www.youtube.com/watch?v=_muY4HjSqVw
