Compiling Ruby to machine language

2025-11-17 20:04, patshaughnessy.net

I've started working on a new edition of Ruby Under a Microscope that covers Ruby 3.x. I'm working on this in my spare time, so it will take a while. Leave a comment or drop me a line and I'll email you when it's finished.

Here’s an excerpt from the completely new content for Chapter 4, about YJIT and ZJIT. I’m still finishing this up… so this content is fresh off the page! It’s been a lot of fun for me to learn about how JIT compilers work and to brush up on my Rust skills as well. And it’s very exciting to see all the impressive work the Ruby team at Shopify and other contributors have done to improve Ruby’s runtime performance.

Chapter 4: Compiling Ruby To Machine Language

Interpreting vs. Compiling Ruby Code
Yet Another JIT (YJIT)
Virtual Machines and Actual Machines
Counting Method and Block Calls
YJIT Blocks
YJIT Branch Stubs
Executing YJIT Blocks and Branches
Deferred Compilation
Regenerating a YJIT Branch
YJIT Guards
Adding Two Integers Using Machine Language
Experiment 4-1: Which Code Does YJIT Optimize?
How YJIT Recompiles Code
Finding a Block Version
Saving Multiple Block Versions
ZJIT, Ruby’s Next Generation JIT
Counting Method and Block Calls
ZJIT Blocks
Method Based JIT
Rust Inside of Ruby
Experiment 4-2: Reading ZJIT HIR and LIR
Summary

Counting Method and Block Calls

To find hot spots, YJIT counts how many times your program calls each function or block. When this count reaches a certain threshold, YJIT stops your program and converts that section of code into machine language. Later, Ruby will execute the machine language version instead of the original YARV instructions.

To keep track of these counts, YJIT saves an internal counter near the YARV instruction sequence for each function or block.


Figure 4-5: YJIT saves information adjacent to each set of YARV instructions

Figure 4-5 shows the YARV instruction sequence the main Ruby compiler created for the sum += i block at (3) in Listing 4-1. At the top, above the YARV instructions, Figure 4-5 shows two YJIT-related values: jit_entry and jit_entry_calls. As we’ll see in a moment, jit_entry starts as a null value but will later hold a pointer to the machine language instructions YJIT produces for this Ruby block. Below jit_entry, Figure 4-5 also shows jit_entry_calls, YJIT’s internal counter.

Each time the program in Listing 4-1 calls this block, YJIT increments the value of jit_entry_calls. Since the range at (1) in Listing 4-1 spans from 1 through 40, this counter will start at zero and increase by 1 each time Range#each calls the block at (3).

When jit_entry_calls reaches a particular threshold, YJIT will compile the YARV instructions into machine language. By default, for small Ruby programs, YJIT in Ruby 3.5 uses a threshold of 30. Larger programs, like Ruby on Rails web applications, will use a larger threshold value of 120. (You can also change the threshold by passing --yjit-call-threshold when you run your Ruby program.)
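If you'd like to experiment with the threshold yourself, a small script along these lines might help. The loop is only a stand-in shaped like the sum += i example (Listing 4-1 itself isn't reproduced in this excerpt), and the file name sum.rb is made up. The check relies on RubyVM::YJIT.enabled?, which is only defined on Ruby builds that include YJIT.

```ruby
# sum.rb (hypothetical file name)
# Run with, for example:
#   ruby --yjit --yjit-call-threshold=10 sum.rb

# RubyVM::YJIT is absent on builds without YJIT support, so check first.
yjit_on = false
if defined?(RubyVM::YJIT)
  yjit_on = RubyVM::YJIT.enabled?
end
puts "YJIT enabled: #{yjit_on}"

# Stand-in for the example program: a block called 40 times.
sum = 0
(1..40).each do |i|
  sum += i   # the block whose calls YJIT counts
end
puts sum
```

The script prints the same sum with or without YJIT; only the way Ruby executes the block differs.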

YJIT Blocks

While compiling your Ruby program, YJIT saves the machine language instructions it creates into YJIT blocks. YJIT blocks, which are distinct from Ruby blocks, each contain a sequence of machine language instructions for a range of corresponding YARV instructions. By grouping YARV instructions and compiling each group into a YJIT block, YJIT can produce more optimized code that is tailored to your program’s behavior and avoid compiling code that your program doesn’t need.

As we’ll see next, a single YJIT block doesn’t correspond to a Ruby function or block. YJIT blocks instead represent smaller sections of code: individual YARV instructions or a small range of YARV instructions. Each Ruby function or block typically consists of several YJIT blocks.

Let’s see how this works for our example. After the program in Listing 4-1 executes the Ruby block at (3) 29 times, YJIT will increment the jit_entry_calls counter again, just before Ruby runs the block for the 30th time. Since jit_entry_calls reaches the threshold value of 30, YJIT triggers the compilation process.
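The counting described above can be sketched as a toy model in plain Ruby. Everything here is invented for illustration: the ToyIseq class, its counters, and the choice to stop counting once "compilation" has happened are my assumptions, and the real YJIT keeps this state in its own internal structures rather than in Ruby objects.

```ruby
# Toy model only: ToyIseq and its fields are invented for illustration.
class ToyIseq
  attr_reader :jit_entry, :jit_entry_calls, :interpreted_runs, :compiled_runs

  def initialize(threshold:, &body)
    @threshold = threshold
    @body = body
    @jit_entry = nil        # starts out as a null value
    @jit_entry_calls = 0    # the call counter
    @interpreted_runs = 0
    @compiled_runs = 0
  end

  def call(*args)
    # Once "compiled", always run the compiled version.
    if @jit_entry
      @compiled_runs += 1
      return @jit_entry.call(*args)
    end

    # Otherwise count the call and compare against the threshold.
    @jit_entry_calls += 1
    if @jit_entry_calls >= @threshold
      @jit_entry = @body    # stand-in for "compile to machine language"
      @compiled_runs += 1
      return @jit_entry.call(*args)
    end

    @interpreted_runs += 1
    @body.call(*args)
  end
end

sum = 0
block = ToyIseq.new(threshold: 30) { |i| sum += i }
(1..40).each { |i| block.call(i) }

puts "interpreted: #{block.interpreted_runs}, compiled: #{block.compiled_runs}"
```

In this toy, the first 29 calls take the interpreted path, and the 30th call is the first to use the "compiled" version, leaving 11 compiled runs out of 40.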

YJIT compiles the first YARV instruction getlocal_WC_1 and saves machine language instructions that perform the same work as getlocal_WC_1 into a new YJIT block:

Figure 4-6: Creating a YJIT block

On the left side, Figure 4-6 shows the YARV instructions for the sum += i Ruby block. On the right, Figure 4-6 shows the new YJIT block corresponding to getlocal_WC_1.
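To look at YARV instructions like these on your own machine, CRuby's RubyVM::InstructionSequence class can print a disassembly. The source string below is a stand-in for the example program, not the book's listing, and the exact output format varies between Ruby versions.

```ruby
# RubyVM::InstructionSequence is specific to CRuby (MRI).
# The source below is a stand-in for the example program.
source = <<~RUBY
  sum = 0
  (1..40).each do |i|
    sum += i
  end
RUBY

# Compile to YARV instructions and print a human-readable listing,
# including the nested instruction sequence for the block.
disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm
```

In the section of the output for the block, look for getlocal_WC_1, getlocal_WC_0 and opt_plus, the instructions discussed in this chapter.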

Next, the YJIT compiler continues and compiles the second YARV instruction from the left side of Figure 4-7: getlocal_WC_0 at index 2.


Figure 4-7: Appending to a YJIT block

On the left side, Figure 4-7 shows the same YARV instructions for the sum += i Ruby block that we saw above in Figure 4-6. But now the two dotted arrows indicate that the YJIT block on the right contains the machine language instructions equivalent to both getlocal_WC_1 and getlocal_WC_0.
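One way to picture a YJIT block growing as the compiler moves through the YARV instructions is the small data model below. It is only a sketch: ToyJitBlock and its fields are names I made up, and the strings are placeholders for real machine language instructions.

```ruby
# Toy data model; ToyJitBlock and its fields are invented for illustration.
ToyJitBlock = Struct.new(:start_index, :end_index, :code) do
  # Append the (placeholder) machine code for one more YARV instruction.
  def append(yarv_index, machine_code)
    self.end_index = yarv_index
    code.concat(machine_code)
    self
  end
end

# YARV instructions from the sum += i block, keyed by index.
yarv = { 0 => "getlocal_WC_1", 2 => "getlocal_WC_0", 4 => "opt_plus" }

# First the block holds code for getlocal_WC_1 only...
block = ToyJitBlock.new(0, 0, ["<machine code for #{yarv[0]}>"])
# ...then the compiler appends the code for getlocal_WC_0.
block.append(2, ["<machine code for #{yarv[2]}>"])

puts "block covers YARV indexes #{block.start_index}..#{block.end_index}"
puts block.code
```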

Let’s take a look inside this new block. YJIT compiles or translates the Ruby YARV instructions into machine language instructions. In this example, running on my Mac laptop, YJIT writes the following machine language instructions into this new block:


Figure 4-8: The contents of one YJIT block

Figure 4-8 shows a closer view of the new YJIT block that appeared on the right side of Figures 4-6 and 4-7. Inside the block, Figure 4-8 shows the assembly language mnemonics corresponding to the ARM64 machine language instructions that YJIT generated for the two YARV instructions shown on the left. The YARV instructions on the left are: getlocal_WC_1, which loads a value from a local variable located in the previous stack frame and saves it on the YARV stack, and getlocal_WC_0, which loads a local variable from the current stack frame and also saves it on the YARV stack. The machine language instructions on the right side of Figure 4-8 perform the same task, loading these values into registers on my M1 microprocessor: x1 and x9. If you’re curious and would like to learn more about what the machine language instructions mean and how they work, the section “Adding Two Integers Using Machine Language” discusses the instructions for this example in more detail.

YJIT Branch Stubs

Next, YJIT continues down the sequence of YARV instructions and compiles the opt_plus YARV instruction at index 4 in Figures 4-6 and 4-7. But this time, YJIT runs into a problem: It doesn’t know the type of the addition arguments. That is, will opt_plus add two integers? Or two strings, floating point numbers, or some other types?

Machine language is very specific. To add two 64-bit integers on an M1 microprocessor, YJIT could use the adds assembly language instruction. But adding two floating point numbers would require different instructions. And, of course, adding or concatenating two strings is a different operation altogether.

In order for YJIT to know which machine language instructions to save into the YJIT block for opt_plus, YJIT needs to know exactly what types of values the Ruby program might ever add at (3) in Listing 4-1. You and I can tell by reading Listing 4-1 that the Ruby code is adding integers. We know right away that the sum += i block at (3) is always adding one integer to another. But YJIT doesn’t know this.
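To see at the Ruby level why the operand types matter, the short sketch below asks Ruby which class implements + for a few different receivers. This only shows method lookup in Ruby itself; it says nothing about the machine code YJIT actually emits for each case.

```ruby
# The same + operator resolves to a different method per receiver class.
examples = [[1, 2], [1.5, 2.5], ["a", "b"]]

examples.each do |left, right|
  owner = left.method(:+).owner   # the class that defines + for this value
  puts "#{left.inspect} + #{right.inspect} => #{(left + right).inspect} (#{owner}#+)"
end
```

Each of these three implementations does different work under the hood, which is why a single, fixed sequence of machine language instructions can't cover them all.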

YJIT uses a clever trick to solve this problem. Instead of analyzing the entire program ahead of time to determine all of the possible types of values the opt_plus YARV instruction might ever need to add, YJIT simply waits until the block runs and observes which types the program actually passes in.

YJIT uses branch stubs to achieve this wait-and-see compile behavior, as shown in Figure 4-9.


Figure 4-9: A YJIT block, branch and stub

Figure 4-9 shows the YARV instructions on the left, and the YJIT block for indexes 0000-0002 on the right. But note the bottom right corner of Figure 4-9, which shows an arrow pointing down from the block to a box labeled stub. This arrow represents a YJIT branch. Since this new branch doesn’t point to a block yet, YJIT sets up the branch to point to a branch stub instead.
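This excerpt stops before describing what happens when the stub actually runs, so the following toy model is a guess based on the wait-and-see description above. ToyBranch and all of its methods are invented names; real YJIT patches machine code rather than swapping Ruby objects.

```ruby
# Toy model of a branch that starts out pointing at a stub.
# All names here are invented for illustration.
class ToyBranch
  attr_reader :target_kind, :observed_types

  def initialize
    @target_kind = :stub
    @observed_types = nil
    @target = method(:stub)   # no block exists yet, so point at the stub
  end

  # Follow the branch to whatever it currently points at.
  def jump(left, right)
    @target.call(left, right)
  end

  private

  # Runs only the first time the branch is taken.
  def stub(left, right)
    # Wait and see: record the types the program actually passed in.
    @observed_types = [left.class, right.class]
    compiled = ->(a, b) { a + b }   # stand-in for type-specific machine code
    # Repoint the branch at the new "block" so the stub is skipped next time.
    @target = compiled
    @target_kind = :block
    compiled.call(left, right)
  end
end

branch = ToyBranch.new
puts branch.target_kind            # still a stub before the first jump
puts branch.jump(0, 1)
puts branch.observed_types.inspect
puts branch.target_kind            # now points at a block
```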


Comments

  • By jlarocco 2025-11-17 22:05 (4 replies)

    IIRC MacRuby used to compile to native code on OSX using LLVM, and was supposed to support native OSX APIs and Objective-C frameworks. It always seemed like a neat idea, and a slick integration, but I guess Apple moved to Swift instead.

    I'll have to pick up a copy of this "Ruby Under a Microscope" book when the new version comes out. I've always liked Ruby, I just haven't had much chance to use it.

    • By vidarh 2025-11-18 7:44

      The creator of MacRuby left Apple, and created RubyMotion. It's continued by different people, but still around, though it seems the main focus of the people involved now is DragonRuby (a game-focused Ruby implementation)

    • By pjmlp 2025-11-18 8:24

      It lives on as RubyMotion after the author left Apple, http://www.rubymotion.com/

      https://en.wikipedia.org/wiki/RubyMotion

    • By hk1337 2025-11-18 0:33 (2 replies)

      AFAIK, you can still use Objective-C and create apps for macOS, iOS, and iPadOS? The APIs previously used may not be available anymore.

      • By jlarocco 2025-11-18 1:53 (1 reply)

        I'm sure you can still use Objective-C, but MacRuby stopped being updated around 2011, and I don't know how well it'd support newer versions of OSX.

        I dropped OSX long ago, so can't even try it out any more.

        I wonder how much of the LLVM bits could be reused? I'm sure LLVM's changed a bunch in the last 15 years, too.

        • By moltopoco 2025-11-18 3:39 (1 reply)

          My understanding is that MacRuby relied on Apple's ill-fated attempts to migrate from reference counting to regular garbage collection. I would be surprised if GC still worked on modern arm64 macOS. RubyMotion later adopted ARC but then it's not really Ruby anymore.

          • By jlarocco 2025-11-18 14:46

            Gees, I forgot about their move to Arm. Almost certainly wouldn't work out of the box any more.

      • By jb1991 2025-11-18 5:12

        I think you misunderstood the comment. They were referring to Ruby and accessing the APIs.

    • By eek2121 2025-11-17 22:57 (6 replies)

      Typical. I may get absolutely destroyed for this, but being professionally proficient in a ton of languages, including Ruby and the ones I mention below, and the ones I'm about to mention:

      This sounds like Microsoft when they moved from VB6 to VB.Net. At least they have a good thing going with C# though.

      VB6 was quite an interesting beast. You could do basically everything that you could do in languages like C/C++, but in most cases, you could churn out code quicker. This even extended to DirectX/Direct3D! For Web pages? ASP Classic.

      The tl;dr is that I really wish that ease of development were prioritized along with everything else. One of the reasons I like Ruby is the elegance of the language and ease of using it.

      Note that I've been using it since the mid 2000s or so, but not exclusively (both it and VB6 defined my career, however). C# is my second most favorite.

      If Ruby had the GUI design tools VB6 had, it would be interesting to look at the popularity stats

      Anyway, I'm rambling, so there is that. ;)

      • By pizza234 2025-11-18 0:09 (1 reply)

        VB6 deserves the huge popularity it had, but the reason wasn't because of the language design, rather, its (extremely) rapid GUI application development. It was actually a two-edged sword - it facilitated writing spaghetti code.

        > You could do basically everything that you could do in languages like C/C++

        As long as there is some form of memory access, any language can do basically everything that one can do in C/C++, but this doesn't make much sense.

        • By atherton94027 2025-11-18 0:18 (3 replies)

          > As long as there is some form of memory access, any language can do basically everything that one can do in C/C++, but this doesn't make much sense.

          No, VB6 had really easy COM integration which let you tap into a lot of Windows system components. The same code in C++ often required hundreds of lines of scaffolding, and I'm not exaggerating.

          • By jlarocco 2025-11-18 1:58

            FWIW, the pywin32 Python package and win32ole Ruby package have streamlined COM integration for Python and Ruby. Not quite as easy as VB6, but it's pretty close. I was even able to tab complete COM names in the Emacs Python REPL, but I remember it being a little buggy.

          • By bmm6o 2025-11-18 11:59 (1 reply)

            It probably still sucks in C, but the C++ DX got a lot better. Importing the idl would generate wrapper functions that made calling code look much more like a normal function. It would check the hresult and return an out param from the function. They also introduced types like _variant_t that help boxing and unboxing native types. It still wasn't fun but it greatly reduced line count.

            • By pjmlp 2025-11-18 12:42

              Nah, unless talking about C++ Builder extensions for COM, in Visual C++ land it still sucks big time.

              For some reason, there are vocal teams at Microsoft that resist anything in C++ that is comparable to VB, Delphi, .NET, C++ Builder ease of use regarding COM.

              Hence why we got MFC COM, ATL COM, WRL, WinRT (as COM evolution), C++/CX, C++/WinRT, WIL, and eventually all of them lose traction with that vocal group that apparently would rather use COM with bare bones IDL files, using the command line and VI on Windows most likely.

          • By reactordev 2025-11-18 13:33 (1 reply)

            Windows has a COM system. VB6 isn’t special. You can do that with VB.Net or C# too, C and C++. Windows COM is a thing. VB6 COM isn’t as VB6 only hooked into windows COM.

            • By atherton94027 2025-11-18 14:36

              I'm just giving context as to why VB6 was much better than C++ back in the day for building windows apps. VB.Net and C# didn't exist in the halcyon days of 1998

      • By jlarocco 2025-11-18 1:50

        I don't think it was too similar, TBH. Apple never took MacRuby as seriously as Microsoft took VB6, and it hadn't even had a 1.0 release when the single developer left Apple to work on RubyMotion.

        I do agree it'd be interesting to have a GUI designer for Ruby. Does QML paired with QtRuby work?

        In the distant past I had a book about FXRuby, but never used it much, and don't think it had a UI designer - it was just bindings to Fox Toolkit, which is lightweight, but not as well maintained as Qt or Gtk.

      • By pjmlp 2025-11-18 8:27

        By .NET 2.0, VB.NET got most of the stuff back VB 6 folks complained about.

        Now what .NET never did as good as VB 6, was ease of COM development experience.

        Which given the role of COM in Windows APIs since Vista, is a major pain point as I don't get if COM is so relevant, why Microsoft teams keep rebooting, badly, the COM development experience.

      • By blacksmith_tb 2025-11-17 23:51 (1 reply)

        What about something like Shoes[1]? I have played with it a little, just to make a simple UI to run some scripts I can run fine in a shell myself, but less-technical people may be too scared to fire up Terminal.app in order to do the same...

        1: http://shoesrb.com/

        • By pizza234 2025-11-18 0:11

          Shoes was very limited, and could only be used for extremely simple applications.

      • By pxc 2025-11-18 4:19

        > At least they have a good thing going with C# though.

        F# is pretty well-liked, too, isn't it?

      • By refulgentis 2025-11-18 2:16

        Typical? Of whom?

        You might get destroyed for this? Why?

        I don’t know what either of those mean in this context, and I used VB6 for a couple years at least and have been programming ObjC and / or Swift since 2006, with some time in Rails over a couple years.

        I’m extremely confused by your comment, it’s apparently near verboten in polite company, yet, manages to say nothing other than that while invoking several things of which I’m quite familiar.

        If you are destroyed, I anticipate it will be for a quarter baked, horrible, analogy between ObjC/Swift (or is it Ruby/Swift)? and VB6/VB.NET that somehow has something to do with Ruby.

  • By pasxizeis 2025-11-17 21:01 (1 reply)

    Really happy to see Pat keeping it up! His first Ruby under a Microscope book but also his blog posts are amazing and a major source of inspiration for me. I did meet him personally in a Euruko conference. Such a great person.

    • By pat_shaughnessy 2025-11-17 22:52 (1 reply)

      What a lovely comment - thank you!

      • By topato 2025-11-18 2:14

        whoa, the man himself! I second the praise, an all around excellent writer!

  • By chao- 2025-11-17 21:37 (1 reply)

    I loved Ruby Under a Microscope when I first read it, and using that knowledge was able to have fun with some CTFs years ago.

    I haven't kept up with the evolving Ruby implementation internals, so I will sure as heck buy this new version of the book.

    • By UncleOxidant 2025-11-18 2:04

      I used Ruby a lot from about 2002 to 2010. Haven't used it much since then, but this article really makes me want to get a copy of the upcoming version Ruby Under a Microscope.
