Hacker News

How many registers does an x86-64 CPU have? (2020)

2026-02-1413:3311595blog.yossarian.net

This post is at least a year old.

x86 is back in the general programmer discourse, in part thanks to Apple’s M1 and Rosetta 2. As such, I figured I’d do yet another x86-64 post.

Just like the last one, I’m going to cover a facet of the x86-64 ISA that sets it apart as unusually complex among modern ISAs: the number and diversity of registers available.

Like instruction counting, register counting on x86-64 is subject to debates over methodology. In particular, for this blog post, I’m going to lay the following ground rules:

I will count sub-registers (e.g., EAX for RAX) as distinct registers. My justification: they have different instruction encodings, and both Intel and AMD optimize/pessimize particular sub-register use patterns in their microcode.
I will count registers that are present on x86-64 CPUs, but that can’t be used in long mode.
I won’t count registers that are only present on older x86 CPUs, like the 80386 and 80486 test registers.
I won’t count microarchitectural implementation details, like shadow registers.
I will count registers that aren’t directly addressable, like MSRs that can only be accessed through RDMSR. However, I won’t (or will try not to) double-count registers that have multiple access mechanisms (like RDMSR and RDTSC).
I won’t count model-specific registers that fall into these categories:
- MSRs that are only present on niche x86 vendors (Cyrix, Via)
- MSRs that aren’t widely available on recent-ish x86-64 CPUs
  - Errata: I accidentally included AVX-512 in some of the original counts below, not realizing that it hadn’t been released on any AMD CPUs. The post has been updated.
- MSRs that are completely undocumented (both officially and unofficially)

In addition to the rules above, I’m going to use the following considerations and methodology for grouping registers together:

Many sources, both official and unofficial, use “model-specific register” as an umbrella term for any non-core or non-feature-set register supplied by an x86-64 CPU. Whenever possible, I’ll try to avoid this in favor of more specific categories.
Both Intel and AMD provide synonyms for registers (e.g. CR8 as the “task priority register,” or TPR). Whenever possible, I’ll try to use the more generic/category conforming name (like CR8 in the case above).
In general, the individual cores of a multicore processor have independent register states. Whenever this isn’t the case, I’ll make an effort to document it.

General-purpose registers

The general-purpose registers (or GPRs) are the primary registers in the x86-64 register model. As their name implies, they are the only registers that are general purpose: each has a set of conventional uses¹, but programmers are generally free to ignore those conventions and use them as they please².

Because x86-64 evolved from a 32-bit ISA which in turn evolved from a 16-bit ISA, each GPR has a set of subregisters that hold the lower 8, 16 and 32 bits of the full 64-bit register.

As a table:

64-bit	32-bit	16-bit	8-bit (low)
RAX	EAX	AX	AL
RBX	EBX	BX	BL
RCX	ECX	CX	CL
RDX	EDX	DX	DL
RSI	ESI	SI	SIL
RDI	EDI	DI	DIL
RBP	EBP	BP	BPL
RSP	ESP	SP	SPL
R8	R8D	R8W	R8B
R9	R9D	R9W	R9B
R10	R10D	R10W	R10B
R11	R11D	R11W	R11B
R12	R12D	R12W	R12B
R13	R13D	R13W	R13B
R14	R14D	R14W	R14B
R15	R15D	R15W	R15B

Some of the 16-bit subregisters are also special: the original 8086 allowed the high byte of AX, BX, CX, and DX to be accessed independently, so x86-64 preserves this for some encodings:

16-bit	8-bit (high)
AX	AH
BX	BH
CX	CH
DX	DH

So that’s 16 full-width GPRs, fanning out to another 52 subregisters.
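As a rough illustration of the aliasing described above (a toy model, not anything the CPU exposes), a 64-bit GPR can be treated as an integer whose sub-registers are masked views of the same bits. The function name `subregister_views` and the sample value are made up for this sketch; the final lines simply redo the count arithmetic.

```python
# Toy model: one 64-bit GPR as a Python int, with each sub-register
# read as a masked (and, for AH, shifted) view of the same value.
MASK64 = (1 << 64) - 1


def subregister_views(rax: int) -> dict:
    """Return the values visible through RAX's sub-registers."""
    rax &= MASK64
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,  # low 32 bits
        "AX": rax & 0xFFFF,        # low 16 bits
        "AL": rax & 0xFF,          # low 8 bits
        "AH": (rax >> 8) & 0xFF,   # bits 8..15, the legacy high byte
    }


views = subregister_views(0x1122_3344_5566_7788)

# The count from the text: 16 full-width registers, three low
# sub-registers each, plus the four legacy high-byte registers.
full_width = 16
low_subregisters = full_width * 3
high_byte = 4
total_gprs = full_width + low_subregisters + high_byte
```

Only reads are modelled here; writes to sub-registers have their own rules that this sketch does not attempt to capture.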

Registers in this group: 68.

Running total: 68.

Special registers

This is sort of an artificial category: like every ISA, x86-64 has a few “special” registers that keep things moving along. In particular:

The instruction pointer, or RIP.

x86-64 has 32- and 16-bit variants of RIP (EIP and IP), but I’m not going to count them as separate registers: they have identical encodings and can’t be used in the same CPU mode³.
The status register, or RFLAGS.

Just like RIP, RFLAGS has 32- and 16-bit counterparts (EFLAGS and FLAGS). Unlike RIP, these counterparts can be partially mixed: PUSHF and PUSHFQ are both valid in long mode, and LAHF/SAHF can operate on the bits of FLAGS on some x86-64 CPUs outside of compatibility mode⁴. So I’m going to go ahead and count them.
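To make the "partially mixed" point concrete, here is a small sketch that models the three widths as views of one value and mimics what LAHF copies into AH (the low byte of FLAGS: SF, ZF, AF, PF, CF, with bit 1 always set). The helper names `flags_views` and `lahf` are invented for the example.

```python
# Bit positions of the arithmetic flags in the low byte of FLAGS.
CF = 1 << 0
PF = 1 << 2
AF = 1 << 4
ZF = 1 << 6
SF = 1 << 7
ALWAYS_ONE = 1 << 1  # bit 1 of FLAGS always reads as 1


def flags_views(rflags: int) -> dict:
    """RFLAGS, EFLAGS and FLAGS as views of the same register."""
    return {
        "RFLAGS": rflags & ((1 << 64) - 1),
        "EFLAGS": rflags & 0xFFFF_FFFF,
        "FLAGS": rflags & 0xFFFF,
    }


def lahf(rflags: int) -> int:
    """Value that LAHF would place in AH for the given flags."""
    return (rflags & (CF | PF | AF | ZF | SF)) | ALWAYS_ONE


example = 0x246  # IF, ZF and PF set, plus the always-one bit
ah = lahf(example)
```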

Registers in this group: 4.

Running total: 72.

Segment registers

x86-64 has a total of 6 segment registers: CS, SS, DS, ES, FS, and GS. The operation varies with the CPU’s mode:

In all modes except for long mode, each segment register holds a selector, which indexes into either the GDT or LDT. That yields a segment descriptor which, among other things, supplies the base address and extent of the segment.
In long mode all but FS and GS are treated as having a base address of zero and a 64-bit extent, effectively producing a flat address space. FS and GS are retained as special cases, but no longer use the segment descriptor tables: instead, they access base addresses that are stored in the FSBASE and GSBASE model-specific registers⁵. More on those later.
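For reference, a selector value breaks down into a table index, a table indicator bit (GDT or LDT), and a requested privilege level. The sketch below decodes one; `decode_selector` is a hypothetical helper name and 0x33 is just an example value.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                         # bits 15..3: descriptor index
        "table": "LDT" if selector & 0b100 else "GDT",  # bit 2: table indicator
        "rpl": selector & 0b11,                         # bits 1..0: requested privilege level
    }


fields = decode_selector(0x33)
```

Here `fields` describes entry 6 of the GDT with a requested privilege level of 3.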

Registers in this group: 6.

Running total: 78.

SIMD and FP registers

The x86 family has gone through several generations of SIMD and floating-point instruction groups, each of which has introduced, extended, or re-contextualized various registers:

x87
MMX
SSE (SSE2, SSE3, SSSE3, SSE4, …)
AVX (AVX2, AVX512)

Let’s do them in rough order.

x87

Originally a discrete coprocessor with its own instruction set and register file, the x87 instructions have been regularly baked into x86 cores themselves since the 80486.

Because of its coprocessor history, x87 defines both normal registers⁶ (akin to GPRs) and a variety of special registers needed to control the FPU state:

ST0 through ST7: 8 80-bit floating-point registers
FPSW, FPCW, FPTW ⁷: Status, control, and tag-word registers
“Data operand pointer”: I don’t know what this one does, but the Intel SDM specifies it⁸
Instruction pointer: the x87 state machine apparently holds its own copy of the current x87 instruction
Last instruction opcode: this is apparently distinct from the x87 opcode, and has its own register

Registers in this group: 14.

Running total: 92.

MMX

MMX was Intel’s first attempt at consumer SIMD in their x86 chips, released back in 1997.

For design reasons that are a complete mystery to me, the MMX registers are actually sub-registers of the x87 STn registers: each 64-bit MMn occupies the mantissa component of its corresponding STn. Consequently, x86 (and x86-64) CPUs cannot execute MMX and x87 instructions at the same time.
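A minimal model of that aliasing, assuming the usual 80-bit extended-precision layout (1 sign bit, 15 exponent bits, 64 mantissa bits): reading MMn returns the low 64 bits of STn, and an MMX write replaces them while forcing the upper 16 bits to all ones. The names `mmx_read` and `mmx_write` are illustrative only.

```python
MANTISSA_MASK = (1 << 64) - 1


def mmx_read(st: int) -> int:
    """MMn as seen through the 80-bit STn register: the low 64 bits."""
    return st & MANTISSA_MASK


def mmx_write(value: int) -> int:
    """New 80-bit STn contents after writing `value` to MMn.

    The sign and exponent bits (79..64) are set to all ones, so the
    register no longer holds an ordinary floating-point value.
    """
    return (0xFFFF << 64) | (value & MANTISSA_MASK)


one_point_zero = (0x3FFF << 64) | (1 << 63)  # 1.0 in extended precision
as_mmx = mmx_read(one_point_zero)
after_write = mmx_write(0x0102_0304_0506_0708)
```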

Edit: This section incorrectly included MXCSR, which was actually introduced with SSE. Thanks to /u/Skorezore for pointing out the error.

Registers in this group: 8.

Running total: 100.

SSE and AVX

For simplicity’s sake, I’m going to wrap SSE and AVX into a single section: they use the same sub-register pattern as the GPRs and x87/MMX do, so they fit well into a single table:

AVX-512 (512-bit)	AVX (256-bit)	SSE (128-bit)
ZMM0	YMM0	XMM0
ZMM1	YMM1	XMM1
ZMM2	YMM2	XMM2
ZMM3	YMM3	XMM3
ZMM4	YMM4	XMM4
ZMM5	YMM5	XMM5
ZMM6	YMM6	XMM6
ZMM7	YMM7	XMM7
ZMM8	YMM8	XMM8
ZMM9	YMM9	XMM9
ZMM10	YMM10	XMM10
ZMM11	YMM11	XMM11
ZMM12	YMM12	XMM12
ZMM13	YMM13	XMM13
ZMM14	YMM14	XMM14
ZMM15	YMM15	XMM15
ZMM16	YMM16	XMM16
ZMM17	YMM17	XMM17
ZMM18	YMM18	XMM18
ZMM19	YMM19	XMM19
ZMM20	YMM20	XMM20
ZMM21	YMM21	XMM21
ZMM22	YMM22	XMM22
ZMM23	YMM23	XMM23
ZMM24	YMM24	XMM24
ZMM25	YMM25	XMM25
ZMM26	YMM26	XMM26
ZMM27	YMM27	XMM27
ZMM28	YMM28	XMM28
ZMM29	YMM29	XMM29
ZMM30	YMM30	XMM30
ZMM31	YMM31	XMM31

In other words: the lower half of each ZMMn is YMMn, and the lower half of each YMMn is XMMn. There’s no direct register access for just the upper half of YMMn, nor does ZMMn have direct 256- or 128-bit access for the chunks of its upper half.
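The nesting can be sketched the same way as the GPR sub-registers: treat ZMMn as a 512-bit integer and mask off the lower halves. `vector_views` is a made-up name, and the sample value just puts a distinct marker in each 128-bit chunk.

```python
def vector_views(zmm: int) -> dict:
    """YMMn and XMMn as the low 256 and 128 bits of ZMMn."""
    zmm &= (1 << 512) - 1
    return {
        "ZMM": zmm,
        "YMM": zmm & ((1 << 256) - 1),
        "XMM": zmm & ((1 << 128) - 1),
    }


# Markers 4, 3, 2, 1 in the four 128-bit chunks, highest first.
sample = (4 << 384) | (3 << 256) | (2 << 128) | 1
views = vector_views(sample)
```

Note that nothing in the returned dictionary corresponds to the upper chunks on their own, which mirrors the lack of direct names for them.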

SSE also defines a new status register, MXCSR, that contains flags roughly parallel to the arithmetic flags in RFLAGS (along with floating-point flags in the x87 status word). SSE also introduces a load/store instruction pair for manipulating it (LDMXCSR and STMXCSR).

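AVX-512 also introduces eight opmask registers, k0 through k7. k0 is a special case that is loosely reminiscent of the “zero” register on some RISC ISAs: its encoding in an instruction’s write-mask field means “no masking” (as if the mask were all ones), so it can’t be selected as a write mask, although the mask-manipulation instructions can still read and write it as an ordinary operand.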
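For readers who haven't met opmasks before, here is a lane-by-lane sketch of merge-masking: each mask bit decides whether a lane receives the new result or keeps the destination's old value. The function `masked_add` and the four-lane vectors are invented for the example; real instructions operate on packed registers rather than Python lists.

```python
def masked_add(dst: list, a: list, b: list, mask: int) -> list:
    """Per-lane add with merge-masking: lanes with a clear mask bit keep dst."""
    return [
        x + y if (mask >> lane) & 1 else old
        for lane, (old, x, y) in enumerate(zip(dst, a, b))
    ]


dst = [0, 0, 0, 0]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

partial = masked_add(dst, a, b, 0b0101)   # only lanes 0 and 2 are updated
unmasked = masked_add(dst, a, b, 0b1111)  # every lane is updated
```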

Errata: The table above includes AVX-512, which isn’t available on any AMD CPUs as of 2020. I’ve updated the counts below to only include SSE- and AVX-introduced registers.

Registers in this group: 33.

Running total: 133.

Bounds registers

Intel added these with MPX, which was intended to offer hardware-accelerated bounds checking. Nobody uses it, since it doesn’t work very well. But x86 is eternal and slow to fix mistakes, so we’ll probably have these registers taking up space for at least a while longer:

BND0 through BND3: Individual 128-bit registers, each containing a pair of addresses for a bound.
BNDCFGS: Bound configuration, kernel mode.
BNDCFGU: Bound configuration, user mode.
BNDSTATUS: Bound status, after a #BR is raised.

Registers in this group: 7.

Running total: 140.

Debug registers

These are what they sound like: registers that aid and accelerate software debuggers, like GDB.

There are 6 debug registers of two types:

DR0 through DR3 contain linear addresses, each of which is associated with a breakpoint condition.
DR6 and DR7 are the debug status and control registers. DR6’s lower bits indicate which debug conditions were encountered (upon entering the debug exception handler), while DR7 controls which breakpoint addresses are enabled and their breakpoint conditions (e.g., when a particular address is written to).
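As a sketch of how DR7 ties a breakpoint slot to a condition, the helper below builds the bits for one slot: a local or global enable bit in the low byte, plus a 2-bit condition field and a 2-bit length field in the upper half. A second helper reads the low four bits of DR6 to see which slots fired. The names `dr7_bits`, `hit_slots` and the dictionary keys are hypothetical; a debugger would OR this value into the existing DR7 contents from privileged code.

```python
# Condition (R/W) and length (LEN) field encodings used by DR7.
CONDITION = {"execute": 0b00, "write": 0b01, "io": 0b10, "readwrite": 0b11}
LENGTH = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}


def dr7_bits(slot: int, condition: str, length: int, local: bool = True) -> int:
    """DR7 bits that enable breakpoint `slot` (0..3, matching DR0..DR3)."""
    if not 0 <= slot <= 3:
        raise ValueError("slot must be 0..3")
    enable = 1 << (2 * slot + (0 if local else 1))    # L0/G0, L1/G1, ...
    cond = CONDITION[condition] << (16 + 4 * slot)    # R/Wn field
    size = LENGTH[length] << (18 + 4 * slot)          # LENn field
    return enable | cond | size


def hit_slots(dr6: int) -> list:
    """Slots whose condition was met, from DR6 bits 0..3."""
    return [slot for slot in range(4) if (dr6 >> slot) & 1]


watch_write = dr7_bits(0, "write", 4)
watch_access = dr7_bits(1, "readwrite", 1)
```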

What about DR4 and DR5? For reasons that are unclear to me, they don’t exist (and never have)⁹. They do have encodings, but those are treated as DR6 and DR7, respectively, or produce an #UD exception when CR4.DE[bit 3] = 1.

Registers in this group: 6.

Running total: 146.

Control registers

x86-64 defines a set of control registers that can be used to manage and inspect the state of the CPU.

There are 16 “main” control registers, all of which can be accessed with a MOV variant:

Name	Purpose
CR0	Basic CPU operation flags
CR1	Reserved
CR2	Page-fault linear address
CR3	Virtual addressing state
CR4	Protected mode operation flags
CR5	Reserved
CR6	Reserved
CR7	Reserved
CR8	Task priority register (TPR)
CR9	Reserved
CR10	Reserved
CR11	Reserved
CR12	Reserved
CR13	Reserved
CR14	Reserved
CR15	Reserved

All reserved control registers result in an #UD when accessed, which makes me inclined to not count them in this post.

In addition to the “main” CRn control registers there are also the “extended” control registers, introduced with the XSAVE feature set. As of writing, XCR0 is the only specified extended control register.

The extended control registers use XGETBV and XSETBV instead of a MOV variant.
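XCR0 itself is a bitmask of the state components the OS has enabled, which ties it back to several register groups counted earlier. The decoder below is a sketch covering only the low eight bits; `decode_xcr0` and the label strings are my own naming, and the sample values are illustrative rather than read from a real machine.

```python
# Low XCR0 bits and the register state each one enables.
XCR0_COMPONENTS = {
    0: "x87",
    1: "SSE (XMM)",
    2: "AVX (upper halves of YMM)",
    3: "MPX bounds registers",
    4: "MPX config/status",
    5: "AVX-512 opmask",
    6: "AVX-512 upper halves of ZMM0-15",
    7: "AVX-512 ZMM16-31",
}


def decode_xcr0(value: int) -> list:
    """Names of the enabled state components among bits 0..7."""
    return [name for bit, name in XCR0_COMPONENTS.items() if (value >> bit) & 1]


avx_only = decode_xcr0(0x07)
with_avx512 = decode_xcr0(0xE7)
```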

Registers in this group: 6.

Running total: 152.

“System table pointer registers”

That’s what the Intel SDM calls these⁸: these registers hold sizes and pointers to various protected mode tables.

As best I can tell, there are four of them:

GDTR: Holds the size and base address of the GDT
LDTR: Holds the size and base address of the LDT
IDTR: Holds the size and base address of the IDT
TR: Holds the TSS selector and base address for the TSS

The GDTR, LDTR, and IDTR each seem to be 80 bits in 64-bit modes: 16 lower bits for the size of the register’s table, and then the upper 64 bits for the table’s starting address.

TR is likewise 80 bits: 16 bits for the selector (which behaves identically to a segment selector), and then another 64 for the base address of the TSS¹⁰.
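The 16 + 64 bit split matches the 10-byte image that SGDT and SIDT store to memory in 64-bit mode: a little-endian 16-bit limit followed by a 64-bit base. The sketch below parses such an image; `parse_table_register` is a hypothetical helper and the bytes are fabricated for the example, not captured from a real system.

```python
import struct


def parse_table_register(image: bytes) -> dict:
    """Parse a 10-byte limit/base image (limit first, little-endian)."""
    limit, base = struct.unpack("<HQ", image)
    return {
        "limit": limit,
        "size_bytes": limit + 1,  # the limit field holds size minus one
        "base": base,
    }


# Made-up example: a 128-byte table at an arbitrary kernel-space address.
image = struct.pack("<HQ", 0x7F, 0xFFFF_FE00_0000_1000)
parsed = parse_table_register(image)
```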

Registers in this group: 4.

Running count: 156.

Memory-type-range registers

These are an interesting case: unlike all of the other registers I’ve covered so far, these are not unique to a particular CPU in a multicore chip; instead, they’re shared across all cores¹¹.

The number of MTRRs seems to vary by CPU model, and they have been largely superseded by entries in the page attribute table, which is programmed with an MSR¹².

Registers in this group:

Running count: >156.

Model specific registers

Model-specific registers are where things get fun.

Like extended control registers, they’re accessed indirectly (by identifier) through a pair of instructions: RDMSR and WRMSR. MSRs themselves are 64 bits wide but originated during the 32-bit era, so RDMSR and WRMSR read from and write to two 32-bit registers: EDX and EAX.

By way of example: here’s the setup and RDMSR invocation for accessing the IA32_MTRRCAP MSR, which includes (among other things) the actual number of MTRRs available on the system:

MOV ECX, 0xFE ; 0xFE = IA32_MTRRCAP
RDMSR
; The bits of IA32_MTRRCAP are now in EDX:EAX
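What happens next is plain bit-twiddling: glue the two halves back together and pick out the fields. The sketch below does that for IA32_MTRRCAP, whose low byte is the variable-range MTRR count. The helper names are invented, and the EDX/EAX values are sample inputs rather than output from a real RDMSR.

```python
def combine_edx_eax(edx: int, eax: int) -> int:
    """Rebuild the 64-bit MSR value from the EDX:EAX pair."""
    return ((edx & 0xFFFF_FFFF) << 32) | (eax & 0xFFFF_FFFF)


def decode_mtrrcap(value: int) -> dict:
    """Pick out a few IA32_MTRRCAP fields."""
    return {
        "variable_range_count": value & 0xFF,           # bits 7..0
        "fixed_range_supported": bool((value >> 8) & 1),
        "write_combining_supported": bool((value >> 10) & 1),
        "smrr_supported": bool((value >> 11) & 1),
    }


# Sample register contents, chosen for illustration.
msr = combine_edx_eax(edx=0x0, eax=0xD0A)
caps = decode_mtrrcap(msr)
```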

RDMSR and WRMSR are privileged instructions, so normal ring-3 code can’t access MSRs directly¹³. The one (?) exception that I know of is the timestamp counter (TSC), which is stored in the IA32_TSC MSR but can be read from non-privileged contexts with RDTSC and RDTSCP.

Two other interesting (but still privileged¹⁴) cases are FSBASE and GSBASE, which are stored as IA32_FS_BASE and IA32_GS_BASE, respectively. As mentioned in the segment register section, these store the FS and GS segment bases on x86-64 CPUs. This makes them targets of relatively frequent use (by MSR standards), so they have their own dedicated R/W opcodes:

RDFSBASE and RDGSBASE for reading
WRFSBASE and WRGSBASE for writing

But back to the meat of things: how many MSRs are there?

Using the standards laid out at the beginning of this post, we’re interested in counting what Intel calls “architectural” MSRs. From the SDM¹⁵:

Many MSRs have carried over from one generation of IA-32 processors to the next and to Intel 64 processors. A subset of MSRs and associated bit fields, which do not change on future processor generations, are now considered architectural MSRs. For historical reasons (beginning with the Pentium 4 processor), these “architectural MSRs” were given the prefix “IA32_”.

According to the subsequent table¹⁶, the highest architectural MSR is 6097/17D1H, or IA32_HW_FEEDBACK_CONFIG. So, the naïve answer is over 6000.

However, there are significant gaps in the documented MSR ranges: Intel’s documentation jumps directly from 3506/DB2H (IA32_THREAD_STALL) to 6096/17D0H (IA32_HW_FEEDBACK_PTR). On top of the empty ranges, there are also ranges that are explicitly marked as reserved, either generally or explicitly for later expansion of a particular MSR family.

To count the actual number of MSRs, I did a bit of pipeline ugliness:

Extract just table 2-2 from Volume 4 of the SDM (link):

 $ pdfjam 335592-sdm-vol-4.pdf 19-67 -o 2-2.pdf

Use pdftotext to convert it to plain text and manually trim the next table from the last page:

 $ pdftotext 2-2.pdf table.txt # edit table.txt by hand

Split the plain text table into a sequence of words, filter by IA32_, remove cruft, and do a standard sort-unique-count:

 $ tr -s '[:space:]' '\n' < table.txt \
    | grep 'IA32_' \
    | tr -d '.' \
    | sed 's/\[.*$//' \
    | sort | uniq | wc -l
  404

(Output preserved for posterity here).

That pipeline left a bit of cruft towards the end thanks to quoted variants, so I count the actual number at 400 architectural MSRs. That’s a lot more reasonable than 6096!
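For anyone who would rather not chain five tools together, the same split, filter, strip, and count-unique steps can be approximated in a few lines of Python. This is a sketch under the assumption that the input looks like the pdftotext output described above; `count_ia32_names` and the sample text are made up, and it will inherit the same cruft problem as the shell version.

```python
import re


def count_ia32_names(text: str) -> int:
    """Count unique whitespace-separated words that mention IA32_."""
    names = set()
    for word in text.split():             # like tr -s '[:space:]' '\n'
        if "IA32_" not in word:           # like grep 'IA32_'
            continue
        word = word.replace(".", "")      # like tr -d '.'
        word = re.sub(r"\[.*$", "", word)  # like sed 's/\[.*$//'
        names.add(word)
    return len(names)                     # like sort | uniq | wc -l


sample = """
IA32_TSC 10H Time-stamp counter
IA32_MTRRCAP[7:0] VCNT
See IA32_MTRRCAP. and IA32_APIC_BASE
"""
unique = count_ia32_names(sample)
```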

Registers in this group: 400

Running count: >556.

Other bits and wrapup

The footnotes at the bottom of this post cover most of my notes, but I also wanted to dump some other resources that I found useful while discovering registers:

sandpile.org has a nice visualization of many of the architectural MSRs, including field breakdowns.
Vol. 3A § 8.7.1 (“State of the Logical Processors”) of the Intel SDM has a useful list of nearly all of the registers that are either unique to or shared between x86-64 cores.
The OSDev Wiki has a collection of helpful pages on various x86-64 registers, including a great page on the behavior of the segment base MSRs.

All told, I think that there are roughly 557 registers on the average (relatively recent) x86-64 CPU core. With that being said, I have some peripheral cases that I’m not sure about:

Modern Intel CPUs use integrated APICs as part of their SMT implementation. These APICs have their own register banks which can be memory-mapped for reading and potential modification by an x86 core. I didn’t count them because (1) they’re memory mapped, and thus behave more like mapped registers from an arbitrary piece of hardware than CPU registers, and (2) I’m not sure whether AMD uses the same mechanism/implementation.
The Intel SDM implies that Last Branch Records are stored in discrete, non-MSR registers. AMD’s developer manual, on the other hand, specifies a range of MSRs. As such, I didn’t attempt to count these separately.
Both Intel and AMD have their own (and incompatible) virtualization extensions, as well as their own enclave/hardened execution extensions. My intuition is that each introduces some additional registers (or maybe just MSRs), but their vendor-specificity made me inclined to not look too deeply.

Information on these (and any other) registers would be deeply appreciated.

Discussions: Reddit

Comments

By noelwelsh, 2026-02-14 16:30 (1 reply)

This is how many registers the ISA exposes, but not the number of registers actually in the CPU. Typical CPUs have hundreds of registers. For example, Zen 4's integer register file has 224 registers, and the FP/vector register file has 192 registers (per Wikipedia). This is useful to know because it can affect behavior. E.g. I've seen results where doing a register allocation pass with a large number of registers, followed by a pass with the number of registers exposed in the ISA, leads to better performance.

By saagarjha, 2026-02-15 0:37 (1 reply)

What compilers do this?

By noelwelsh, 2026-02-15 10:45 (1 reply)

One writeup I know about is: "Smlnj: Intel x86 back end compiler controlled memory."

By solarexplorer, 2026-02-15 17:10 (1 reply)

What you describe sounds counter-intuitive. And the paper you cite seems to suggest an ISA extension to increase the number of architected (!) registers. That is something very different. It makes most sense in VLIW architectures, like the ones described in the paper. Architectures like x86 do hardware register renaming (or similar techniques, there are several) to be able to exploit as much instruction level parallelism as possible. That is why I find your claim hard to believe. VLIW architectures traditionally provide huge register sets and make less use of transparent register renaming etc, that part is either explicit in the ISA or completely left to the compiler. These are very different animals than our good old x86...

By noelwelsh, 2026-02-15 21:39 (1 reply)

I'm not sure we're talking about the same paper. Here's the one I'm referring to:

https://smlnj.org/compiler-notes/k32.ps

E.g. "Our strategy is to pre-allocate a small set of memory locations that will be treated as registers and managed by the register allocator."

There are more recent publications on "compiler controlled memory" that mostly seem to focus on GPUs and embedded devices.

By BeeOnRope, 2026-02-17 5:54

Relevant section:

> Compiler controlled memory: There is a mechanism in the processor where frequently accessed memory locations can be as fast as registers. In Figure 2, if the address of u is the same as x, then the last load μ-op is a nop. The internal value in register r25 is forwarded to register r28, by a process called write-buffer feedforwarding. That is to say, provided the store is pending or the value to be stored is in the write-buffer, then loading form a memory location is as fast as accessing external registers.

I think it over-sells the benefit. Store forwarding is a thing, but it does not erase the cost of the load or store, at least certainly on the last ~20 years of chips and I don't think on the PII (the target of the paper) either.

The load and store still effectively occur in terms of port usage, so the usual throughput, etc, limits apply. There is a benefit in latency of a few cycles. Perhaps also the L1 cache access itself is omitted, which could help for bank conflicts, though on later uarches there were few to none of these so you're left with perhaps a small power benefit.

By Someone, 2026-02-14 17:23 (1 reply)

FTA: “For design reasons that are a complete mystery to me, the MMX registers are actually sub-registers of the x87 STn registers”

I think the main argument for doing that was that it meant that existing OSes didn’t need changes for the new CPU. Because they already saved the x87 registers on context switch, they automatically saved the MMX registers, and context switches didn’t slow down.

It also may have decreased the amount of space needed, but that difference can’t have been very large, I think

By adrian_b, 2026-02-16 16:31

By "existing OSes", that really means Microsoft Windows, other OSes would not have had any problems with the negligible update required to save and restore more registers.

During many decades, Intel has introduced a lot of awful workarounds in their CPUs for the only reason that Microsoft was too lazy to update their OS so the newer better CPUs had to be managed by the OS exactly in the same way as the old worse CPUs, even if that moved inside the CPUs various functions that can be done much more efficiently by the OS, so their place is not inside the CPU.

So the MMX registers were aliased over the FPU registers because in this way the existing MS Windows saved them automatically at thread switching. Eventually the limitations of MMX were too great, and due to competitive pressure from AMD (3DNow!) and Motorola (AltiVec), Intel and Microsoft were forced to transition to SSE in 1999, for which a couple of new save and restore instructions have been added and used by the OS, allowing an increase in the number and size of registers.

By rep_lodsb, 2026-02-14 20:02 (3 replies)

Nitpick (footnote 3): "64-bit kernels can run 32-bit userspace processes, but 64-bit and 32-bit code can’t be mixed in the same process."

That isn't true on any operating system I'm aware of. If both modes are supported at all, there will be a ring 3 code selector defined in the GDT for each, and I don't think there would be any security benefit to hiding the "inactive" one. A program could even use the LAR instruction to search for them.

At least on Linux, the kernel is perfectly fine with being called from either mode. FASM example code (with hardcoded selector, works on my machine):

    format elf executable at $1_0000
    entry start
    
    segment readable executable
    
    start:  mov     eax,4                   ;32-bit syscall# for write
            mov     ebx,1                   ;handle
            mov     ecx,Msg1                ;pointer
            mov     edx,Msg1.len            ;length
            int     $80
    
            call    $33:demo64
    
            mov     eax,4
            mov     ebx,1
            mov     ecx,Msg3
            mov     edx,Msg3.len
            int     $80
            mov     eax,1                   ;exit
            xor     ebx,ebx                 ;status
            int     $80
    
    use64
    demo64: mov     eax,1                   ;64-bit syscall# for write
            mov     edi,1                   ;handle
            lea     rsi,[Msg2]              ;pointer
            mov     edx,Msg2.len            ;length
            syscall
            retfd                           ;return to caller in 32 bit mode

    Msg1    db      "Hello from 32-bit mode",10
    .len=$-Msg1
    
    Msg2    db      "Now in 64-bit mode",10
    .len=$-Msg2
    
    Msg3    db      "Back to 32 bits",10
    .len=$-Msg3

By ronsor, 2026-02-14 20:52

This is also true on Windows. Malware loves it! https://encyclopedia.kaspersky.com/glossary/heavens-gate/

By bonzini, 2026-02-14 20:55

Isn't it how recent Wine runs 32-bit programs?

By josephh, 2026-02-14 21:15 (1 reply)

Much like there is 64-bit "code", there is also 32-bit "code" that can only be executed in the 32-bit (protected) mode, namely all the BCD, segment-related, push/pop-all instructions that will trigger an invalid opcode exception (#UD) when executed under long mode. In that strictest sense, "64-bit and 32-bit code can’t be mixed".

By jcranmer, 2026-02-15 1:14 (1 reply)

x86 has (not counting the system-management mode stuff) 4 major modes: real mode, protected mode, virtual 8086 mode, and IA-32e mode. Protected mode and IA-32e mode rely on the bits within the code segment's descriptor to figure out whether or not it is 16-bit, 32-bit, or 64-bit. (For extra fun, you can also have "wrong-size" stack segments, e.g., 32-bit code + 16-bit stack segment!)

16-bit and 32-bit code segments work almost exactly the same in IA-32e mode (what Intel calls "compatibility mode") as they do in protected mode; I think the only real difference is that the task management stuff doesn't work in IA-32e mode (and consequently features that rely on task management--e.g., virtual-8086 mode--don't work either). It's worth pointing out that if you're running a 64-bit kernel, then all of your 32-bit applications are running in IA-32e mode and not in protected mode. This also means that it's possible to have a 32-bit application that runs 64-bit code!

But I can run the BCD instructions, the crazy segment stuff, etc. all within a 16-bit or 32-bit code segment of a 64-bit executable. I have the programs to prove it.

By adrian_b, 2026-02-16 16:40

Yes, but you transition between the 2 modes with far jumps, far calls or far returns, which reload the code segment.

Without passing through a far jump/call/return, you cannot alternate between instructions that are valid only in 32-bit mode and instructions that are valid only in 64-bit mode.

Normally you would have 32-bit functions embedded in a 64-bit main program, or vice-versa. Unlike normal functions, which are invoked with near calls and end in near returns, such functions would be invoked with far calls and they would end in far returns.

However, there is no need to write such hybrid programs now. The 32-bit compatibility mode exists mainly for running complete legacy programs, which have been compiled for 32-bit CPUs.