DDR4 Sdram – Initialization, Training and Calibration

2026-03-10 · www.systemverilog.io

A detailed tutorial on DDR4 SDRAM Initialization, Training and Calibration. Exploring topics such as Read/Write Training, ZQ Calibration, Vref Training, Read Centering, Write Centering, Write Leveling…

When a device with a DRAM sub-system is powered up, a number of things happen before the DRAM gets to an operational state. The following state-machine from the JEDEC specification shows the various states the DRAM transitions through from power-up.

Figure 1: DDR4 State Machine (Source: Micron Datasheet)

In essence, the initialization procedure consists of 4 distinct phases:

  • Power-up and Initialization
  • ZQ Calibration
  • Vref DQ Calibration
  • Read/Write Training (a.k.a Memory Training or Initial Calibration)

To better understand the following sections, let's assume you have a system which looks like this - An ASIC/FPGA/Processor with 1 DIMM module.

Figure 2: Example System

Initialization

Figure 3: Initialization States (Source: Micron Datasheet)

Power-up and initialization is a fixed, well-defined sequence of steps. Typically, when the system is powered up and the controller in the ASIC/FPGA/Processor is brought out of reset, it automatically performs the power-up and initialization sequence. Here's a super-simplified version of what the controller does. For exact details, refer to section 3.3 in the JESD79-4A specification.

  1. Apply power to the DRAM
  2. De-assert RESET and activate Clock Enable (CKE)
  3. Enable clocks CK_t/CK_c
  4. Issue MRS commands and load the Mode Registers [The mode registers are loaded in a specific sequence]
  5. Perform ZQ Calibration [ZQCL]
  6. Bring the DRAM into IDLE state
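The sequence above can be sketched as pseudocode. This is a hypothetical, highly simplified model (a real controller runs these steps in hardware with precise timing parameters such as tXPR, tMRD and tZQinit between them, and `issue` stands in for driving the actual command bus); the mode-register ordering shown is the one given in JESD79-4:

```python
# Hypothetical sketch of the DDR4 power-up flow; names are
# illustrative, not a real controller API.

# JESD79-4 loads the mode registers in this order during init:
MRS_ORDER = ["MR3", "MR6", "MR5", "MR4", "MR2", "MR1", "MR0"]

def power_up_sequence(issue):
    issue("APPLY_POWER")
    issue("DEASSERT_RESET")
    issue("ASSERT_CKE")            # activate Clock Enable
    issue("ENABLE_CK")             # start CK_t / CK_c
    for mr in MRS_ORDER:           # load mode registers in order
        issue(f"MRS {mr}")
    issue("ZQCL")                  # long ZQ calibration
    issue("IDLE")                  # DRAM ends up in the IDLE state

log = []
power_up_sequence(log.append)
print(log)
```

The point of the sketch is only the ordering: all mode registers are programmed before ZQCL, and the DRAM finishes in IDLE.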

At this point the DRAMs on the DIMM module know what frequency they have to operate at, and what the CAS Latency (CL), CAS Write Latency (CWL) and a few other timing parameters are.

ZQ Calibration

Figure 4: ZQCL (Source: Micron Datasheet)

ZQ Calibration is related to the data pins DQ. To understand what ZQ calibration does and why it is required, we need to first look at the circuit behind each DQ pin. Remember, the DQ pin is bidirectional. It is responsible for sending data back during reads and receiving data during writes.

Figure 5: DQ calibration block

Now, if you look within a DRAM, the circuit behind every DQ pin is made up of a set of parallel 240Ω resistor legs, as shown in Figure 5. Because of the nature of CMOS devices, these resistors are never exactly 240Ω. The resistance also varies with voltage and temperature. So, they are made tunable.

In order to tune these resistors to exactly 240Ω, each DRAM has

  • a special block called DQ calibration control block and
  • a ZQ pin to which an external precision (+/- 1%) 240Ω resistor is connected.

This external precision resistor is the "reference" and it remains at 240Ω at all temperatures. When a ZQCL command is issued during initialization, this DQ calibration control block gets enabled and it produces a tuning value. This value is then copied over to each DQ's internal circuitry.

Note!

The above explanation is a quick overview of ZQ calibration. If you're satisfied, proceed to the next section. If you're itching for more details, read on.

The 240Ω resistor leg within a DQ circuit is a type of resistor called a "Poly Silicon Resistor" and is typically slightly larger than 240Ω (a poly silicon resistor is a type of resistor that is compatible with CMOS technology). A number of p-channel devices are connected in parallel to this poly-resistor so that it can be tuned to exactly 240Ω.

The figure below zooms into one 240Ω leg of the DQ circuit and shows 5 p-channel devices connected to the poly-resistor. These little transistors are set based on input VOH[0:4].

Figure 6: DQ driver/receiver circuit (Source: Micron datasheet)

Now, the circuit connected to the DQ calibration control block is essentially a resistor divider, with one of the resistors being the poly and the other the precision 240Ω. When a ZQCL command is issued during initialization, the DQ calibration control block is enabled, and an internal comparator within it tunes the p-channel devices using VOH[0:4] until the voltage at the divider midpoint is exactly VDDq/2 (a classic resistor divider). At this point calibration is complete and the VOH values are transferred to all the DQ pins.

Figure 7: DQ calibration block (Source: Micron datasheet)
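As a toy numerical model of that comparator loop (every resistor value and step size here is invented for illustration, not taken from silicon): the tunable leg starts above 240Ω, each VOH code step lowers it, and the loop picks the code whose divider midpoint lands closest to VDDQ/2:

```python
# Toy model of the ZQ tuning loop; all numbers are invented.
R_EXT = 240.0            # external precision resistor on the ZQ pin

def leg_resistance(voh_code, r_poly=260.0, step=5.0):
    """Effective resistance of the tunable leg: the poly resistor
    starts high, and each VOH code step switches in p-channel width
    that lowers it."""
    return r_poly - step * voh_code

def calibrate(vddq=1.2):
    """Pick the VOH code whose divider midpoint is closest to VDDQ/2,
    i.e. the code at which the leg best matches the external 240 ohms."""
    best_code, best_err = 0, float("inf")
    for code in range(32):                    # VOH[0:4] -> 32 codes
        v_mid = vddq * R_EXT / (leg_resistance(code) + R_EXT)
        err = abs(v_mid - vddq / 2)
        if err < best_err:
            best_code, best_err = code, err
    return best_code

code = calibrate()
print(code, leg_resistance(code))   # -> 4 240.0
```

In this model the loop settles on the code that makes the leg exactly 240Ω, which is precisely what the real comparator achieves against the external reference.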

Next, you may wonder why the DQ pins even have this parallel network of 240Ω resistors in the first place!

Having a bank of parallel 240Ω resistors allows you to tune the drive strength (for READs) and termination resistance (for WRITEs). Every PCB layout is different so this tuning capability is required to improve signal integrity, maximize the signal's eye-size and allow the DRAM to operate at high-speeds.

The signal drive strength from the DRAM can be controlled by setting mode register MR1[2:1]. The termination can be controlled using a combination of RTT_NOM, RTT_WR & RTT_PARK in mode registers MR1, MR2 & MR5 respectively.

Vref DQ Calibration

Figure 8: VrefDQ Calibration (Source: Micron datasheet)

In DDR4, the termination style of the data lines (DQ) was changed from CTT (Center Tapped Termination, used with SSTL - Stub Series Terminated Logic) to POD (Pseudo Open Drain). This was done to improve signal integrity at high speeds and to save IO power. This is not the first of its kind: GDDR5 (the graphics DRAM) uses POD as well.

Figure 9: SSTL in DDR3 vs POD in DDR4 (Source: Micron handbook)

What this means is that in DDR3, Vdd/2 is used as the voltage reference to decide if the DQ signal is a 0 or a 1. Take another look at the left-hand side of Figure 9; the receiver is essentially a voltage divider circuit.

But in DDR4 there is no voltage divider circuit at the receiver. It instead has an internal voltage reference which it uses to decide if the signal on the data lines (DQ) is a 0 or a 1. This voltage reference is called VrefDQ. VrefDQ can be set using mode register MR6, and it needs to be set correctly by the memory controller during the VrefDQ calibration phase.
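Conceptually, VrefDQ training is a sweep: the controller programs candidate Vref codes, runs a write/read test at each, and settles on the center of the passing window. A sketch with a made-up pass/fail model (in real DDR4 the code would be written to MR6):

```python
# Toy VrefDQ sweep: find the range of Vref codes at which a write/read
# test passes, then program the center. passes() is an invented
# stand-in for the real test at a given Vref setting.

def passes(code, lo=20, hi=44):
    return lo <= code <= hi        # made-up passing window

def train_vrefdq(num_codes=64):
    window = [c for c in range(num_codes) if passes(c)]
    if not window:
        raise RuntimeError("no passing Vref window - check signal integrity")
    return (window[0] + window[-1]) // 2   # center of the window

print(train_vrefdq())   # -> 32
```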

Read/Write Training

At this point the initialization procedure is complete and the DRAMs are in the IDLE state, but the memory is STILL not operational. The Controller and PHY have to perform a few more important steps before data can be reliably written to or read from the DRAM. This important phase is called Read/Write Training (or Memory Training, or Initial Calibration), wherein the controller (or PHY)

  1. Runs algorithms to align clock [CK] and data strobe [DQS] at the DRAM
  2. Runs algorithms and figures out the correct read and write delays to the DRAM
  3. Centers the data eye for reads
  4. Reports errors if the signal integrity is bad and data cannot be written or read reliably

This section is about the following circle in the state machine

Figure 10: Read/Write Training State (Source: Micron handbook)

Why is Read/Write Training Required?

Let's take a closer look at our example system. The picture below shows how the data signals and address/command signals are connected between the ASIC/SoC/Processor and the DRAMs on the DIMM.

  • The Data and DataStrobe (DQ & DQS) are connected to each memory in a star topology because each memory is connected to a different portion of the 72 data lines
  • The Clock, Command & Address lines (A, CK, CKE, WE, CSn) on a DIMM are connected using a technique called fly-by routing topology. This is done because all DRAMs on the DIMM share the same address lines, and fly-by routing achieves better signal integrity at high speeds.

Figure 11: Example System in Detail

So, from the ASIC/Processor's point of view each DRAM memory on the DIMM is located at a different distance. Or from the DIMM's point of view, the skew between clock and data is different for each DRAM on the DIMM.

The DRAM is a fairly dumb device. Say you intend to do a WRITE operation. During initialization you tell the DRAM what the CAS Write Latency is by programming one of its Mode Registers (CWL is the delay between the column address and the data at the inputs of a DRAM), and this timing parameter must be honored at all times. The memory controller needs to account for the board trace delays and the fly-by routing delays, and launch Address and Data with the correct skew between them, so that Address and Data arrive at the memory with CWL latency between them.

For example, if you program the CAS Write Latency to 9, then once the ASIC/uP launches the Column Address, it needs to launch the different data bits at different times so that they all arrive at the DRAMs with a CWL of 9.
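Here is that arithmetic worked through with invented flight times: the launch time for each DRAM's data is that DRAM's command arrival time plus CWL, minus the data lane's own flight time.

```python
# Worked example with invented flight times (all in DRAM clock cycles).
CWL = 9   # programmed CAS Write Latency

# Fly-by delay of the command/clock to each DRAM on the DIMM, and the
# trace delay of that DRAM's data lane (numbers are made up).
cmd_flight = {"dram0": 0.2, "dram1": 0.5, "dram2": 0.8, "dram3": 1.1}
dq_flight  = {"dram0": 0.4, "dram1": 0.4, "dram2": 0.5, "dram3": 0.5}

# Data must arrive at each DRAM exactly CWL cycles after the command
# does, so: launch = command arrival + CWL - data flight time.
launch = {d: cmd_flight[d] + CWL - dq_flight[d] for d in cmd_flight}

for dram in sorted(launch):
    print(f"{dram}: launch data {launch[dram]:.1f} cycles after CMD")
```

Note that each lane gets a different launch offset even though every DRAM sees the same CWL of 9; compensating for exactly this spread is what training discovers.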

Something similar to the above needs to be done for READs as well. Since each DRAM on the DIMM is located at a different distance, when a READ is issued each DRAM on the DIMM will see the READ command at different times and subsequently the data from each DRAM arrives at the ASIC/Processor at different times. During Initial Calibration, the ASIC/Processor figures out what the delays from each of the DRAMs are and trains its internal circuitry accordingly so that it latches the data from the various DRAMs at the right moment.

For Read/Write Training, the Controller/PHY IPs typically offer a number of algorithms. The most common ones are:

  1. Write Leveling
  2. MPR (Multi-Purpose Register) Pattern Write
  3. Read Centering
  4. Write Centering

The following sections go into more detail about each of these algorithms.

Write Leveling

When writing to a DRAM, an important timing parameter that cannot be violated is tDQSS. tDQSS is the position of the DataStrobe (DQS) relative to the Clock (CK). tDQSS has to be between tDQSS(MIN) and tDQSS(MAX) as defined in the spec. If tDQSS is violated and falls outside this range, wrong data may be written to the memory.

Since the Clock to Data/DataStrobe skew is different for each DRAM on the DIMM, the memory controller needs to train itself so that it can compensate for this skew and maintain tDQSS at the input of each DRAM on the DIMM.

When you enable write-leveling in the controller, it does the following steps:

  1. Does a Mode Register write to MR1 to set bit 7 to 1. This puts the DRAM into write-leveling mode. In write-leveling mode, when the DRAM sees a DataStrobe (DQS), it uses it to sample the Clock (CK) and return the sampled value back to the controller through the DQ bus.
  2. The controller then sends a series of DQS pulses. Since the DRAM is in write-leveling mode, it samples the value of CK using DQS and returns this sampled value (either a 1 or 0), back to the controller, through the DQ bus.
  3. The controller then
    • looks at the value of the DQ bit that is returned by the DRAM
    • either increments or decrements the DQS delay and
    • launches the next set of DQS pulses after some time
  4. The DRAM once again samples CK and returns the sampled value through DQ bus
  5. Steps 2 to 4 are repeated until the controller sees a 0-to-1 transition. At this point the controller locks the DQS delay setting and write-leveling is achieved for this DRAM device.
  6. Steps 2 to 5 are then repeated for each DQS for the whole DIMM to complete the write-leveling procedure
  7. The DRAMs are finally taken out of write-leveling mode by writing a 0 to MR1[7]

The figure below shows the write-leveling concept.

Figure 12: Write Leveling (Source: Micron Datasheet)
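Steps 1 to 7 boil down to a delay sweep looking for a 0-to-1 transition. A toy model (the DRAM's CK sampling is simulated, and the tap counts are invented):

```python
# Toy write-leveling loop. The sampler models a DRAM in write-leveling
# mode: given a DQS delay (in taps), it returns the CK level that the
# DQS rising edge sampled. ck_rise is an invented tap position.

def make_ck_sampler(ck_rise=17):
    return lambda dqs_delay: 1 if dqs_delay >= ck_rise else 0

def write_level(sample_ck, max_taps=64):
    prev = sample_ck(0)
    for delay in range(1, max_taps):   # step the DQS delay
        cur = sample_ck(delay)
        if prev == 0 and cur == 1:     # 0-to-1 transition found:
            return delay               # DQS is aligned to CK's rise
        prev = cur
    raise RuntimeError("no CK edge found within the delay range")

print(write_level(make_ck_sampler(ck_rise=17)))   # -> 17
```

In a real system this loop runs once per DQS, so each DRAM on the DIMM ends up with its own locked delay.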

MPR Pattern Write

MPR (Multi Purpose Register) Pattern Write isn't exactly a calibration algorithm. It is typically a step that is performed before Read Centering and Write Centering.

DDR4 DRAMs contain four 8-bit programmable registers called MPR registers that are used for DQ bit training (i.e., Read and Write Centering). MPR access mode is enabled by setting Mode Register MR3[2] = 1. When this mode is enabled READs and WRITEs issued to the DRAM are diverted to the Multi Purpose Register instead of the memory banks.

Figure 13: MPR Read/Write (Source: Micron Datasheet)

Read Centering

The purpose of read centering is to train the internal read capture circuitry in the controller (or PHY) to capture the data in the center of the data eye. The memory controller (or PHY)

  1. Enables bit 2 in mode register MR3 so that the DRAM returns data from the Multi Purpose Register (MPR) instead of the memory array.
  2. Then initiates a continuous stream of READs. The memory returns the pattern that was written in the previous MPR Pattern Write step. Let's assume this pattern is an alternating 1-0-1-0-...
  3. While the READs are going on, the internal read capture circuitry either increases or decreases an internal read delay register to find the left and right edges of the data eye.
  4. When the edges of the eye are detected, the read delay registers are set appropriately to ensure the data is captured at the eye center.
  5. The above steps are repeated for each of the DQ data bits
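The eye-edge search in steps 3 and 4 can be sketched as a sweep of the read-delay taps for one DQ bit; the pass/fail function here is an invented stand-in for comparing the returned MPR pattern:

```python
# Toy read-centering sweep for one DQ bit: find the read-delay taps at
# which the MPR pattern reads back clean, then capture at the center.

def find_center(read_ok, num_taps=64):
    passing = [t for t in range(num_taps) if read_ok(t)]
    if not passing:
        raise RuntimeError("data eye closed - signal integrity problem")
    left, right = passing[0], passing[-1]   # edges of the data eye
    return (left + right) // 2              # capture at the eye center

# Illustrative eye for one DQ bit: taps 12..40 compare clean.
center = find_center(lambda t: 12 <= t <= 40)
print(center)   # -> 26
```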

Write Centering

Similar to the read centering step, the purpose of write centering is to set the write delay for each data bit so that write data is centered on the corresponding write strobe edge at the DRAM device.

During write centering, the PHY continuously runs the following WRITE-READ-SHIFT-COMPARE loop:

  1. Initiates a continuous stream of WRITEs and READs
  2. Incrementally changes write delay of the data bits
  3. Compares the data read back to the data written

From the above loop, the PHY can determine for what write-delay range it reads back good data, and hence it can figure out the left and right edges of the write-data eye. Using this data, the DQ is centered relative to the DQS for writes.
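A toy version of the WRITE-READ-SHIFT-COMPARE loop (the passing window of the simulated DRAM is invented):

```python
# Toy write-centering loop. make_write_path models a DRAM whose writes
# are only captured correctly inside an invented passing delay window.

PATTERN = 0xA5

def make_write_path(lo=9, hi=35):
    stored = {"data": 0}
    def write(tap, data):
        # outside the window the DQ/DQS relationship is violated and
        # the stored data is corrupted (modeled here as a bit flip)
        stored["data"] = data if lo <= tap <= hi else data ^ 0xFF
    def read():
        return stored["data"]
    return write, read

def write_center(write, read, num_taps=64):
    good = []
    for tap in range(num_taps):      # SHIFT the write delay
        write(tap, PATTERN)          # WRITE a known pattern
        if read() == PATTERN:        # READ back and COMPARE
            good.append(tap)
    return (good[0] + good[-1]) // 2 # center DQ on DQS for writes

write, read = make_write_path()
print(write_center(write, read))     # -> 22
```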

Periodic Calibration

In a device such as a network switch or router, voltage and temperature can change during the course of operation. To keep signal integrity and data access reliable, some of the calibration steps performed during initialization and read/write training have to be re-run. Memory controller and PHY IPs typically provide the following two periodic calibration processes.

  • Periodic ZQ - Also known as ZQ Calibration Short (ZQCS). It is used to run ZQ calibration periodically to tune the 240Ω resistor that was described earlier.
  • Periodic Read Centering - To re-calculate read delays and other read related parameters

Enabling periodic calibration is optional: if you know your device will be deployed in stable temperature conditions, then the initial ZQ calibration and read/write training are sufficient.

Typically, the memory controller or PHY allow you to set a timer and enable periodic calibration through their registers. Once the timer is set, periodic calibration is run every time the timer expires.

In a Nutshell

There are 4 steps to be completed before the DRAM can be used:

  1. Power-up and initialization
  2. ZQ Calibration
  3. Vref DQ Calibration
  4. Read/Write Training

Once this is done, the system is officially in IDLE and operational. You may need to enable periodic calibration depending on the conditions in which your device is deployed.


Comments

  • By nsteel, 2026-03-13 0:12

    Implementing DDR3 training for our packet queuing chip (custom memory controller) was my first project at work. We had originally hoped to use the same training params for all parts. That wasn't reliable even over a small number of testing systems in the chamber. DDR3 RAM parts were super cheap compared to what we had used in previous generations, and you get what you pay for with a huge amount of device variation. So we implemented a relatively long training process to be run on each device during our board testing, and saved those per-lane skews. But we found the effects of temperature, and particularly system noise, were too great once the system was sending full-rate traffic. (The training had to be done one interface at a time, with pedestrian data-rates). We then ended up with a quick re-training pass to re-center the eyes. It still wasn't perfect - slower ram chips (with smaller eyes) would report ECC correctables when all interfaces were doing worst-case patterns at temperature extremes. We spent a lot of time making those interfaces robust, and ended up relying more on ECC than we had intended. But those chips have been shipping ever since and will have seen traffic from most of us.

    • By bri3d, 2026-03-13 3:27

      You played in hard mode in a weird sense; more modern DDR versions are in a backwards sense "easier" if you're buying the IP, because a lot of the training has moved to boot time and is handled by the vendor IP rather than needing to be run during burn-in using some proprietary toolkit or self-test tool.

      It's just as arcane and weird, but if you buy one of the popular modern packages for DDR4/5 like DesignWare, more and more training is accomplished using opaque blob firmware (often ARC) loaded into an embedded calibration processor in the DDR controller itself at boot time rather than constants trained by your tooling or the vendor's.

      • By halifaxbeard, 2026-03-13 11:19

        Wow, I was spoiled building firmware for my ARM boards then (building, not developing).

        Marvell has a source available DDR driver that actually takes care of training on a few of their platforms! https://github.com/MarvellEmbeddedProcessors/mv-ddr-marvell

      • By nsteel, 2026-03-13 8:13

        I don't know if this is still the case, but back then the likes of Synopsys charged a lot of money for what was very limited controller functionality; you were stuck with their frustrating support channels and generally dumpster fire firmware. Our controller was fully custom to our needs, supporting more optimum refresh schemes tightly integrated with our application, and multiple memory protocols (not just DDR3), and I don't remember what else.

        At least we were able to modify the training algorithms and find the improvements, rather than being stuck with the usual vendor "works for us" response. Especially with something like commodity DDR, where our quantities don't command much clout. But it was a bit of an ordeal and may have contributed to us buying in a controller for our next gen (not DDRx). But I think we're going the other way again after that experience..!

  • By MisterTea, 2026-03-12 19:28

    From my understanding, memory training is/was a closely held secret of memory makers and EDA IP houses who sold memory controller IP to all the chip vendors. This in turn makes fully open motherboard firmware almost impossible as no one can write code for memory training to bring up the chip. That piece of code has to be loaded as a blob - if you can get the blob.

    • By Aurornis, 2026-03-12 22:01

      I think you're mixing different concepts. JEDEC doesn't define DDR4 training procedures so there isn't a secret that's being withheld. Everyone who implements a DDR4 controller has to develop and implement a training procedure to meet the specifications.

      On a DDR4 motherboard the training would occur between the memory controller and the DDR4 RAM. The proprietary blob you need would include the communication with the memory controller and instructions to handle the training for that specific memory controller.

      There are several open source DDR4 controllers in different states of usability. They have each had to develop their own implementations.

      • By bri3d, 2026-03-13 3:31

        What you're saying is true, but the OP has a point too.

        What's basically happening is that as things get faster the lifetime of training data decreases because the system becomes more sensitive to environmental conditions, so training procedures which were previously performed earlier in the manufacturing cycle are now delegated to the runtime, so the system migrates from data to code.

        Previously, you or the vendor would provide tools and a calibration system which would infer some values and burn a calibration, and then load it during early boot. More recently, the runtime is usually a combination of a microcontroller and fixed-function blocks on the DDR PHY, and that microcontroller's firmware is usually supplied as a generic blob by the vendor. The role of this part of the system keeps growing. The system has gotten a bit more closed; it's increasingly moved from "use this magic tool to generate these magic values, or read the datasheets and make your own magic tool" to "load this thing and don't ask questions."

        • By Aurornis, 2026-03-13 13:38

          The parent commenter was mixing two concepts together.

          DDR4 training is not defined. It’s vendor-implemented.

          If you want to work with a vendor’s memory controller chip, you need the documentation for that chip.

          So the secret isn’t memory training (the topic of this article) it’s just proprietary chips on motherboards. Memory training is only one of many things that have to be reverse engineered or known for an open firmware.

        • By kvemkon, 2026-03-13 10:30

          > environmental conditions

          Shouldn't it be: no-longer-negligible manufacturing / assembly tolerances instead? I mean, when I turn the PC on, the temperature of all components is 20 C, and the training is done at almost this temperature. But then the PC can work for months with much higher memory controller and DRAM chip temperatures.

    • By nopurpose, 2026-03-12 22:13

      I remember listening to Oxide & Friends (or it was On the Metal?) podcast few years ago and had an impression they wrote their own training code.

      • By p_l, 2026-03-13 8:50

        It's a more available option on AMD chips, intel AFAIK kept it a secret blob.

        Ultimately Oxide got a deal to run customised firmware and AFAIK even got custom PSP firmware

    • By Joel_Mckay, 2026-03-12 20:32

      It is usually the IP licensing, as spinning a board isn't always complex.

      Note, it is actually easier to profile a known dram chip set bonded to the PCB. A lot of products already do this like phones, tablets, and thin laptops.

      Whereas SSD drives, being a wear item, should be removable by end users. =3

    • By brcmthrowaway, 2026-03-12 19:47

      Why do we need training?

      • By namibj, 2026-03-12 20:59

        Because DDR3/4/5 dies are made to a price with half to three quarters of their IO pins shared between the dies in parallel on a rank of a channel, and for capacity often up to around 6 ranks per channel. E.g. high capacity server DDR4 memory, say on AMD SP3, may have 108 dies on each of 8 channels of a socket.

        So if you can move complexity over to the controller you can spend 100:1 ratio in unit cost. So you get to make the memory dies very dumb by e.g. feeding a source synchronous sampling clock that's centered on writes and edge aligned on reads leaving the controller to have a DLL master/slave setup to center the clock at each data group of a channel and only retain a minimal integer PLL in the dies themselves.

        • By Neywiny, 2026-03-12 22:19

          You need to train whether you're on one die or 100. It's about your per bit skew and PVT

          • By namibj, 2026-03-13 1:45

            Yes, I was just pointing out why we choose to make the memory chips so fragile/dependant on the controller doing all the magic training for them.

            • By Neywiny, 2026-03-13 11:49

              Well no, your comment answered that we need training because the dies are made cheaply. But no amount of money would prevent the need to train out static and dynamic delays. If it wasn't in the controller it'd be in the memory and the question of why it's needed would still be relevant.

      • By adrian_b, 2026-03-12 20:44

        A large section of the article is dedicated to the answer for this question.

      • By juancn, 2026-03-12 20:39

        Imprecision in manufacturing (adjust resistor values), different trace lengths (speed of light differences for parallel signals), etc... it's in the article.

      • By numpad0, 2026-03-12 21:07

        Because loading an XMP profile won't suffice and they have to tune the parameters further to be able to actually run the sticks

      • By phendrenad2, 2026-03-12 22:04

        Because when you change a PCB trace from 0 to 1 or 1 to 0, the slope of the signal as it changes from gnd to v+ (the signal voltage) or v+ to ground isn't perfect, and that slope is highly affected by the various pieces of metal and silicon and fiberglass that make up the board and the chips. The shape and topology of the PCB trace matters, as do slight imperfections in the solder, PCB material, the bond wires inside the chips, etc. These effectively create resistors/capacitors/inductors that the designer didn't intend, which affect the slope of the 0->1 and 1->0 changes. So for these high-speed signals, chip designers started adding parameters to tweak the signal in real-time, to compensate for these ill effects. Some parameters include a slight delay between the clock and data signals, to account for skew. Voltage adjustment to avoid ringing (changing v+). Adjusting the transistor bias to catch level transitions more accurately. Termination resistance adjustment, to dampen reflections. And on top of all that, some bits will still be lost, but because these protocols are error-correcting, this is acceptable loss.

        This is how people were able to send ethernet packets over barbed wire. Many bits are lost, but some get through, and it keeps trying until the checksums all pass.

    • By varispeed, 2026-03-12 20:50

      > as no one can write code for memory training to bring up the chip

      Surely someone can do it, but it's probably too niche. The licensing fee is probably cheaper than a corporation spinning a board and reverse-engineering it, and for hobbyists lower-tier memory was likely fine.

      That said given that such technology has become so much more accessible (you can certainly create FPGA board and wire it up to DDR4 using free tools and then get board made in China), it's probably a matter of time someone will figure this out.

  • By robotnikman, 2026-03-12 23:30

    Wow, what goes on in a RAM module is so much more complex than I thought.

HackerNews