Intel Meteor Lake Architecture

2023-09-25 9:49 · 108 · 112 · hothardware.com

Intel Meteor Lake Architecture Deep Dive: Tiles, New Cores And AI For All

Intel’s Meteor Lake architecture for mobile PCs will be arriving soon and to say that Meteor Lake is the most important shift in Intel’s design and manufacturing approach could be an understatement. In fact, Intel has called Meteor Lake the largest architecture shift in the last 40 years, and it will influence designs for a decade to come.
Intel has outlined four key pillars for Meteor Lake. First, it is designed to be the most power efficient client processor in the company's history. Second, it will be Intel’s first consumer CPU to deliver a dedicated AI engine at scale. Third, Intel is targeting a leap in graphics performance along with power efficiency. Finally, it will be the debut of the Intel 4 process, at least in part.
Meteor Lake is the company’s first truly disaggregated consumer chip, and its development has been challenging. Disaggregation means that rather than using a single monolithic die to house the CPU cores, integrated GPU, I/O features, and other uncore “stuff,” various engines are instead broken out into multiple chiplets that Intel calls Tiles. This is an attractive approach for many reasons. Dies can only be made so large while remaining economically and physically feasible. Current processes are limited by the reticle size used during lithographic exposure, but there’s more to consider than just this (more or less) hard limit. A larger die not only wastes more space around the edges of a round 300mm wafer, but is also more likely to contain debilitating defects, which directly hurts yields. Smaller chips let manufacturers extract more from each wafer, maximizing its potential value.
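To put rough numbers on the wafer-economics argument, here is a sketch using two common textbook approximations: a gross dies-per-wafer formula that subtracts an edge-loss term, and a simple Poisson yield model. The defect density `D0` and the die areas are hypothetical round numbers chosen for illustration, not Intel figures, and the model ignores scribe lines, edge exclusion, and packaging yield.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate whole dies per wafer: wafer area / die area, minus an edge-loss term."""
    d = wafer_diameter_mm
    s = die_area_mm2
    estimate = math.pi * d * d / (4.0 * s) - math.pi * d / math.sqrt(2.0 * s)
    return max(0, int(estimate))


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Probability that a die has zero defects under a Poisson defect model."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


def good_dies_per_wafer(die_area_mm2: float, defects_per_cm2: float,
                        wafer_diameter_mm: float = 300.0) -> float:
    """Expected number of defect-free dies per wafer."""
    gross = gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm)
    return gross * poisson_yield(die_area_mm2, defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.1  # hypothetical defects per cm^2
    monolithic = good_dies_per_wafer(400.0, D0)    # one hypothetical 400 mm^2 die
    chiplets = good_dies_per_wafer(100.0, D0) / 4  # four 100 mm^2 dies per product
    print(f"monolithic products per wafer: {monolithic:.1f}")
    print(f"4-chiplet products per wafer:  {chiplets:.1f}")
```

Under these assumptions the split design comes out at about 145 products per wafer versus about 96 for the monolithic die, because less area is lost at the wafer edge and each small die is less likely to catch a defect. Packaging cost and packaging yield are left out, and those are exactly the drawbacks that chiplets introduce.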

Breaking up a design into smaller chiplets has its own drawbacks, of course. Packaging it all together becomes a lot more complex, no matter how you slice it. If we disregard 3D V-Cache for a moment, AMD arranges its processor chiplets side by side on a shared organic package substrate, whereas Intel connects silicon more directly using its Foveros 3D stacking and Embedded Multi-die Interconnect Bridge (EMIB) technologies. Foveros and EMIB are similar but distinct advanced packaging techniques that Intel has been using in some products for a few years now.

EMIB debuted in 2017 in the Stratix 10. That was a field-programmable gate array (FPGA), but the technology has since matured into high-volume production and is used by Sapphire Rapids today. EMIB uses 55 µm-pitch interconnects to mount dies atop a small silicon bridge embedded in the package substrate.
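The pitch figure translates directly into connection density. Assuming an idealized square grid of bumps (real layouts differ), density scales with the inverse square of the pitch, which is why each packaging generation pushes pitch down.

```python
def connections_per_mm2(pitch_um: float) -> float:
    """Bumps per square millimetre for an idealized square grid at the given pitch."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    per_mm = 1000.0 / pitch_um  # bumps along one millimetre
    return per_mm * per_mm


if __name__ == "__main__":
    print(f"55 um pitch: {connections_per_mm2(55.0):.0f} connections per mm^2")
```

At 55 µm this works out to roughly 330 connections per mm²; halving the pitch would quadruple that.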

Foveros allows for active-on-active “3D” stacking with greater complexity. Foveros was introduced with Lakefield in 2020, where it allowed Intel to layer a compute die with package-on-package (PoP) DRAM on top of a base die that is then mounted on the package substrate.
Meteor Lake’s design uses four distinct tiles riding atop a base tile assembled using Foveros 3D packaging technology. These are the Compute tile, GPU tile, SOC tile, and IO tile, and their names give some insight into the core function of each. There are certainly some nuances to this, though, so let’s quickly go over the high-level attributes of each before we dig deeper.

The Compute tile is where most of the processor cores reside. Most, you may ask? We’ll get to that. The Compute tile features a mix of the P-cores and E-cores we’re used to from Alder Lake and Raptor Lake, with some microarchitectural improvements. The Compute tile is built using the Intel 4 process node and is actually the only tile in the system directly fabbed by Intel.

The GPU tile is next up. As expected, it contains Intel’s Arc Graphics architecture, specifically Xe-LPG. It is fabbed by TSMC on its N5 process and designed to deliver roughly 2x the performance per watt of the 12th Gen Xe graphics. The Graphics tile does not house the Media Engine, though; that has been moved to the SOC tile, along with the display interfaces.
The SOC tile is home to a wide variety of functions across two scalable fabrics. The North side features the Network on Chip (NOC) fabric linking high-performance devices, while the South side has an efficient PCIe-based IO fabric, with an IOC bridge linking the two. Beyond connectivity and the media/display engines, it also houses the NPU AI engine, the memory controller, and two low-power E-cores (LP E-cores) of its own. The IO tile acts as an extension of the SOC tile’s IO fabric, and both are fabbed using TSMC’s N6 process.
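The tile breakdown can be summarized as a small lookup table. This sketch is built only from the details given in this article; the block lists are simplified, and the base tile is left out because its process node isn't stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    name: str
    process: str
    blocks: tuple[str, ...]


# Summarized from the article; block lists are simplified.
METEOR_LAKE_TILES = (
    Tile("Compute", "Intel 4", ("P-cores", "E-cores")),
    Tile("GPU", "TSMC N5", ("Xe-LPG graphics",)),
    Tile("SOC", "TSMC N6", ("NOC fabric", "IO fabric", "memory controller", "NPU",
                            "media engine", "display engine", "LP E-cores")),
    Tile("IO", "TSMC N6", ("additional PCI Express", "USB4/Thunderbolt")),
)


def tiles_built_by(foundry_prefix: str) -> list[str]:
    """Names of tiles whose process name starts with the given foundry prefix."""
    return [t.name for t in METEOR_LAKE_TILES if t.process.startswith(foundry_prefix)]


def tile_for_block(block: str) -> str:
    """Which tile hosts a given functional block."""
    for tile in METEOR_LAKE_TILES:
        if block in tile.blocks:
            return tile.name
    raise KeyError(block)


if __name__ == "__main__":
    print("Fabbed by Intel:", tiles_built_by("Intel"))
    print("Media engine lives on:", tile_for_block("media engine"))
```

Keeping the process node as a per-tile field mirrors the point of disaggregation: moving one tile to a newer node changes a single entry (for example with `dataclasses.replace`) without touching the others.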
As we dive deeper into the architecture, let’s start with the SOC tile. It sits at the center of everything and links directly to the Compute, Graphics, and IO tiles. In effect, these links mark where “cuts” were made to break away from a traditional monolithic die. If we back up for a moment, SOC architectures are deceptively complex, and that complexity affects fab-initiated process changes and the ability to ramp new nodes. Breaking the chip up like this frees the architects to build each IP block on the manufacturing process that suits it. Disaggregation means that the Compute tile of future products, for example, can quickly ramp to Intel 3 and beyond, while the other parts of the SOC, which may not benefit much from more advanced process nodes, can advance at a slower rate.
Meteor Lake's architectural shifts and design philosophies were driven by a few initiatives. The first was to implement the NOC fabric to meet the demands of high-performance devices, while letting the IO fabric provide efficient access for lower power uses. To improve IO efficiency, Intel moved graphics to its own tile, but kept the media and display blocks with the SOC tile. Finally, the power management system needed to be scalable, with control over each tile and even subsystems within the tiles. As an example, this allows the PMC for the Compute Tile to be tuned to the number of P- and E-cores available, while the central PMC on the SOC tile is agnostic to the Compute Tile configuration.
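The split between a configuration-agnostic central PMC and a per-tile PMC can be illustrated with a small interface sketch. All class and method names here are hypothetical, and the 6P+8E and 4P+8E core counts are illustrative SKUs, not confirmed Meteor Lake configurations. The point is only that the central controller talks to each tile through the same narrow interface and never needs to know how many cores a Compute tile has.

```python
from typing import Protocol


class TilePowerController(Protocol):
    """The only thing the central controller needs from a tile (hypothetical interface)."""
    def is_idle(self) -> bool: ...


class ComputeTilePMC:
    """Hypothetical per-tile controller sized to one SKU's core counts."""

    def __init__(self, p_cores: int, e_cores: int) -> None:
        # One power domain per core; how many exist depends on the SKU.
        self.domains = {f"P{i}": False for i in range(p_cores)}
        self.domains.update({f"E{i}": False for i in range(e_cores)})

    def set_active(self, core: str, active: bool) -> None:
        if core not in self.domains:
            raise KeyError(f"core {core} does not exist on this SKU")
        self.domains[core] = active

    def is_idle(self) -> bool:
        return not any(self.domains.values())


class CentralPMC:
    """Hypothetical SOC-tile controller that is agnostic to each tile's internals."""

    def __init__(self, tiles: dict[str, TilePowerController]) -> None:
        self.tiles = tiles

    def tiles_to_power_gate(self) -> list[str]:
        return [name for name, pmc in self.tiles.items() if pmc.is_idle()]


if __name__ == "__main__":
    for p, e in ((6, 8), (4, 8)):  # illustrative SKUs
        compute = ComputeTilePMC(p, e)
        central = CentralPMC({"Compute": compute})
        print(f"{p}P+{e}E idle tiles:", central.tiles_to_power_gate())
        compute.set_active("E0", True)
        print(f"{p}P+{e}E after waking E0:", central.tiles_to_power_gate())
```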
The “North” NOC fabric is a cache-coherent, unordered interface spanning the SOC Tile, from the Compute Tile to the Graphics Tile. Within the SOC tile, it links high-performance devices like the memory controller, LP E-Cores, Neural Processing Unit (NPU), and the Media, Imaging, and Display engines. It also has a local power management unit (P-Unit) for regional control, leading to better efficiency. The inclusion of LP E-cores on the SOC is an interesting decision with significant ramifications, particularly for Thread Director. The design allows this pair of cores to remain active even while the Compute Tile is in a low-power mode or shut off entirely, which creates a lot of potential to dramatically improve Meteor Lake’s efficiency during common usage patterns.
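As a toy illustration of why the LP E-cores matter, here is a greedy placement policy that keeps light, non-latency-sensitive work on the two SOC-tile cores and only wakes the Compute tile when that island is full or a task needs the bigger cores. This is not how Thread Director works; the `load` units and the policy itself are hypothetical simplifications.

```python
from dataclasses import dataclass

LP_E_CORES = 2  # the article: a pair of LP E-cores on the SOC tile
LP_ISLAND = "SOC tile LP E-cores"
COMPUTE = "Compute tile"


@dataclass
class Task:
    name: str
    load: float                 # hypothetical: 1.0 = one LP E-core fully busy
    latency_sensitive: bool = False


def place(tasks: list[Task]) -> tuple[dict[str, str], bool]:
    """Greedy toy policy: fill the LP island with light tasks, spill the rest.

    Returns the task-to-location mapping and whether the Compute tile must be powered.
    """
    capacity = float(LP_E_CORES)
    used = 0.0
    placement: dict[str, str] = {}
    for task in sorted(tasks, key=lambda t: t.load):  # lightest first
        if not task.latency_sensitive and used + task.load <= capacity:
            placement[task.name] = LP_ISLAND
            used += task.load
        else:
            placement[task.name] = COMPUTE
    compute_needed = COMPUTE in placement.values()
    return placement, compute_needed


if __name__ == "__main__":
    background = [Task("mail sync", 0.1), Task("music playback", 0.3), Task("editor", 0.4)]
    print(place(background))
    print(place(background + [Task("game", 3.0, latency_sensitive=True)]))
```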
The “South” IO fabric is ordered, but non-coherent and PCIe-based. It is home to Wi-Fi and Bluetooth, PCI Express connections, Sensing, USB 3/2, Ethernet, the Power Management Controller (PMC), and Security controllers. Intel has broken out the Silicon Security Engine from its traditional Converged Security and Manageability Engine (CSME).
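The North/South split can be modeled as two sets of attached agents joined by the IOC bridge. This is a simplified sketch built from the device lists the article gives for each fabric; agent names are shorthand, and real traffic routing inside the SOC tile is certainly more involved.

```python
NORTH_NOC = {
    "Compute tile", "Graphics tile", "memory controller", "LP E-cores", "NPU",
    "media engine", "imaging engine", "display engine",
}
SOUTH_IO = {
    "IO tile", "Wi-Fi", "Bluetooth", "PCI Express", "sensing", "USB",
    "Ethernet", "PMC", "security controllers",
}
FABRIC_TRAITS = {
    "NOC": {"cache_coherent": True, "ordered": False},
    "IO": {"cache_coherent": False, "ordered": True},
}


def fabric_of(agent: str) -> str:
    """Which fabric an agent attaches to in this simplified model."""
    if agent in NORTH_NOC:
        return "NOC"
    if agent in SOUTH_IO:
        return "IO"
    raise KeyError(agent)


def path(src: str, dst: str) -> list[str]:
    """Fabric hops between two agents; crossing North/South goes through the IOC bridge."""
    a, b = fabric_of(src), fabric_of(dst)
    return [a] if a == b else [a, "IOC bridge", b]


if __name__ == "__main__":
    print(path("NPU", "memory controller"))    # stays on the NOC
    print(path("Wi-Fi", "memory controller"))  # crosses the IOC bridge
```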
The IO Tile provides additional PCI Express and USB4/Thunderbolt connectivity using the IO fabric. The IO Tile’s positioning alongside the Compute Tile is deliberate, as this configuration effectively extends the surface area of the SOC to make outside connections less crowded.

Comments

  • By thunderbird120 2023-09-25 17:13 · 2 replies

    Will be interested to see how this first(ish) gen of Intel's disaggregated chips pans out. I've been needing to replace my laptop, and these seem like they have the potential to be extremely nice for a mid-range machine with long battery life. The new scheduler hierarchy is especially interesting given how much of the physical chip they can avoid powering on at all for most simple tasks. For a lot of light use cases, the entire "real" CPU and GPU parts of the silicon can be completely dark, since the SOC has two tiny cores to run things and other necessary parts, like the video decode silicon, were separated from the GPU.

    • By brucethemoose2 2023-09-25 18:11 · 1 reply

      Eh, I have a sneaking suspicion the compute dies won't be shut down as much as you'd think, and that there will be some extra power usage from crossing the dies like desktop Ryzen parts (though hopefully not nearly as severe).

      A good Process Lasso config is probably worth the time investment. Instead of "trusting" the scheduler, you could force everything non time sensitive onto the efficiency island, maybe by default.
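Process Lasso is a Windows tool; on Linux, a rough equivalent of pinning a process to the efficiency island can be sketched with the standard library's `os.sched_setaffinity`. Which logical CPU numbers correspond to the SOC tile's LP E-cores is platform- and OS-dependent, so `LP_E_CORE_CPUS` below is a hypothetical placeholder, not a known mapping.

```python
import os

# Hypothetical placeholder: logical CPU IDs of the LP E-cores. Look up the real
# numbering on the target machine before using anything like this.
LP_E_CORE_CPUS = frozenset({0, 1})


def pin_to_cpus(pid: int, cpus) -> set[int]:
    """Restrict `pid` (0 = this process) to `cpus`, Linux only.

    The request is intersected with the process's current affinity mask so that
    CPUs which don't exist or aren't allowed are skipped instead of failing.
    """
    allowed = os.sched_getaffinity(pid)
    target = set(cpus) & allowed
    if not target:
        raise ValueError("none of the requested CPUs are available to this process")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    try:
        print("pinned to:", pin_to_cpus(0, LP_E_CORE_CPUS))
    except ValueError as err:
        print("skipped:", err)
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask
```

Deciding which processes count as "non time sensitive" is the hard part and is left to the reader; the sketch only shows the pinning mechanism.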

      • By bwhitty 2023-09-25 23:33 · 1 reply

        The 3D Foveros packaging technology is critical as it allows some path lengths to be much shorter than if you had to traverse that same path but only in the horizontal 2D plane.

        Very excited to see how this plays out in practice.

        • By brucethemoose2 2023-09-26 2:21

          I thought Meteor Lake was tiled and 2D? Intel has EMIB and such for very good bridges, but they are still bridges.

          If it is 3D stacked with TSVs, that's a whole other can of worms. AMD's X3D on Ryzen 7000 creates heat/clock speed issues, and they reportedly canceled a 3D variant of the 7900 GPUs due to similar issues.

    • By transpute 2023-09-26 4:27 · 1 reply

      Disaggregation is also good for risk management of attack surfaces.

      > Intel has broken out the Silicon Security Engine from its traditional Converged Security and Manageability Engine (CSME).

      Good to see Security separated from ME.

      Hopefully the ME no longer has control over IOMMU isolation of devices.

      • By sweetjuly 2023-09-26 5:56 · 1 reply

        I'm not sure I follow. It's almost guaranteed that all chiplets are still on some global system bus, just like they were on their monolithic dies. Unless Intel has taken sudden great strides with their SoC security architecture, there are likely still all the old problems (plus a bunch of new fun ones!). Taking it off die is a response to the physical scaling issue; it's really not meant as a security enhancement.

        • By transpute 2023-09-26 7:14

          Per the article, both IP blocks remain within the same SOC chiplet/tile.

          https://en.wikipedia.org/wiki/Intel_Management_Engine

          > Starting with ME 11, it is based on the Intel Quark x86-based 32-bit CPU and runs the MINIX 3 operating system.

          If "Silicon Security" functions are now executing on a separate CPU/OS, it's an increase in separation from Intel ME functions.

          https://www.anandtech.com/show/20046/intel-unveils-meteor-la...

          > [Meteor Lake] introduces the Intel Silicon Security Engine (ISSE), a dedicated component focused solely on securing things at a silicon level ... The Converged Security and Manageability Engine (CSME) has also been partitioned to further enhance platform security.

  • By transpute 2023-09-26 4:24

    > The “South” IO fabric is ordered, but non-coherent and PCIe-based. It is home to Wi-Fi and Bluetooth, PCI Express connections, Sensing, USB 3/2, Ethernet, the Power Management Controller (PMC), and Security controllers.

    Does "Sensing" refer to human presence based on camera and radio (Wi-Fi, UWB) imaging?

    https://lkml.org/lkml/2023/2/12/314

      Intel Visual Sensing Controller (IVSC), codenamed "Clover Falls", is a companion chip designed to provide secure and low power vision capability to IA platforms. The primary use case of IVSC is to bring in context awareness. IVSC interfaces directly with the platform main camera sensor via a CSI-2 link and processes the image data with the embedded AI engine. The detected events are sent over I2C to ISH (Intel Sensor Hub) for additional data fusion from multiple sensors.
    
    https://www.techpowerup.com/276114/new-intel-visual-sensing-...

      The company didn't detail how it goes about this, but technologies already exist to combine visual input from the PC's cameras; radio from the PC's antennas, audio from its mic array; to form a picture of its surroundings.
    
    https://community.intel.com/t5/Blogs/Tech-Innovation/Client/...

      With an initial focus on respiration detection, we hope to extend the technology to detect other physical activities as well. Intel Labs will demonstrate an early prototype of breathing detection ... The solution detects the rhythmic change in CSI due to chest movement during breathing ...  The respiration rates gathered by this technology could play an important role in stress detection and other wellness applications.
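The quoted description, detecting a rhythmic change in a signal caused by chest movement, boils down to finding a dominant low frequency. A generic way to do that is to look for the strongest spectral peak within a plausible breathing band. The sketch below does this with a direct DFT scan in pure Python on a synthetic signal; it is not Intel's method, and the band limits (0.1 to 0.7 Hz), sample rate, and signal are all assumptions.

```python
import math


def dominant_frequency(samples, sample_rate_hz, f_min=0.1, f_max=0.7, step=0.005):
    """Frequency in [f_min, f_max] (Hz) with the largest DFT magnitude."""
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]  # remove the DC offset
    best_f, best_power = f_min, -1.0
    for k in range(int(round((f_max - f_min) / step)) + 1):
        f = f_min + k * step
        re = im = 0.0
        for n, x in enumerate(centered):
            angle = 2.0 * math.pi * f * n / sample_rate_hz
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f


def breaths_per_minute(samples, sample_rate_hz):
    return 60.0 * dominant_frequency(samples, sample_rate_hz)


if __name__ == "__main__":
    rate = 10.0  # hypothetical samples per second
    # Synthetic stand-in for a CSI amplitude trace: 0.25 Hz "breathing" plus an
    # out-of-band 1.3 Hz component and a constant offset.
    signal = [5.0 + math.sin(2 * math.pi * 0.25 * n / rate)
              + 0.3 * math.sin(2 * math.pi * 1.3 * n / rate)
              for n in range(int(60 * rate))]
    print(f"estimated: {breaths_per_minute(signal, rate):.1f} breaths/min")
```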

  • By kristianp 2023-09-25 23:33 · 1 reply

    Interesting to see how efficient these are for office/coding tasks (e.g. typing into VS Code). Will the CPU tile be off most of the time, or will it take some years before applications and the OS are tuned to avoid CPU tile wakeups?

    Also, how good will the P-cores be compared to the previous gen?

    Are the avx-10 instructions going into this generation?

    • By RaisingSpear 2023-09-26 7:52 · 1 reply

      > Are the avx-10 instructions going into this generation?

      Nope. In fact, it probably won't be in client until after Lunar Lake (2025+).

      • By colejohnson66 2023-09-26 11:29 · 2 replies

        Per the AVX10.1 Instruction Set Reference, p. 1-2 (355989-001US rev 1.0)[0], AVX10 support will begin with 6th gen Xeon processors (based on Granite Rapids), which are due next year. Client support is not called out, so your guess of Lunar Lake (late 2024 to early 2025) is probably a good one.

        [0]: https://cdrdv2.intel.com/v1/dl/getContent/784266

        • By adrian_b 2023-09-26 12:16

          No, the instruction set supported by Lunar Lake was published by Intel a few months ago, and it is almost the same as that of Arrow Lake S, i.e. without AVX10 or AVX-512 support (Arrow Lake S supports more instructions than Arrow Lake, e.g. it has SHA-512 secure hash instructions).

          It is likely that Panther Lake, to be launched in 2025, probably in the second half of the year, will be the first Intel CPU supporting the 256-bit subset of the AVX10.2 ISA version.

          The 2024 Granite Rapids will probably be the only Intel CPU supporting the AVX10.1 ISA version (with full 512-bit support), because all the following will start from AVX10.2.

        • By RaisingSpear 2023-09-26 12:44 · 1 reply

          AVX10.1 is just Sapphire Rapids' AVX-512 renamed, so arguably SPR (and early Alder Lake) already support AVX10.1, just without declaring the relevant CPUID bits.

          Client support depends on E-cores supporting it, and Intel have specified that they'll start with AVX10.2. I don't believe that any core has been announced with AVX10.2 support yet, and the latest we know of is Lunar Lake's ISA support.

          • By adrian_b 2023-09-26 13:54 · 1 reply

            AVX10.1 is just Granite Rapids' AVX-512 renamed.

            Granite Rapids has a few extra AVX-512 instructions (including those added by Tiger Lake, but omitted in Sapphire Rapids and Alder Lake), so Sapphire Rapids does not support all of AVX10.1. Therefore neither Sapphire Rapids nor Emerald Rapids may turn on the AVX10 CPUID bits.

            Nevertheless, the differences between the AVX-512 instruction sets of Granite Rapids and Sapphire Rapids are small and of little importance.

            • By RaisingSpear 2023-09-26 20:22

              Are you sure? I just checked Intel's manual, and nothing above Sapphire Rapids is listed in AVX10.1: https://files.catbox.moe/23ty0y.png

              I couldn't find any new AVX* instructions added to Granite Rapids (I see PREFETCHI and some AMX additions, neither of which fall under the AVX category), and VP2INTERSECT isn't listed under AVX10 or Granite Rapids.
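For readers who want to check what their own machine reports, Linux exposes CPU feature flags in the `flags` line of `/proc/cpuinfo`. This sketch only lists the AVX-512 subset names found there. It does not decode the separate CPUID enumeration that AVX10 introduces, so it cannot by itself answer the AVX10.1 questions debated above.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_features(flags: set[str]) -> list[str]:
    """AVX-512 subsets present, e.g. 'avx512f', 'avx512bw', 'avx512_vp2intersect'."""
    return sorted(f for f in flags if f.startswith("avx512"))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        found = avx512_features(cpu_flags(cpuinfo.read_text()))
        print("AVX-512 flags:", found or "none reported")
```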

HackerNews