RISC-V Is Sloooow

2026-03-10 20:11 · marcin.juszkiewicz.com.pl

About 3 months ago I started working with RISC-V port of Fedora Linux. Many things happened during that time.

Triaging

I went through the Fedora RISC-V tracker entries, triaged most of them (at the moment 17 entries left in NEW) and tried to handle whatever possible.

Fedora packaging

My usual way of working involves fetching the sources of a Fedora package (fedpkg clone -a) and then building it (fedpkg mockbuild -r fedora-43-riscv64). After some time, I check whether it built, and if not, I go through the build logs to find out why.

The effect? So far, 86 pull requests sent for Fedora packages, from heavy ones like “llvm15” to simple ones like “iyfct” (a simple game). Most of them have been merged, and most of those got built for Fedora 43. We can then build them on our side as we follow the ‘f43-updates’ tag on Fedora Koji.

Slowness

Work on packages brings up a hard, sometimes controversial topic: speed. Or rather the lack of it.

You see, the RISC-V hardware available at the moment is slow, which results in terrible build times — look at the details of the binutils 2.45.1-4.fc43 package:

Architecture  Cores  Memory  Build time
aarch64       12     46 GB   36 minutes
i686          8      29 GB   25 minutes
ppc64le       10     37 GB   46 minutes
riscv64       8      16 GB   143 minutes
s390x         3      45 GB   37 minutes
x86_64        8      29 GB   29 minutes

Also worth mentioning: the current builds of the RISC-V Fedora port are done with LTO disabled, to cut memory usage and build times.

RISC-V builders have four or eight cores and 8, 16 or 32 GB of RAM (depending on the board). Those cores are usually compared to Arm Cortex-A55 ones, the lowest-end CPU cores in today’s Arm chips.

The UltraRISC UR-DP1000 SoC, present on the Milk-V Titan motherboard, should improve the situation a bit (and can have 64 GB of RAM). The same goes for SpacemiT K3-based systems (but with only 32 GB of RAM). Both will be an improvement, but not the final solution.

We need hardware capable of building the “binutils” package above in under one hour. With LTO enabled system-wide, etc. And it needs to be rackable and manageable like any other boring server. Without it, we cannot even plan for the RISC-V 64-bit architecture to become one of the official, primary architectures in Fedora Linux.

I still use QEMU

Such long build times are what keep QEMU useful to me. You see, with 80 emulated cores, I can build the “llvm15” package in about 4 hours. Compare that to 10.5 hours on a Banana Pi BPI-F3 builder (it may be quicker on a P550 one).

btop shows 80 cores being busy

And LLVM packages make real use of both the available cores and memory. I wonder how fast it would go on the 192/384 cores of an Ampere One-based system.

Future plans

We plan to start building Fedora Linux 44. If things go well, we will use the same kernel image on all of our builders (the current ones use a mix of kernel versions). LTO will still be disabled.

When it comes to the lack of speed… There are plans to bring in new, faster builders, and probably to assign some of the heavier packages to them.


Comments

  • By kashyapc 2026-03-11 10:15

    A couple of corrections (the blog-post is by a colleague, but I'm not speaking for Marcin! :))

    First, we do have a recent 'binutils' build[1] with test-suites in 67 minutes (it was on Milk-V "Megrez") in the Fedora RISC-V build system.

    Second, the current fastest development machine is not the Banana Pi BPI-F3. If we consider what is reasonably accessible today, it is the SiFive "HiFive P550" (P550 for short) and the upcoming UltraRISC "DP1000", for which we have access to an eval board. And as noted elsewhere in this thread, in "several months" some RVA23-based machines should be available. (RVA23 == the latest ISA spec).

    FWIW, our FOSDEM talk from earlier this year, "Fedora on RISC-V: state of the arch"[2], gives an overview of the hardware situation. It also has a couple of related poor man's benchmarks (an 'xz' compression test and a 'binutils' build without the test-suite on the above two boards -- that's what I could manage with the time I had).

    Edit: Marcin's RISC-V test was done on the StarFive "VisionFive 2". This small board has its strengths (upstreamed drivers), but it is not known for its speed!

    [1] https://riscv-koji.fedoraproject.org/koji/taskinfo?taskID=91...

    [2] Slides: https://fosdem.org/2026/events/attachments/SQGLW7-fedora-on-...

  • By rbanffy 2026-03-10 20:21

    Don't blame the ISA - blame the silicon implementations AND the software with no architecture-specific optimisations.

    RISC-V will get there, eventually.

    I remember that ARM started as a speed demon with conscious power consumption, then was surpassed by x86s and PPCs on desktops and moved to embedded, where it shone by being very frugal with power, only to now be leaving the embedded space with implementations optimised for speed more than power.

    • By izacus 2026-03-11 11:36

      If you make a spec that the wider industry cannot effectively implement into quality products, it's the spec that's wrong. And that's true for anything - whether it's RISC-V, ipv6, Matter, USB-C and so on.

      That's what makes writing specs hard - you need people who understand implementation challenges at the table, not dreaming architects and academics.

    • By newpavlov 2026-03-10 21:22

      In some cases RISC-V ISA spec is definitely the one to blame:

      1) https://github.com/llvm/llvm-project/issues/150263

      2) https://github.com/llvm/llvm-project/issues/141488

      Another example is hard-coded 4 KiB page size which effectively kneecaps ISA when compared against ARM.

      • By weebull 2026-03-10 22:50

        All of those things are solved with modern extensions. It's like comparing pre-MMX x86 code with modern x86. Misaligned loads and stores are Zicclsm, bit manipulation is Zb[abcs], atomic memory operations are made mandatory in Ziccamoa.

        All of these extensions are mandatory in the RVA22 and RVA23 profiles and so will be implemented on any up to date RISC-V core. It's definitely worth setting your compiler target appropriately before making comparisons.

        • By LeFantome 2026-03-10 23:20

          Ubuntu being RVA23 is looking smarter and smarter.

          The RISC-V ecosystem being handicapped by backwards compatibility does not make sense at this point.

          Every new RISC-V board is going to be RVA23 capable. Now is the time to draw a line in the sand.

          • By saagarjha 2026-03-11 9:07

            I’d be kind of depressed if every new RISC-V board was not RVA23 capable.

        • By cmovq 2026-03-11 2:16

          But RISC-V is a _new_ ISA. Why did we start out with the wrong design that now needs a bunch of extensions? RISC-V should have taken the learnings from x86 and ARM but instead they seem to be committing the same mistakes.

          • By kldg 2026-03-11 6:45

            I was a bit shocked by the headline, given how poorly ARM and x86 compare to RISC-V in speed, cost, and efficiency ... in the MCU space where I near-exclusively live and where RISC-V has near-exclusively lived up until quite recently. RISC-V has been great for RTOS systems and Espressif in particular has pushed MCUs up to a new level where it's become viable to run a designed-from-scratch web server (you better believe we're using vector graphics) on a $5 board that sits on your thumb, but using RISC-V in SBCs and beyond as the primary CPU is a very different ballgame.

          • By wolvoleo 2026-03-11 3:06

            It is a reduced instruction set computing ISA, of course. It shouldn't really have instructions for every edge case.

            I only use it for microcontrollers and it's really nice there. But yeah, I can imagine it doesn't perform well on bigger stuff. The idea of RISC was to put the intelligence in the compiler though, not the silicon.

          • By hun3 2026-03-11 2:19

            It was kind of an experiment from start. Some ideas turned out to be good, so we keep them. Some ideas turned out not to be good, so we fix them with extensions.

          • By veltas 2026-03-11 8:32

            Relatively new, we're about 16 years down the road.

          • By pajko 2026-03-11 8:42

            Intentionally. Back then the guys were saying that everything could be solved by raw power.

        • By sidewndr46 2026-03-11 1:55

          You're correct but I guess my thoughts are if we're going to wind up with a mess of extensions, why not just use x86-64?

          • By LeFantome 2026-03-11 4:35

            First, x86-64 also has “extensions” such as avx, avx2, and avx512. Not all “x86-64” CPUs support the same ones. And you get things like svm on AMD and vmx on Intel. Remember 3DNow?

            x86-64 also has “profiles” which tell you what extensions should be available. There is x86-64v1 and x86-64v4, with v2 and v3 in the middle.

            RVA23 offers a very similar feature-set to x86-64v4.

            You do not end up with a mess of extensions. You get RVA23. Yes, RVA23 represents a set of mandatory extensions. The important thing is that two RVA23 compliant chips will implement the same ones.

            But the most important point is that you cannot “just use x86-64”. Only Intel and AMD can do that. Anybody can build a RISC-V chip. You do not need permission.

          • By whaleofatw2022 2026-03-11 2:10

            Because the ISA is not legally encumbered the way other ISAs are, and there are use cases where the minimal profile is fine for embedded work versus the cost of implementing the extensions.

          • By computably 2026-03-11 3:05

            > why not just use x86-64?

            Uh, because you can't? It's not open in any meaningful sense.

        • By edflsafoiewq 2026-03-10 23:18

          What about page size?

        • By newpavlov 2026-03-10 23:22

          >Misaligned loads and stores are Zicclsm

          Nope. See https://github.com/llvm/llvm-project/issues/110454 which was linked in the first issue. The spec authors have managed to make a mess even here.

          Now they want to introduce yet another (sic!) extension Oilsm... It maaaaaay become part of RVA30, so in the best case scenario it will be decades before we will be able to rely on it widely (especially considering that RVA23 is likely to become heavily entrenched as "the default").

          IMO the spec authors should've mandated that the base load/store instructions work only with aligned pointers and introduced misaligned instructions in a separate early extension. (After all, passing a misaligned pointer where your code does not expect it is a correctness issue.) But I would've been fine as well if they had mandated that misaligned pointers always be accepted. Instead we have to deal with the terrible middle ground.

          >atomic memory operations are made mandatory in Ziccamoa

          In other words, forget about potential performance advantages of load-link/store-conditional instructions. `compare_exchange` and `compare_exchange_weak` will always compile into the same instructions.
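          The distinction being collapsed here is visible from C11: `compare_exchange_weak` may fail spuriously, which lets LL/SC targets implement it with a single reservation instead of nesting one retry loop inside another. A minimal sketch using standard C11 atomics (nothing RISC-V-specific):

```c
#include <stdatomic.h>

/* Lock-free increment built on compare_exchange_weak. The weak form
   may fail spuriously, which is harmless inside a retry loop and is
   exactly what lets LL/SC hardware avoid an inner retry loop. */
static int fetch_inc(_Atomic int *p)
{
    int old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + 1,
               memory_order_acq_rel, memory_order_relaxed))
        ;  /* on failure, 'old' is reloaded with the current value */
    return old;
}
```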

          And I guess you are fine with the page size part. I know there are huge-page-like proposals, but they do not resolve the fundamental issue.

          I have other minor performance-related nits, such as the `seed` CSR being allowed to produce poor-quality entropy, which means that we have to bring a whole CSPRNG if we want to generate a cryptographic key or nonce on a low-powered microcontroller.

          By no means do I consider myself a RISC-V expert; if anything, my familiarity with the ISA as a systems language programmer is quite shallow. But the number of accumulated disappointments even from such shallow familiarity has cooled my enthusiasm for RISC-V quite significantly.

          • By pseudohadamard 2026-03-11 9:48

            RISC-V truly is the RyanAir of processors: Oh, you want FP maths? That's an optional extra, did you check that when you booked? And was that single or double-precision, all optional extras at an extra charge. Atomic instructions, that's an extra too, have your credit card details handy. Multiply and divide? Yeah, extras. Now, let me tell you about our high-end customer options, packed SIMD and user-level interrupts, only for business class users. And then there's our first-class benefits, hypervisor extensions for big spenders, and even more, all optional extras.

          • By IshKebab 2026-03-11 7:32

            I think having separate unaligned load/store instructions would be a much worse design, not least because they use a lot of the opcode space. I don't understand why you don't just have an option to not generate misaligned loads for people that happen to be running on CPUs where it's really slow. You don't need to wait for a profile for that.

            As for `seed`, if you're running on a microcontroller you can just look up the data sheet to see if its seed entropy is sufficient. By the time you get to CPUs where portable code is important, a CSPRNG is probably fine.

            I agree about page size though. Svnapot seems overly complicated and gives only a fraction of the advantages of actually bigger pages.

      • By tosti 2026-03-11 4:59

        Regarding misaligned reads, IIRC only x86 hides non-aligned memory access. It's still slower than aligned reads. Other processors just fault, so it would make sense to do the same on riscv.

        The problem is decades of software being written on a chip that from the outside appears not to care.

        • By torginus 2026-03-11 9:22

          Yes, unaligned loads/stores are a niche feature that has huge implications in processor design - loads across cache-lines with different residency, pages that fault etc.

          This is the classic conundrum of legacy system redesign - if customers keep demanding every feature of the old system be present, and work the exact same then the new system will take on the baggage it was designed to get rid of.

          The new implementation will be slow and buggy by this standard and nobody will use it.

          • By 0x000xca0xfe 2026-03-11 10:35

            Unaligned load/store is crucial for zero-copy handling of mmaped data, network streams and all other kinds of space-optimized data structures.

            If the CPU doesn't do it software must make many tiny conditional copies which is bad for branch prediction.

            This sucks double when you have variable length vector operations... IMO fast unaligned memory accesses should have been mandatory without exceptions for all application-level profiles and everything with vector.
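            For what it's worth, portable code usually expresses such reads with the `memcpy` idiom, which compilers lower to a single load where misaligned access is supported and to byte loads elsewhere, with no branches either way; a minimal sketch:

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from a possibly misaligned
   pointer. The memcpy is never UB (unlike *(uint32_t *)p) and
   compiles to one load where misaligned access is cheap. */
static inline uint32_t load_u32le(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);   /* normalize to little-endian */
#endif
    return v;
}
```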

        • By fredoralive 2026-03-11 8:41

          ARM Cortex-A cores also allow unaligned access (MCU cores don't though, and older ARM is weird). There's perhaps a hint if the two most popular CPU architectures have ended up in the forgiving approach to unaligned access, rather than the penalising approach of raising an interrupt.

        • By pjmlp 2026-03-11 6:44

          It didn't use to be something to care about in the past, across the 8-, 16- and 32-bit generations, outside of RISC.

          • By inkyoto 2026-03-11 6:59

            The PDP-11 and m68k, to name a few, did not allow misaligned access to anything that was not a byte.

            Neither are RISC nor modern.

      • By adastra22 2026-03-10 21:27

        Also, the bit manipulation extension wasn't part of the core. So things like bit rotation are slow for no good reason, if you want portable code. Why? Who knows.
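        As an illustration, the portable rotate idiom below is what compilers pattern-match: with Zbb they can emit a single rotate instruction, and on base RV64I a shift/or pair. A minimal sketch assuming nothing beyond standard C:

```c
#include <stdint.h>

/* Portable rotate-left. Compilers recognize this idiom and emit a
   single rotate where the target has one (x86, ARM, RISC-V+Zbb),
   falling back to two shifts and an OR otherwise. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;                          /* keep the shift counts in range */
    return (x << n) | (x >> (-n & 31));  /* avoids UB when n == 0 */
}
```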

        • By adgjlsfhk1 2026-03-10 21:50

          > Also the bit manipulation extension wasn't part of the core.

          This is largely because the core is primarily a teaching ISA. One of the best parts about RISC-V is that you can teach a freshman-level architecture class or a senior-level chip building project with an ISA that is actually used. Anything powerful enough to run Linux (one not built from source manually) will support a profile that bundles all the commonly needed instructions to be fast.

          • By jacquesm 2026-03-10 21:59

            Bit manipulation instructions are part and parcel of any curriculum that teaches CPU architecture. They are the basic building blocks for many more complex instructions.

            https://five-embeddev.com/riscv-bitmanip/1.0.0/bitmanip.html

            I can see quite a few items on that list that imnsho should have been included in the core and for the life of me I can't see the rationale behind leaving them out. Even the most basic 8 bit CPU had various shifts and rolls baked in.

          • By hackyhacky 2026-03-10 21:57

            > One of the best parts about RiscV is that you can teach a freshman level architecture class or a senior level chip building project with an ISA that is actually used.

            Same could be said of MIPS.

            My understanding is that the RISC-V raison d'être is rather the avoidance of patented/copyrighted designs.

        • By fidotron 2026-03-10 21:31

          The fact the Hazard3 designer ended up creating an extension to resolve related oddities was kind of astonishing.

          Why did it fall to them to do it? Impressive that he did, but it shouldn't have been necessary.

        • By mort96 2026-03-11 8:01

          Do you typically care about portability to the degree that you want the same machine code to execute on both a Linux box and a microcontroller? Why?

      • By torginus 2026-03-11 9:31

        Unaligned load/store is a horrible feature to implement.

        Page size can be easily extended down the line without breaking changes.

      • By direwolf20 2026-03-11 4:11

        The first one is common across many architectures, including ARM, and the second is just LLVM developers not understanding how cmpxchg works

    • By fidotron 2026-03-10 21:15

      > RISC-V will get there, eventually.

      Not trolling: I legitimately don't see why this is assumed to be true. It is one of those things that is true only once it has been achieved. Otherwise we would be able to create super high performance Sparc or SuperH processors, and we don't.

      As you note, Arm once was fast, then slow, then fast. RISC-V has never actually been fast. It has enabled surprisingly good implementations by small numbers of people, but competing at the high end (mobile, desktop or server) it is not.

      • By lizknope 2026-03-10 22:44

        I think the bigger question is does RISC-V need to be fast? Who wants to make it fast?

        I'm a chip designer and I see people using RISC-V as small processor cores for things like PCIE link training or various bookkeeping tasks. These don't need to be fast, they need to be small and low power which means they will be relatively slow.

        Most people on tech review sites only care about desktop / laptop / server performance. They may know about some of the ARM Cortex A series CPUs that have MMUs and can run desktop or smartphone Linux versions.

        They generally don't care about the ARM Cortex M or R versions for embedded and real time use. Those are the areas where you don't need high performance and where RISC-V is already replacing ARM.

        EDIT:

        I'll add that there are companies that COULD make a fast RISC-V implementation.

        Intel, AMD, Apple, Qualcomm, or Nvidia could redirect their existing teams to design a high performance RISC-V CPU. But why should they? They are heavily invested in their existing x86 and ARM CPU lines. Amazon and Google are using licensed ARM cores in their server CPUs.

        What is the incentive for any of them to make a high performance RISC-V CPU? The only reason I can think of is that Softbank keeps raising ARM licensing costs and it gets high enough that it is more profitable to hire a team and design your own RISC-V CPU.

        • By adgjlsfhk1 2026-03-10 23:40

          Of your list, Qualcomm and Nvidia are fairly likely to make high perf Riscv cpus. Qualcomm because Arm sued them to try and stop them from designing their own arm chips without paying a lot more money, and Nvidia because they already have a lot of teams making riscv chips, so it seems likely that they will try to unify on the one that doesn't require licensing.

          • By lizknope 2026-03-11 1:09

            Yeah, they could but then what is the market? Qualcomm wants to sell smartphone chips and Android can run on RISC-V and most Android Java apps could in theory run.

            But if you look at the Intel x86 smartphone chips from about 10 years ago they had to make an ARM to x86 emulator because even the Java apps contained native ARM instructions for performance reasons.

            Qualcomm is trying to push their ARM Snapdragon chips in Windows laptops but I don't think they are selling well.

            Nvidia could also make RISC-V based chips but where would they go? Nvidia is moving further away from the consumer space to the data center space. So even if Nvidia made a really fast RISC-V CPU it would probably be for the server / data center market and they may not even sell it to ordinary consumers.

            Or if they did it could be like the Ampere ARM chips for servers. Yeah you can buy one as an ordinary consumer but they were in the $4,000 range last time I looked. How many people are going to buy that?

        • By benced 2026-03-11 4:44

          China is likely where it would come from - ARM and x86 are owned by Western companies.

      • By rwmj 2026-03-10 21:21

        RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots), largely because we learned from that. It's in fact a very "boring" architecture. There's no one that expects it'll be hard to optimize for. There are at least 2 designs that have taped out in small runs and have high end performance.

        • By adrian_b 2026-03-10 21:57

          RISC-V does not have the pitfalls of experimental ISAs from 45 years ago, but it has other pitfalls that have not existed in almost any ISA since the first vacuum-tube computers, like the lack of means for integer overflow detection and the lack of indexed addressing.

          Especially the lack of integer overflow detection is a choice of great stupidity, for which there exists no excuse.

          Detecting integer overflow in hardware is extremely cheap, its cost is absolutely negligible. On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably, because each arithmetic operation must be replaced by multiple operations.

          Because of the unacceptable cost, normal RISC-V programs choose to ignore the risk of overflows, which makes them unreliable.

          The highest performance implementations of RISC-V from previous years were forced to introduce custom extensions for indexed addressing, but those used inefficient encodings, because something like indexed addressing must be in the base ISA, not in an extension.

          • By hackyhacky 2026-03-10 22:04

            > On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably,

            Most languages don't care about integer overflow. Your typical C program will happily wrap around.

            If I really want to detect overflow, I can do this:

                add  t0, a0, a1
                bltu t0, a0, overflow   # unsigned add: carry out iff sum < operand
            
            Which is one more instruction, which is not great, not terrible.

          • By adgjlsfhk1 2026-03-10 22:20

            > On the other hand, detecting integer overflow in software is extremely expensive

            this just isn't true. both addition and multiplication can check for overflow in <2 instructions.
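            For reference, GCC and Clang expose these checks as builtins, which typically compile to the operation plus one conditional branch; a minimal sketch:

```c
#include <stdint.h>
#include <stdbool.h>

/* Checked signed addition using the GCC/Clang overflow builtins.
   Returns false and leaves *out untouched when a+b overflows;
   usually compiles to the add plus a single conditional branch. */
static bool checked_add(int32_t a, int32_t b, int32_t *out)
{
    int32_t r;
    if (__builtin_add_overflow(a, b, &r))
        return false;           /* signed overflow detected */
    *out = r;
    return true;
}
```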

        • By classichasclass 2026-03-10 22:05

          As a counterexample, I point to another relatively boring RISC, PA-RISC. It took off not (just) because the architecture was straightforward, but because HP poured cash into making it quick, and PA-RISC continued to be a very competitive architecture until the mass insanity of Itanic arrived. I don't see RISC-V vendors making that level of investment, either because they won't (selling to cheap markets) or can't (no capacity or funding), and a cynical take would say they hide them behind NDAs so no one can look behind the curtain.

          I know this is a very negative take. I don't try to hide my pro-Power ISA bias, but that doesn't mean I wouldn't like another choice. So far, however, I've been repeatedly disappointed by RISC-V. It's always "five or six years" from getting there.

          • By adrian_b 2026-03-10 23:35

            I would not call PA-RISC boring. Already at launch there was no doubt that it was a better ISA than SPARC or MIPS, and later it was improved. At the time when PA-RISC 2.0 was replaced by Itanium, it was not at all clear which of the two ISAs was better. The later failures to design high-performance Itanium CPUs make it plausible that if HP had kept PA-RISC 2.0 they might have had more competitive CPUs than with Itanium.

            SPARC (formerly called Berkeley RISC) and MIPS were pioneers that experimented with various features or lack of features, but they were inferior from many points of view to the earlier IBM 801.

            The RISC ISAs developed later, including ARM, HP PA-RISC and IBM POWER, have avoided some of the mistakes of SPARC and MIPS, while also taking some features from IBM 801 (e.g. its addressing modes), so they were better.

        • By fidotron 2026-03-10 21:26

          > RISC-V doesn't have the pitfalls of Sparc (register windows, branch delay slots),

          You're saying ISA design does have implementation performance implications then? ;)

          > There's no one that expects it'll be hard to optimize for

          [Raises hand]

          > There are at least 2 designs that have taped out in small runs and have high end performance.

          Are these public?

          Edit: I should add, I'm well aware of the cultural mismatch between HN and the semi industry, and have been caught in it more than a few times, but I also know the semi industry well enough to not trust anything they say. (Everything from well meaning but optimistic through to outright malicious depending on the company).

          • By rwmj 2026-03-10 21:45

            The 2 designs I'm thinking of are (tiresomely) under NDA, although I'm sure others will be able to say what they are. Last November I had a sample of one of them in my hand and played with the silicon at their labs, running a bunch of AI workloads. They didn't let me take notes or photographs.

            > There's no one that expects it'll be hard to optimize for

            No one who is an expert in the field, and we (at Red Hat) talk to them routinely.

          • By mastax 2026-03-11 3:55

            I assume the TensTorrent TT-Ascalon is one of the CPU designs.

      • By gt0 2026-03-10 21:23

        I don't think anybody suggests Oracle couldn't make faster SPARC processors, it's just that development of SPARC ended almost 10 years ago. At the time SPARC was abandoned, it was very competitive.

        • By twoodfin 2026-03-10 23:26

          In single-threaded performance? That’s not how I remember it: Sun was pushing parallel throughput over everything else, with designs like the T-Series & Rock.

          • By gt0 2026-03-11 0:29

            Perhaps not single thread, but Rock was a dead end a while before Oracle pulled the plug, and Sun/Oracle's core market of course was always servers not workstations. We used Niagara machines at my work around the T2 era, a long time ago, but they were very competitive if you could saturate the cores and had the RAM to back it up.

        • By icedchai 2026-03-11 1:27

          Sparc stopped being competitive in the early 2000’s.

      • By Findecanor 2026-03-10 23:39

        Because today, getting a fast CPU out isn't so much an engineering issue as it is about getting the investment to hire a world-class fab.

        The most promising RISC-V companies today have not set out to compete directly with Intel, AMD, Apple or Samsung, but are targeting a niche such as AI, HPC and/or high-end embedded such as automotive.

        And you can bet that Qualcomm has RISC-V designs in-house, but only making ARM chips right now because ARM is where the market for smartphone and desktop SoCs is. Once Google starts allowing RVA23 on Android / ChromeOS, the flood gates will open.

        • By adgjlsfhk1 2026-03-10 23:45

          It's very much both. You need millions of dollars for the fab, but you also need ~5 years to get 3 generations of CPUs out (to fix all the performance bugs you find in the first two).

      • By snvzz 2026-03-11 1:32

        Fast, RVA23-compatible microarchitectures already exist. Everything high performance seems to be based on RVA23, which is the current application profile and comparable to ARMv9 and x86-64v4.

        However, it takes time from microarchitecture to chips, and from chips to products on shelves.

        The very first RVA23-compatible chip to show up will likely be the SpacemiT K3 SoC, due in development boards in April (i.e. next month).

        More of them, and more performant ones, such as a development board with the Tenstorrent Ascalon CPU in the form of the Atlantis SoC, which was taped out recently, are coming this summer.

        It is even possible such designs will show up in products aimed at the general public within the present year.

    • By rwmj 2026-03-10 21:09

      Marcin is working with us on RISC-V enablement for Fedora and RHEL, he's well aware of the problem with current implementations. We're hopeful that this'll be pretty much resolved by the end of the year.

      • By LeFantome 2026-03-10 23:29

        If he expects it to be resolved by the end of the year (and I agree it likely will be), why is he writing a post like this?

        Is this because Fedora 44 is going to beta?

    • By Dwedit 2026-03-10 21:31

      There's the ARM video from LowSpecGamer, where they talk about how they forgot to connect power to the chip, and it was still executing code anyway. According to Steve Furber, the chip was accidentally being powered from the protection diodes alone. So ARM was incredibly power efficient from the very beginning.

    • By cogman10 2026-03-10 21:14

      > AND the software with no architecture-specific optimisations

      The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.

      We are well past the days where hand written assembly gives much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emissions (including determining where SIMD instructions could be placed).

      Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.

      • By hrmtst93837 2026-03-10 21:25

        One thing compilers still struggle with is exploiting weird microarchitectural quirks or timing behaviors that aren't obvious from the ISA spec, especially with memory, cache and pipeline tuning. If a new RISC-V core doesn't expose the same prefetching tricks or has odd branch prediction you won't get parity just by porting the same backend. If you want peak numbers sometimes you do still need to tune libraries or even sprinkle in a bit of inline asm despite all the "let the compiler handle it" dogma.

        • By cogman10 2026-03-10 21:43

          While true, it's typically not going to be impactful on system performance.

          There's a reason, for example, why the linux distros all target a generic x86 architecture rather than a specific architecture.

        • By CyberDildonics 2026-03-11 1:29

          The things you are talking about are taken care of by out-of-order execution and the CPU itself being smart about how it executes. Putting in prefetch instructions rarely beats the actual prefetcher itself. Compilers didn't end up generating perfect Pentium asm either. OOO execution is what changed the game: perfect compiler output is no longer needed.

      • By bobmcnamara 2026-03-10 21:31

        > The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V.

        There's no carry bit, and no widening multiply (or MAC).

        • By Findecanor 2026-03-11 0:09

          RISC-V splits widening multiply out into two instructions: one for the high bits and one for the low. Just like 64-bit ARM does.

          Integer MAC doesn't exist, and is also hindered by a design decision not to require more than two source operands, so as to allow simple implementations to stay simple. The same reason also prevents RISC-V from having a true conditional move instruction: there is one but the second operand is hard-coded zero.

          FMAC exists, but only because it is in the IEEE 754 spec ... and it requires significant op-code space.
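
          The mul/mulh split described above can be seen from C. A minimal sketch, assuming a GCC/Clang-style compiler with the `unsigned __int128` extension; the helper name is mine, not from the thread:

          ```c
          #include <stdint.h>
          #include <stdio.h>

          /* Full 64x64 -> 128-bit unsigned multiply. On RV64 with the M
             extension, GCC/Clang typically lower the __int128 product to a
             mul (low half) plus mulhu (high half) pair, since RISC-V has
             no single widening-multiply instruction. */
          static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
          {
              unsigned __int128 p = (unsigned __int128)a * b;
              *lo = (uint64_t)p;          /* low half:  mul   */
              *hi = (uint64_t)(p >> 64);  /* high half: mulhu */
          }

          int main(void)
          {
              uint64_t hi, lo;
              mul64x64(UINT64_MAX, UINT64_MAX, &hi, &lo);
              /* (2^64-1)^2 = 2^128 - 2^65 + 1, so hi = 2^64-2 and lo = 1 */
              printf("hi=%llu lo=%llu\n",
                     (unsigned long long)hi, (unsigned long long)lo);
              return 0;
          }
          ```

          Whether the two instructions get fused back into one widening multiply is left to the implementation; a simple in-order core pays for both.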

    • By bsder 2026-03-10 22:29

      > Don't blame the ISA - blame the silicon implementations

      That's true, but tautological.

      The issue is that the RISC-V core is the easy part of the problem, and nobody seems to even be able to generate a chip that gets that right without weirdness and quirks.

      The more fundamental technical problem is that things like the cache organization and DDR interface and PCI interface and ... cannot just be synthesized. They require analog/RF VLSI designers doing things like clock forwarding and signal integrity analysis. If you get them wrong, your performance tanks, and, so far, everybody has gotten them wrong in various ways.

      The business problem is the fact that everybody wants to be the "performance" RISC-V vendor, but nobody wants to be the "embedded" RISC-V vendor. This is a problem because practically anybody who is willing to cough up for a "performance" processor is almost completely insensitive to any cost premium that ARM demands. The embedded space is hugely sensitive to cost, but nobody is willing to step into it because that requires that you do icky ecosystem things like marketing, software, debugging tools, inventory distribution, etc.

      This leads to the US business problem which is the fact that everybody wants to be an IP vendor and nobody wants to ship a damn chip. Consequently, if I want actual RISC-V hardware, I'm stuck dealing with Chinese vendors of various levels of dodginess.

    • By userbinator 2026-03-11 3:59

      ARM was never a "speed demon"; it started out as a low power small-area core and clearly had more complexity and thought put into it than MIPS or RISC-V.

      Over a decade ago: https://news.ycombinator.com/item?id=8235120

      > RISC-V will get there, eventually.

      Strong doubt. Those of us who were around in the 90s might remember how much hype there was with MIPS.

      • By rbanffy 2026-03-11 8:41

        I don’t think you remember, but the first Archimedes smoked the just-launched Compaq 386s with a dedicated 387 coprocessor.

        It was not designed to be one, but it ended up being surprisingly fast.

    • By api 2026-03-10 20:38

      A pattern I've noticed for a very long time:

      A lot of times the path to the highest performing CPU seems to be to optimize for power first, then speed, then repeat. That's because power and heat are a major design constraint that limits speed.

      I first noticed this way back with the Pentium 4 "Netburst" architecture vs. the smaller x86 cores that became the ancestor of the Core architecture. Intel eventually ran into a wall with P4 and then branched high performance cores off those lower-power ones and that's what gave us the venerable Core architecture that made Intel the dominant CPU maker for over a decade.

      ARM's history is another example.

      • By cpgxiii 2026-03-10 21:34

        I think the story is a bit more complicated. Core succeeded precisely because Intel had both the low-power experience with Pentium-M and the high-power experience with Netburst. The P4 architecture told them a lot about what was and wasn't viable and at what complexity. When you look at the successor generations from Core, what you see are a lot of more complex P4-like features being re-added, but with the benefits of improved microarch and fab processes. Obviously we will never know, but I don't think you would get to Haswell or Skylake in the form they were without the learning experience of the P4.

        In comparison, I think Arm is actually a very strong cautionary tale that focusing on power will not get you to performance. Arm processors remained pretty poor performance until designers from other CPU families entirely (PowerPC and Intel) took it on at Apple and basically dragged Arm to the performance level they are today.

        • By maximilianburke 2026-03-11 0:21

          And not just any PowerPC architects either, but the people from PA Semi. Motorola couldn't get the speed up and IBM couldn't get the power down.

      • By userbinator 2026-03-11 4:25

      NetBurst was supposed to be the application of RISC principles to x86 taken to its extreme (ultra-long pipelines to reduce clock-to-clock delay, highest clock speed possible --- basically reducing work-per-clock and hoping that reduces complexity enough to increase clock speed to compensate.) The ALU was 16 bits, "double pumped", with the carry split between the two halves, which led to 32-bit ALU operations that don't carry between the lower and upper halves actually finishing a clock cycle faster than those that do.

        https://stackoverflow.com/questions/45066299/was-there-a-p4-...

      • By jnovek 2026-03-10 20:51

        I don’t have a micro architecture background so I apologize if this is obvious — What do power and speed mean in this context?

        • By McP 2026-03-10 21:01

          Power - how many Watts does it need? Speed - how quickly can it perform operations?

          • By wmf 2026-03-10 22:16

            You can get low power with a simple design at a low clock. This definitely will not help achieve high performance later.

        • By unethical_ban 2026-03-10 21:13

          One could say "Optimize for efficiency first, then performance".

      • By cptskippy 2026-03-10 21:24

        Core evolved from the Banias (Centrino) CPU core, which was based on the P3, not the P4. Banias used the front-side bus from the P4 but not the cores.

        Banias was hyper optimized for power, the mantra was to get done quickly and go to sleep to save power. Somewhere along the line someone said "hey what happens if we don't go to sleep?" and Core was born.

      • By jauntywundrkind 2026-03-10 21:21

        Parallels to code design, where optimizing data or code size can end up having fantastic performance benefits (sometimes).

    • By dmitrygr 2026-03-10 20:24

      If you care to read the article, they indeed do not blame the architecture but the available silicon implementations.

      • By rbanffy 2026-03-10 20:26

        I did read it. A Banana Pi is not the fastest developer platform. The title is misleading.

        BTW, it's quite impressive how the s390x is so fast per core compared to the others. I mean, of course it's fast - we all knew that.

        And don't let IBM legal see this: it can be considered a published benchmark, and they are very shy about s390x performance numbers.

        • By Aurornis 2026-03-10 21:11

          > A Banana Pi is not the fastest developer platform.

          What is the current fastest platform that isn’t exorbitantly expensive? Not upcoming releases, but something I can actually buy.

          I check in every 3-6 months but the situation hasn’t changed significantly yet.

        • By gt0 2026-03-10 20:32

          I was really surprised by the s390x performance, but I also don't really understand why build times are listed by architecture rather than by the actual processors.

        • By menaerus 2026-03-10 20:30

          Which risc-v implementation is considered fast?

        • By snvzz 2026-03-11 1:42

          >I did read it. A Banana Pi is not the fastest developer platform. The title is misleading.

          Ironically, its SoC (spacemiT K1) is slower than the JH7110 used in the first mass-produced RISC-V SBC, VisionFive 2.

          But unlike JH7110, it has vector 1.0, making it a very popular target.

          Of course, none of these pre-RVA23 boards will be relevant anymore, once the first development boards with RVA23-compatible K3 ship next month.

          These are also much faster than anything RISC-V currently purchasable. Developers have been playing with them for months through ssh access.

      • By topspin 2026-03-10 20:28

        I keep checking in on Tenstorrent every few months thinking Keller is going to rock our world... losing hope.

        At this point the most likely place for truly competitive RISC-V to appear is China.

        • By Findecanor 2026-03-11 0:15

          Tenstorrent is supposedly taping out 8-wide Ascalon processors as we speak, with devboards projected to be available in Q2/Q3 this year.

            BTW, Keller is also on the board of AheadComputing — founded by former Intel engineers behind the fabled "Royal Core".

          • By topspin 2026-03-11 2:27

            I can't know what Ascalon will actually be, but back in April/May 2025 there were actual performance numbers presented by Tenstorrent, and I analyzed what was shown. I concluded that Ascalon would be the x86_64 equivalent of an i5-9600K.

            That's usable for many applications, but it's not going to change the world. A lot of "micro PCs" with low-power CPUs are well past that now. If that's what Ascalon turns out to be, it will amount to an SBC-class device.

        • By rbanffy 2026-03-10 20:30

          > At this point the most likely place for fast RISC-V to appear is China.

          Or we just adopt Loongson.

      • By tromp 2026-03-10 20:27

        But they didn't reflect that in a title like "current RISC-V silicon Is Sloooow" ...

      • By spiderice 2026-03-10 20:35

        Then how do you justify the title?

    • By crest 2026-03-10 23:09

      RISC-V lacks a bunch of really useful, relatively easy-to-implement instructions, and most extensions are truly optional, so you can't rely on them. That's the problem if you let a bunch of academics turn your ISA into a paper mill.

      In theory you can spend a lot of effort to make a flawed ISA perform, but it will be neither easy nor pretty e.g. real world Linux distros can't distribute optimised packages for every uarch from dual-issue in-order RV64GC to 8-wide OoO RV64 with all the bells and whistles. Only in (deeply) embedded systems can you retarget the toolchain and optimise for each damn architecture subset you encounter.

  • By kashyapc 2026-03-11 0:33

    Arm had 40 years to be where it is today. RISC-V is 15 years old. Some more patience is warranted.

    Assuming they keep their word, later this year Tenstorrent is supposed to ship their RVA23-based server development platform[1]. They announced[2] it at last year's NA RISC-V Summit. Let's see.

    The ball is in the court of hardware vendors to cook some high-end silicon.

    [1] https://tenstorrent.com/ip/risc-v-cpu

    [2] https://static.sched.com/hosted_files/riscvsummit2025/e2/Unl...

    • By userbinator 2026-03-11 3:45

      MIPS, which RISC-V is closely modeled after, is also roughly 4 decades old and was massively hyped in the early 90s as well.

      • By kashyapc 2026-03-11 11:20

        Great point; I only vaguely know MIPS's legacy. As you imply, don't listen to the "hypsters" but pay attention to what silicon is being produced :)

HackerNews