Racing karts on a Rust GPU kernel driver

2025-11-19 20:23, 77 points, 14 comments, www.collabora.com

The Tyr prototype has progressed from basic GPU job execution to running GNOME, Weston, and full-screen 3D games like SuperTuxKart

A few months ago, we introduced Tyr, a Rust driver for Arm Mali GPUs that continues to see active development upstream and downstream. As the upstream code awaits broader ecosystem readiness, we have focused on a downstream prototype that will serve as a baseline for community benchmarking and help guide our upstreaming efforts.

Today, we are excited to share that the Tyr prototype has progressed from basic GPU job execution to running GNOME, Weston, and full-screen 3D games like SuperTuxKart, demonstrating a functional, high-performance Rust driver that matches C-driver performance and paves the way for eventual upstream integration!

GNOME on Tyr

Setting the stage

I previously discussed the relationship between user-mode drivers (UMDs) and kernel-mode drivers (KMDs) in one of my posts about how GPUs work. Here's a quick recap to help get you up to speed:

One thing to be understood from the previous section is that the majority of the complexity tends to reside at the UMD level. This component is in charge of translating the higher-level API commands into lower-level commands that the GPU can understand. Nevertheless the KMD is responsible for providing key operations such that its user-mode driver is actually implementable, and it must do so in a way that fairly shares the underlying GPU hardware among multiple tasks in the system.

While the UMD will take care of translating from APIs like Vulkan or OpenGL into GPU-specific commands, the KMD must bring the GPU hardware to a state where it can accept requests before it can share the device fairly among the UMDs in the system. This covers power management, parsing and loading the firmware, as well as giving the UMD a way to allocate GPU memory while ensuring isolation between different GPU contexts for security.

This was our initial focus for quite a few months while working on Tyr, and testing was mainly done through the IGT framework. These tests mostly consisted of issuing simple ioctl() calls against the driver and then checking whether the results made sense.

By the way, those willing to further understand the relationship between UMDs and KMDs on Linux should watch a talk given at Kernel Recipes by my colleague Boris Brezillon on the topic!

Submitting a single job

Once the GPU is ready to accept requests and userspace can allocate GPU memory as needed, the UMD can place all the resources required by a given workload in GPU buffers. These can be further referenced by the command buffers containing the instructions to be executed, as we explain in the excerpt below:

With the data describing the model and the machine code describing the shaders, the UMD must ask the KMD to place this in GPU memory prior to execution. It must also tell the GPU that it wants to carry out a draw call and set any state needed to make this happen, which it does by means of building VkCommandBuffers, which are structures containing instructions to be carried out by the GPU in order to make the workload happen. It also needs to set up a way to be notified when the workload is done and then allocate the memory to place the results in.

In this sense, the KMD is the last link between the UMD and the GPU hardware, providing the necessary APIs for job submission and synchronization. It ensures that all the drawing operations built at the userspace level can actually reach the GPU for execution. It is the KMD's responsibility to ensure that a job only gets scheduled once its dependencies have finished executing. It also has to notify (in other words, signal to) the UMD when jobs are done, or the UMD won't know when the results are valid.
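To make the dependency rule concrete, here is a minimal userspace sketch in Rust. It is not Tyr's code and it is not the DRM scheduler: the `Job` type, the `schedule` function, and the job names are all hypothetical, and dependencies are assumed to be valid indices into the job list. It only illustrates the idea that a job becomes runnable once every job it depends on has finished.

```rust
use std::collections::VecDeque;

/// Hypothetical job description: `deps` holds the indices of the jobs
/// that must finish before this one may run.
struct Job {
    name: &'static str,
    deps: Vec<usize>,
}

/// Returns the order in which the jobs would run, or `None` if the
/// dependencies form a cycle and some job can never become ready.
fn schedule(jobs: &[Job]) -> Option<Vec<&'static str>> {
    // Number of unfinished dependencies per job.
    let mut pending: Vec<usize> = jobs.iter().map(|j| j.deps.len()).collect();

    // Reverse edges: for each job, the jobs waiting on it.
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); jobs.len()];
    for (i, job) in jobs.iter().enumerate() {
        for &d in &job.deps {
            dependents[d].push(i);
        }
    }

    // Jobs without dependencies are ready right away.
    let mut ready: VecDeque<usize> =
        (0..jobs.len()).filter(|&i| pending[i] == 0).collect();
    let mut order = Vec::with_capacity(jobs.len());

    while let Some(i) = ready.pop_front() {
        // "Run" the job; in a real driver this is where its fence would signal.
        order.push(jobs[i].name);
        for &n in &dependents[i] {
            pending[n] -= 1;
            if pending[n] == 0 {
                ready.push_back(n);
            }
        }
    }

    if order.len() == jobs.len() {
        Some(order)
    } else {
        None
    }
}
```

Calling `schedule` on a list where a render job depends on two upload jobs returns the uploads first and the render job after them. A real scheduler does this asynchronously with fences instead of computing the order up front.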

Additionally, before Tyr can execute a complex workload consisting of a vast number of simultaneous jobs, it must be able to execute a simple one correctly, or debugging will be an unfruitful nightmare. To that end, we devised the simplest job we could think of: one that merely places a single integer in a given memory location using a MOV instruction on the GPU. Our IGT test then blocks until the KMD signals that the work was carried out.

Reading that memory location and checking that its contents match the constant we expected shows that the test was executed successfully. In other words, it shows that we were able to place the instructions in one of the GPU's ring buffers and have the hardware iterator pick them up and execute them correctly, paving the way for more complex tests that can actually try to draw something.
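The shape of that test can be sketched in plain Rust without any GPU involved. Everything below is a simulation under stated assumptions: a thread stands in for the GPU, an atomic integer stands in for the GPU buffer, and the `Fence` type and `run_dummy_job` function are hypothetical names, not part of Tyr or IGT.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for the completion signal the KMD would deliver.
#[derive(Default)]
struct Fence {
    signaled: Mutex<bool>,
    cv: Condvar,
}

impl Fence {
    /// Marks the job as done and wakes up every waiter.
    fn signal(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Blocks until `signal` has been called.
    fn wait(&self) {
        let mut done = self.signaled.lock().unwrap();
        while !*done {
            done = self.cv.wait(done).unwrap();
        }
    }
}

/// Simulates the dummy job: a "GPU" thread stores `value` into a buffer
/// and signals the fence; the caller blocks on the fence, then reads the
/// buffer back.
fn run_dummy_job(value: u32) -> u32 {
    let buffer = Arc::new(AtomicU32::new(0));
    let fence = Arc::new(Fence::default());

    let (gpu_buffer, gpu_fence) = (Arc::clone(&buffer), Arc::clone(&fence));
    let gpu = thread::spawn(move || {
        // Plays the role of the single MOV instruction.
        gpu_buffer.store(value, Ordering::Release);
        gpu_fence.signal();
    });

    // The test blocks here until the work is reported as done.
    fence.wait();
    let result = buffer.load(Ordering::Acquire);
    gpu.join().unwrap();
    result
}
```

The check itself is then a single comparison: the value read back must equal the constant that was submitted, for example `run_dummy_job(0xCAFE) == 0xCAFE`.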

The test source code for this dummy job is here.

Drawing a rotating cube

With job submission and signalling working, it was time to attempt to render a scene. We chose kmscube, which draws a single rotating cube on the screen, as the next milestone.

It was a good candidate owing to its simple geometry and the fact that it is completely self-contained. In other words, no compositor is needed and rendering takes place in a buffer that's directly handed to the display (KMS) driver.

Getting kmscube to run would also prove that we were really enforcing the job dependencies set by the UMD, since failing to do so would show up as visual glitches. To enforce them, we relied on a slightly updated version of the Rust abstractions for the DRM scheduler posted by Asahi Lina a few years ago. The result was a rotating cube rendered at the display's refresh rate.

kmscube on Tyr

Using offscreen rendering lets us go even faster, jumping from 30 or 60fps to more than 500 frames per second, matching the performance of the C driver. That's a lot of frames being drawn!

Can it render the whole UI?

The natural progression would be to launch Weston or GNOME. As there is quite a lot going on when a DE like GNOME is running, we were almost expecting it not to work at first, so it came as a huge surprise when GNOME's login page was rendered.

In fact, you can log in to GNOME, open Firefox, and...watch a YouTube video:

YouTube on GNOME on Tyr

Running vkcube under Weston also just works!

vkcube on Weston on Tyr

Can it render a game?

The last 3D milestone is running a game or another 3D-intensive application. Not only would that put the GPU through a demanding workload, but it would also allow us to gauge the KMD's performance more accurately. Again, the game (SuperTuxKart) is rendered correctly and is completely playable, without any noticeable hiccups or other performance issues, so long as it is run in full screen. Unfortunately, windowed mode still has some glitches: it is a prototype, after all.

SuperTuxKart on Tyr

Why is this important?

It's important to clarify what this means and how this plays into the long-term vision for the project.

In fact, it's easier to start with what we are not claiming with this post: Tyr is not ready to be used as a daily driver, and it will still take time to replicate this upstream, although it is now clear that we will get there. And as a mere prototype, it takes a lot of shortcuts that we would not take in an upstream version, even though it can run on top of an unmodified (i.e., upstream) version of Mesa.

That said, this prototype can serve as an experimental driver and as a testbed for all the Rust abstraction work taking place upstream. It will let us experiment with different design decisions and gather data on what truly contributes to the project's objective. It is a testament that Rust GPU KMDs can work, and not only that, but they can perform on par with their C counterparts.

Needless to say, we cannot make any assumptions about stability on an experimental driver. It might very well lock up and lose your work after some time, so be aware.

Finally, this was tested on a Rock 5B board, which is fitted with a Rockchip RK3588 system-on-chip, and it will probably not work on any other device at the moment. Those with this hardware at hand should feel free to test our branch and provide feedback. The source code can be found here. Make sure to enable CONFIG_TYR_DRM_DEPS and CONFIG_DRM_TYR. Feel free to contribute to Tyr by checking out our issue board!

Below is a video showcasing the Tyr prototype in action. Enjoy!



Comments

  • By HeyMeco, 2025-11-19 22:03 (1 reply)

    Great accomplishment from the developers. Being announced in July and already running gnome and games. 2026 is going to become very interesting

    • By reckoning, 2025-11-20 0:25 (2 replies)

      Would you be able to elaborate further on the implications? What sort of devices and use cases would benefit from this work?

      Looks like this is ARM specific, and is a layer between more 'traditional' APIs, ala vulkan and opengl, and the device's gpu.

      Would this work provide speed ups or is it more for compatibility?

      • By bonzini, 2025-11-20 0:40 (1 reply)

        You can read more about the organization of the work and how it's split between kernel and user space at https://www.collabora.com/news-and-blog/blog/2025/08/06/writ....

        • By xyzsparetimexyz, 2025-11-20 11:27 (1 reply)

          This didn't really answer the question. _Who_ benefits from these drivers

          • By webdevver, 2025-11-20 11:34 (1 reply)

            having open source gpu runtime, from api to metal, would be nice. but as you can see, the real meat of the business (the compiler) will probably never-ever be open sourced for internal political reasons. which is the most interesting component.

            it must be said the gpu crowd is very different to the cpu crowd. the cpu crowd trips over themselves to tell everyone about their ISA. the gpu crowd keep it very close to their chest. (the gpu isas are also quite batshit insane but thats another conversation.) you wind up with almost polar opposite experiences in terms of how these two groups interact with the broader industry.

            gpu people reeeaally don't want you prying your nose beyond the userspace APIs, in my experience.

            EDIT - to add though... that is kind of understandable, because the gpu crowd is under a lot more pressure to support basically everything and everyone. opengl, dxil, spirv, opencl - and the dense matrix of combinations. I often see people hate on Apple for developing their own API (Metal), but honestly? I totally get it. in retrospect they probably did the right thing.

            we have an epidemic of gpu programming "specs" and "standards", with no end in sight. i can't help but keep editing this comment with more of my hot takes: ofcourse nvidia owns the GPU programming space. they're the only ones offering anything resembling an actual platform! and they will continue dominating the space, because the rest of the players are too busy playing in their own ballpens.

            I think the only way to dislodge nvidia is to ape their silicon. a USSR needs to come along, steal their masks, and just make carbon copies. not a shim, just straight up copy nvidia 100%. like, you run `nvidia-smi` and it reports back "yep this is a real nvidia card". and then sell it for half the price. it would be totally illegal, but when youre a state actor You Can Just Do Things.

            • By xyzsparetimexyz, 2025-11-20 12:34 (1 reply)

              That's all orthogonal to my question. I don't understand where the arm mali drivers are used. Are they used in Android phones? In laptops?

              • By zamadatix, 2025-11-20 14:08 (1 reply)

                Anything with an ARM Mali family GPU, same as you'd find for the question "where are Nvidia drivers used" as being "anywhere an Nvidia GPU is used". There isn't a premade list of certain people/companies who might ever use a certain brand GPU in their products, it's "just about anywhere". That can be anything including phones, tablets, mini PCs, laptops, SBCs, TV boxes, VR headsets, and so on - it's not limited to use in a specific product/manufacturer/type of device only.

                If you're looking for something hackable to play with the Mali driver options on yourself, Chromebooks or SBCs (like the one in the article) are usually the easiest bet and where the development is done vs the more fixed-by-manufacturer type devices like the typical phone where you get what they decide to package (which may or may not be the particular open driver you're looking to see used).

                • By xyzsparetimexyz, 2025-11-20 17:25 (1 reply)

                  So Chromebooks and TVs. Got it.

                  • By zamadatix, 2025-11-20 17:40

                    Hmm, not really. As mentioned above, anything including phones, tablets, mini PCs, laptops, SBCs, TV boxes, VR headsets, and so on. Chromebooks and TVs would just be 2 examples of these types of devices.

                    As a solid example, the screenshots from the article are not taken from a Chromebook or TV :).

      • By throawayonthe, 2025-11-21 15:43

        this 'layer' exists for all linux gpu drivers, and this one is specific to certain Arm Mali gpus

        ie radv/radeonsi would be the analogue for amd, honeykrisp/asahi for apple, freedreno for qualcomm etc

  • By xyzsparetimexyz, 2025-11-20 8:51 (1 reply)

    Who even uses these arm mali GPUs?

    • By pjmlp, 2025-11-20 10:04 (1 reply)

      Android phones most likely.

      • By xyzsparetimexyz, 2025-11-20 11:25 (1 reply)

        Do they use Linux drivers? You'd think that the device manufacturers would supply those

        • By pjmlp, 2025-11-20 12:23

          The drivers are either Linux drivers, or Treble drivers, anyway the question was about who uses these GPUs.
