
14 Feb 2024
For years, the M1 has only supported OpenGL 4.1. That changes today – with our release of full OpenGL® 4.6 and OpenGL® ES 3.2! Install Fedora for the latest M1/M2-series drivers.
Already installed? Just dnf upgrade --refresh.
Unlike the vendor’s non-conformant 4.1 drivers, our open source Linux drivers are conformant to the latest OpenGL versions, finally promising broad compatibility with modern OpenGL workloads, like Blender, Ryujinx, and Citra.
Conformant 4.6/3.2 drivers must pass over 100,000 tests to ensure correctness. The official list of conformant drivers now includes our OpenGL 4.6 and ES 3.2.
While the vendor doesn’t yet support graphics standards like modern OpenGL, we do. For this Valentine’s Day, we want to profess our love for interoperable open standards. We want to free users and developers from lock-in, enabling applications to run anywhere the heart wants without special ports. For that, we need standards conformance. Six months ago, we became the first conformant driver for any standard graphics API for the M1 with the release of OpenGL ES 3.1 drivers. Today, we’ve finished OpenGL with the full 4.6… and we’re well on the road to Vulkan.
Compared to 4.1, OpenGL 4.6 adds dozens of required features, including robustness, cull distance, and clip control.
Regrettably, the M1 doesn’t map well to any graphics standard newer than OpenGL ES 3.1. While Vulkan makes some of these features optional, the missing features are required to layer DirectX and OpenGL on top. No existing solution on M1 gets past the OpenGL 4.1 feature set.
How do we break the 4.1 barrier? Without hardware support, new features need new tricks. Geometry shaders, tessellation, and transform feedback become compute shaders. Cull distance becomes a transformed interpolated value. Clip control becomes a vertex shader epilogue. The list goes on.
For a taste of the challenges we overcame, let’s look at robustness.
Built for gaming, GPUs traditionally prioritize raw performance over safety. Invalid application code, like a shader that reads a buffer out-of-bounds, can trigger undefined behaviour. Drivers exploit that to maximize performance.
For applications like web browsers, that trade-off is undesirable. Browsers handle untrusted shaders, which they must sanitize to ensure stability and security. Clicking a malicious link should not crash the browser. While some sanitization is necessary as graphics APIs are not security barriers, reducing undefined behaviour in the API can assist “defence in depth”.
“Robustness” features can help. Without robustness, out-of-bounds buffer access in a shader can crash. With robustness, the application can opt for defined out-of-bounds behaviour, trading some performance for less attack surface.
All modern cross-vendor APIs include robustness. Many games even (accidentally?) rely on robustness. Strangely, the vendor’s proprietary API omits buffer robustness. We must do better for conformance, correctness, and compatibility.
Let’s first define the problem. Different APIs have different definitions of what an out-of-bounds load returns when robustness is enabled:
- Zero (Vulkan's robustBufferAccess2)
- Zero, or data from anywhere within the buffer (Vulkan's robustBufferAccess)

OpenGL uses the second definition: return zero or data from the buffer. One approach is to return the last element of the buffer for out-of-bounds access. Given the buffer size, we can calculate the last index. Now consider the minimum of the index being accessed and the last index. That equals the index being accessed if it is valid, and some other valid index otherwise. Loading the minimum index is safe and gives a spec-compliant result.
As an example, a uniform buffer load without robustness might look like:
load.i32 result, buffer, index

Robustness adds a single unsigned minimum (umin) instruction:
umin idx, index, last
load.i32 result, buffer, idx

Is the robust version slower? It can be. The difference should be small percentage-wise, as arithmetic is faster than memory. With thousands of threads running in parallel, the arithmetic cost may even be hidden by the load’s latency.
There’s another trick that speeds up robust uniform buffers. Like other GPUs, the M1 supports “preambles”. The idea is simple: instead of calculating the same value in every thread, it’s faster to calculate once and reuse the result. The compiler identifies eligible calculations and moves them to a preamble executed before the main shader. These redundancies are common, so preambles provide a nice speed-up.
We usually move uniform buffer loads to the preamble when every thread loads the same index. Since the size of a uniform buffer is fixed, extra robustness arithmetic is also moved to the preamble. The robustness is “free” for the main shader. For robust storage buffers, the clamping might move to the preamble even if the load or store cannot.
Armed with robust uniform and storage buffers, let’s consider robust “vertex buffers”. In graphics APIs, the application can set vertex buffers with a base GPU address and a chosen layout of “attributes” within each buffer. Each attribute has an offset and a format, and the buffer has a “stride” indicating the number of bytes per vertex. The vertex shader can then read attributes, implicitly indexing by the vertex. To do so, the shader loads the address:

base + stride * vertex + offset
Some hardware implements robust vertex fetch natively. Other hardware has bounds-checked buffers to accelerate robust software vertex fetch. Unfortunately, the M1 has neither. We need to implement vertex fetch with raw memory loads.
One instruction set feature helps. In addition to a 64-bit base address, the M1 GPU’s memory loads also take an offset in elements. The hardware shifts the offset and adds to the 64-bit base to determine the address to fetch. Additionally, the M1 has a combined integer multiply-add instruction imad. Together, these features let us implement vertex loads in two instructions. For example, a 32-bit attribute load looks like:
imad idx, stride/4, vertex, offset/4
load.i32 result, base, idx

The hardware load can perform an additional small shift. Suppose our attribute is a vector of 4 32-bit values, densely packed into a buffer with no offset. We can load that attribute in one instruction:
load.v4i32 result, base, vertex << 2

…with the hardware calculating the address:

base + 4 * (vertex << 2) = base + 16 * vertex
What about robustness?
We want to implement robustness with a clamp, like we did for uniform buffers. The problem is that the vertex buffer size is given in bytes, while our optimized load takes an index in “vertices”. A single vertex buffer can contain multiple attributes with different formats and offsets, so we can’t convert the size in bytes to a size in “vertices”.
Let’s handle the latter problem. We can rewrite the addressing equation as:

base + stride * vertex + offset = (base + offset) + stride * vertex
That is: one buffer with many attributes at different offsets is equivalent to many buffers with one attribute and no offset. This gives an alternate perspective on the same data layout. Is this an improvement? It avoids an addition in the shader, at the cost of passing more data – addresses are 64-bit while attribute offsets are 16-bit. More importantly, it lets us translate the vertex buffer size in bytes into a size in “vertices” for each vertex attribute. Instead of clamping the offset, we clamp the vertex index. We still make full use of the hardware addressing modes, now with robustness:
umin idx, vertex, last valid
load.v4i32 result, base, idx << 2

We need to calculate the last valid vertex index ahead-of-time for each attribute. Each attribute has a format with a particular size. Manipulating the addressing equation, we can calculate the last byte accessed in the buffer (plus 1) relative to the base:

stride * vertex + (format size)
The load is valid when that value is bounded by the buffer size in bytes. We solve the integer inequality as:

vertex <= (buffer size - format size) / stride

…where the division rounds down.
The driver calculates the right-hand side and passes it into the shader.
One last problem: what if a buffer is too small to load anything? Clamping won’t save us – the code would clamp to a negative index. In that case, the attribute is entirely invalid, so we swap the application’s buffer for a small buffer of zeroes. Since we gave each attribute its own base address, this determination is per-attribute. Then clamping the index to zero correctly loads zeroes.
Putting it together, a little driver math gives us robust buffers at the cost of one umin instruction.
In addition to buffer robustness, we need image robustness. Like its buffer counterpart, image robustness requires that out-of-bounds image loads return zero. That formalizes a guarantee that reasonable hardware already makes.
…But it would be no fun if our hardware was reasonable.
Running the conformance tests for image robustness, we hit a single test failure affecting “mipmapping”.
For background, mipmapped images contain multiple “levels of detail”. The base level is the original image; each successive level is the previous level downscaled. When rendering, the hardware selects the level closest to matching the on-screen size, improving efficiency and visual quality.
With robustness, the specifications all agree that out-of-bounds image loads return…

zero.

Meanwhile, out-of-bounds image loads on the M1 GPU return…

the contents of the nearest valid level.
Uh-oh. Rather than returning zero for out-of-bounds levels, the hardware clamps the level and returns nonzero values. It’s a mystery why. The vendor does not document their hardware publicly, forcing us to rely on reverse engineering to build drivers. Without documentation, we don’t know if this behaviour is intentional or a hardware bug. Either way, we need a workaround to pass conformance.
The obvious workaround is to never load from an invalid level:
if (level <= levels) {
return imageLoad(x, y, level);
} else {
return 0;
}

That involves branching, which is inefficient. Loading an out-of-bounds level doesn’t crash, so we can speculatively load and then use a compare-and-select operation instead of branching:
vec4 data = imageLoad(x, y, level);
return (level <= levels) ? data : 0;

This workaround is okay, but it could be improved. While the M1 GPU has combined compare-and-select instructions, the instruction set is scalar. Each thread processes one value at a time, not a vector of multiple values. However, image loads return a vector of four components (red, green, blue, alpha). While the pseudo-code looks efficient, the resulting assembly is not:
image_load R, x, y, level
ulesel R[0], level, levels, R[0], 0
ulesel R[1], level, levels, R[1], 0
ulesel R[2], level, levels, R[2], 0
ulesel R[3], level, levels, R[3], 0

Fortunately, the vendor driver has a trick. We know the hardware returns zero if either X or Y is out-of-bounds, so we can force a zero output by setting X or Y out-of-bounds. As the maximum image size is 16384 pixels wide, any X greater than 16384 is out-of-bounds. That justifies an alternate workaround:
bool valid = (level <= levels);
int x_ = valid ? x : 20000;
return imageLoad(x_, y, level);

Why is this better? We only change a single scalar, not a whole vector, compiling to compact scalar assembly:
ulesel x_, level, levels, x, #20000
image_load R, x_, y, level

If we preload the constant to a uniform register, the workaround is a single instruction. That’s optimal – and it passes conformance.
Blender “Wanderer” demo by Daniel Bystedt, licensed CC BY-SA.
Alyssa Rosenzweig is a gift to the community that keeps on giving. Every one of her blog posts is a guarantee to learn something you didn't know about the internals of modern graphics hardware.
This endeavour proves to me that skills beat talkativeness every single day. Just reading the blogs sets my brain on fire. There is so much to unpack. The punch line is not the last but the second sentence; nevertheless you're forced to follow the path into the rabbit hole until you enjoy reading one bit manipulation after the other.
If there ever are benchmarks with eureka effects per paragraph Alyssa will lead them all.
Just thanks!
One day, Apple will deprecate opengl 3.3 core, and I guess everybody might end up deprecating it.
I've read that generally opengl is just easier to use than vulkan, I don't know if that's true, but if something is too complicated, it becomes just too hard for less experienced devs to exploit those GPU, and it becomes a barrier to entry, which might discourage some indie game developers.
Although everyone uses unity and unreal now, baking things from scratch or using other engines is just weird now, for some reason. It's really annoying, and it's fun to see gamedev wake up after unity tried to lock things more.
Open source in gaming has always been stretched thin. Godot is there, but I doubt it's able to seriously compete with unity and unreal even if I want it to, so even if godot is capable, indie gamedevs are more experienced with unity and unreal and will stick to those.
The state of open source in game dev feels really hopeless sometimes, the rise of next gen graphics API are not making things easy.
> I've read that generally opengl is just easier to use than vulkan
[here's](https://learnopengl.com/code_viewer_gh.php?code=src/1.gettin...) an opengl triangle rendering example code (~200 LOC)
[here's](https://vulkan-tutorial.com/code/17_swap_chain_recreation.cp...) a vulkan triangle rendering example code (~1000 LOC)
ye it's fair to say opengl is a bit easier to use ijbol
You’re getting downvoted for some reason, but OpenGL is absolutely easier. It abstracts so much (and for beginners there’s still a ton even with all the abstraction!). No need to think about how to prep pipelines, optimally upload your data, manually synchronize your rendering, and more with OpenGL, unlike Vulkan. The low level nature of Vulkan allows you to eke out every bit of performance, but for indie game developers and the majority of graphics development that doesn’t depend on realtime PBR with giant amounts of data, OpenGL is still immensely useful.
If anything, an OpenGL-like API will naturally be developed on top of Vulkan for the users that don’t care about all that stuff. And once again, I can’t stress this enough, OpenGL is still a lot for beginners. Shaders, geometric transformations, the fixed function pipeline, vertex layouts, shader buffer objects, textures, mip maps, instancing, buffers in general, there’s sooo much to learn and these foundations transcend OpenGL and apply to all graphics rendering. As a beginner, OpenGL allowing me to focus on the higher level details was immensely beneficial for me getting started on my graphics programming journey.
It won't be OpenGL-like, it will probably just be OpenGL https://docs.mesa3d.org/drivers/zink.html
This is a bit misleading. Much of the extra code that you'd have to write in Vulkan to get to first-triangle is just that, a one-time cost. And you can use a third-party library, framework or engine to take care of it. Vulkan merely splits out the hardware-native low level from the library support layer, that were conflated in OpenGL, and lets the latter evolve freely via a third party ecosystem. That's just a sensible choice.
And often those LOC examples use GLFW or some other library to load OpenGL. Loading a Vulkan instance is a walk in the park compared to initializing an OpenGL context, especially on Windows. It's incredibly misleading. If you allowed utility libraries for Vulkan to compare LOC-to-triangle Vulkan would be much closer to OpenGL.
It depends on the operating system. On macOS and iOS it was always just a few lines of code to setup a GL context. On Windows via WGL and Linux via GLX it's a nightmare though. Linux with EGL is also okay-ish.
I mean you're literally suggesting that people should use a third-party framework/engine/library because writing all the Vulkan boiler plate yourself is too hard.
Drawing conclusions from a hello world example is not representative of which API is "easier". You are also using lines of code as measure of "ease" where it's a measure of "verbosity".
Further, the OpenGL example is not following modern graphics best practices and relies on defaults from OpenGL which cuts down the lines of code but is not practical in real applications.
Getting Vulkan initialized is a bit of a chore, but once it's set up, it's not much more difficult than OpenGL. GPU programming is hard no matter which way you put it.
I'm not claiming Vulkan initialization is not verbose, it certainly is, but there are libraries to help you with that (f.ex. vkbootstrap, vma, etc). The init routine requires you to explicitly state which HW and SW features you need, reducing the "it works on my computer" problem that plagues OpenGL.
If you use a recent Vulkan version (1.2+), namely the dynamic rendering and dynamic state features, it's actually very close to OpenGL because you don't need to configure render passes, framebuffers etc. This greatly reduces the amount of code needed to draw stuff. All of this is available on all desktop platforms, even on quite old hardware (~10 year old gpus) if your drivers are up to date. The only major difference is the need for explicit pipeline barriers.
Just to give you a point of reference, drawing a triangle with Vulkan, with the reusable framework excluded, is 122 lines of Rust code including the GLSL shader sources.
Another data point from my past projects, a practical setup for OpenGL is about 1500 lines of code, where Vulkan is perhaps 3000-4000 LOC where ~1000 LOC is trivial setup code for enabled features (verbose, but not hard).
As a graphics programmer, going from OpenGL to Vulkan has been a massive quality of life improvement.
I also am a graphics programmer and lead Vulkan developer at our company. I love Vulkan. I wouldn’t touch OpenGL with a 10 foot pole. But I also have years of domain expertise and OpenGL is hands down the better beginner choice.
The Vulkan hello triangle is terrible, it’s not at all production level code. Yeah, neither is the OpenGL one, but that’s much closer. Getting Vulkan right requires quite a bit of understanding of the underlying hardware. There’s very little to no hand holding, even with the validation layers in place it’s easy to screw up barriers, resource transitions and memory management.
Vulkan is fantastic for people with experience and a good grasp of the underlying concepts, like you and me. It’s awful for beginners who are new to graphics programming.
I've used OpenGL for over 20 years and Vulkan since it came out. Neither of them is easy, but OpenGL's complexity and awkward stateful programming model is quite horrific.
I've also watched and helped beginners struggling with both APIs on many internet forums over the years, and while getting that first triangle is easier in OpenGL, the curve gets a lot steeper right after that. Things like managing vertex array objects (VAO) and framebuffer objects (FBO) can be really confusing and they are kind of retrofitted to the API in the first place.
I actually think that beginners shouldn't be using either of them and understand the basics of 3d graphics in a friendlier environment like Godot or Unity or something.
Vulkan 1.3 makes graphics programming fun again. Now you don't need to build render passes and pipeline states up front, it's really easy to just set the pipeline states ad-hoc and fire off your draw calls.
But yeah, judging by the downvotes my GP comment is receiving, seems like a lot of readers disagree. I'm not sure how many of them have actually used both APIs beyond beginner level, but I don't know anyone who has used both professionally and wants to go back to OpenGL with its awkward API and GLSL compiler bugs and whatnot.
I'm mostly with you, and for the record have both used OpenGL and Vulkan professionally in shipped titles. I personally have no interest in going back to OpenGL.
But I think the disconnect is that neither you nor me have any reason to fear Vulkan. I love how explicit Vulkan is and that the spec is in such depth, the only thing that comes close imho are the proprietary console APIs. I've also worked with Metal and DirectX and the documentation for those is just bad, you kinda have to know what you are doing already to understand the scraps of information you get and reason about the unspoken implementation details.
That all being said though, the Vulkan spec and setup is just daunting for a beginner. And yeah, you can take a lot of shortcuts like not pre-building your PSOs or being lax with your barriers and memory handling. But I feel like you might as well just not use Vulkan in that case since you are throwing away some of its biggest advantages. Just AZDO it up and take advantage of the IHVs having spent decades beating on their OpenGL implementation (as long as we don't count the red team). A lot of Vulkan is just boilerplate and easy to do and abstract away, but doing Vulkan right, that's the hard part in my opinion.
Not to mention that it's really easy to build insidious gotchas into your code that are really hard to spot. Nvidia famously just doesn't care about image layouts, so if you just develop on Nvidia hardware it can be pretty easy to write code that runs as if it were correct, but will explode on hardware that does care about image layouts. I have a drawer full of GPUs from different IHVs and generations just for day to day development. That's a really high barrier to entry.
FWIW Metal is actually easier to use than Vulkan in my opinion, as Vulkan is kind of designed to be super flexible and doesn't have as much niceties in it. Either way, OpenGL was simply too high level to be exposed as the direct API of the drivers. It's much better to have a lower level API like Vulkan as the base layer, and then build something like OpenGL on top of Vulkan instead. It maps much better to how GPU hardware works this way. There's a reason why we have a concept of software layers.
It's also not quite true that everyone uses Unity and Unreal. Just look at the Game of the Year nominees from The Game Award 2023. All 6 of them were built using in-house game engines. Among indies there are also still some amount of developers who develop their own engines (e.g. Hades), but it's true that the majority of them will just use an off-the-shelf one.
Metal is probably the most streamlined and easiest to use GPU API right now. It's compact, adapts to your needs, and can be intuitively understood by anyone with basic C++ knowledge.
OpenGL is not deprecated, it is simpler and continues to be used where Vulkan is overkill. Using it for greenfields is a good choice if it covers all your needs (and if you don't mind the stateful render pipeline).
It kind of is, OpenGL 4.6 is the very last version, the red book only covers until OpenGL 4.5, and some hardware vendors are now shipping OpenGL on top of Vulkan or DirectX, instead of providing native OpenGL drivers.
While not officially deprecated, it is standing still and won't get anything newer past 2017 hardware, not even newer extensions are being made available.
The focus has already moved to other APIs (Vulkan and Metal), and the side effect of this will be that bitrot sets in, first in OpenGL debugging and profiling tools (older tools won't be maintained, new tools won't support GL), then in drivers.
It is officially deprecated on all Apple platforms, and has been for five years now.
Whether it will actually stop working anytime soon is a different question; but it is not a supported API.
For context: https://developer.apple.com/documentation/opengles
It is marked as being deprecated as of iOS 12, which came out in September 2018.
Non-ES version was deprecated in aligned macOS version 10.14: https://developer.apple.com/library/archive/documentation/Gr...
> One day, Apple will deprecate opengl 3.3 core
OpenGL is already deprecated on macOS and iOS for a couple of years. It still works (nowadays running as layer on top of Metal), but when building GL code for macOS or iOS you're spammed with deprecation warnings (can be turned off with a define though).
WGPU is kinda supposed to solve the problem by making a cross platform API more user friendly than Vulkan. The problem with OpenGL is that it is too far from how GPUs work and it's hard to get good performance out of it.
It is hard to get the absolute best performance out of OpenGL but it isn't really hard to get good performance. Unless you're trying to make some sort of seamless open world game with modern AAA level of visual fidelity or trying to do something very out of the ordinary, OpenGL is fine.
A bigger issue you may face is OpenGL driver bugs but AFAIK the main culprit here was AMD and a couple of years ago they improved their OpenGL driver to be much better.
Also at this point OpenGL still has no hardware raytracing extension/API so if you need that you need to use Vulkan (either just for the RT bits with OpenGL interop or switching to it completely). My own 3D engine uses OpenGL and while the performance is perfectly fine, i'm considering switching to Vulkan at some point in the future to have raytracing support.
My understanding is that one of the primary reasons Vulkan was developed was because OpenGL was not a good model for GPUs, and supporting it prevented people from taking advantage of the hardware in many cases.
It's because Vulkan is designed for driver developers and (to a lesser degree) for middleware engine developers. As far as APIs go, it's pretty much awful. I was very pumped for Vulkan when it was initially announced, but seeing the monstrosity the committee has produced has cooled down my enthusiasm very quickly.
> One day, Apple will deprecate opengl 3.3 core, and I guess everybody might end up deprecating it.
And here I am, recalling all the games and programs that failed once OpenGL 2.0 was implemented because they required OpenGL 1.1 or 1.2 but just checked the minor version number... time flies!
> I've read that generally OpenGL is just easier to use than Vulkan.
OpenGL mostly only makes sense if you followed its progress from the late 90's and understand the reasons behind all the accumulated design warts, sediment layers and just plain weird design decisions. For newcomers, OpenGL is just one weirdness after another.
Unfortunately Vulkan seems to be on the same track, which makes me think that the underlying problem is organisational, not technical, e.g. both APIs are lead by Khronos, resulting in the same 'API design and maintenance philosophy' - and frankly, the approach to API design was OpenGL's main problem, not that it didn't map to modern GPU architectures (which could have been fixed with a different API design approach without throwing the baby out with the bath water).
But on Mac, what matters more is how OpenGL compares to Metal, and the answer is much simpler: Metal both has a cleaner design, and is easier to use than OpenGL.