What I learned designing a barebones UI engine

2026-02-23 · madebymohammed.com

Deriving user interfaces from first principles

I wrote a custom UI framework in PyGame, a library used for software rendering (graphics on the CPU), to support my experiments while giving me a standard interactive layer using event-driven paradigms similar to other UI frameworks.

The requirements were specific:

  • It needed to be transparent - I didn't want my UI layer to add extra cost over standard software rendering, which means no workarounds to get it to display custom canvases
  • It needed to be in Python - The main goal is to have an interactive layer ready to spin up for rapid experimentation. Python has a vast ecosystem of libraries and is fast to write - the UI layer needs to match that iteration speed.

Starting From Nothing

UI at its simplest.

The initial architecture focused on brutal simplicity. I persisted a flat list of components that I placed manually, sketching the layout out in Photoshop first, and every frame the engine ran a minimal loop:

  • Hit-test: Compare the mouse coordinates and click state with the coordinates of every single component in the flat hierarchy, triggering any click/hover handlers on any components that passed the hit-test.
  • Update: Run a global update() loop for every component if they need to update private state consistently every frame.
  • Render: Call the render() method on each component, relying on my Photoshop math to make sure they render at the right size and in the right position.
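The whole loop fits in a few lines. Here is a minimal sketch (the names `Rect`, `Component`, and `run_frame` are illustrative, not the engine's actual API):

```python
class Rect:
    """Axis-aligned rectangle from manual pixel math."""
    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

class Component:
    def __init__(self, rect, on_click=None, on_hover=None):
        self.rect = rect
        self.on_click = on_click
        self.on_hover = on_hover

    def update(self):
        pass  # per-frame private state updates go here

    def render(self, surface):
        pass  # actual drawing would target a PyGame surface

def run_frame(components, surface, mouse_pos, clicked):
    # 1. Hit-test: compare the mouse against every component in the flat list
    for c in components:
        if c.rect.contains(*mouse_pos):
            if c.on_hover:
                c.on_hover()
            if clicked and c.on_click:
                c.on_click()
    # 2. Update: let each component advance its private state
    for c in components:
        c.update()
    # 3. Render: each component draws itself at its hand-computed rect
    for c in components:
        c.render(surface)
```

Note that every handler fires by brute-force scanning the whole list each frame - fine for a handful of components, and exactly the kind of simplicity the next section moves away from.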

This is very simple to write, but it's impractical for all but the most stylised or minimal UI layers. For a general-purpose tool, it would be ideal to offload the math to the engine and describe the UI through higher-level layout semantics, as opposed to manual pixel math.

The Family Tree

A reunion.

To achieve this, we can draw inspiration from real UI engines and model the UI as a tree of nodes instead of a flat list. Each node has a parent and can have child nodes, which can each have children of their own, and so on. I implemented an architecture where nodes are exclusively either layout-only or content-only, as opposed to something like HTML, where nodes can hold content and have children of their own. Less flexible, but simpler to implement.

Instead of a simple list iteration, this approach requires a depth-first traversal that recurses through all the nodes. This recursive nature is essential to how the layout engine works. Each layout node implements two key methods: a measure() method that computes and returns its rectangular size, and a distribute() method through which a child node is issued its final size and position.

This seems simple, but combined with the recursive tree traversal, it results in a layout engine where measure() on a node calls measure() on its children, which call measure() on their children, and so on, until intrinsic sizes bubble up and final positions can be distributed back down the tree.
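A minimal sketch of the two passes, using the measure()/distribute() names from the text but an otherwise hypothetical vertical box layout and fixed-size leaf:

```python
class Text:
    """Content-only leaf node with a fixed intrinsic size."""
    def __init__(self, w, h):
        self.w, self.h = w, h
        self.x = self.y = 0

    def measure(self):
        return (self.w, self.h)  # intrinsic size bubbles up

    def distribute(self, x, y, w, h):
        self.x, self.y = x, y    # final position issued by the parent

class VBox:
    """Layout-only node: stacks children vertically, sized to fit them."""
    def __init__(self, children):
        self.children = children
        self.x = self.y = 0

    def measure(self):
        # Pass 1: recurse down; width is the widest child, height is the sum
        sizes = [c.measure() for c in self.children]
        self.w = max((w for w, _ in sizes), default=0)
        self.h = sum(h for _, h in sizes)
        return (self.w, self.h)

    def distribute(self, x, y, w, h):
        # Pass 2: hand each child its final position, top to bottom
        self.x, self.y = x, y
        cursor = y
        for child in self.children:
            cw, ch = child.measure()
            child.distribute(x, cursor, cw, ch)
            cursor += ch
```

A real engine would cache the measure() results instead of recomputing them in the distribute pass, but the shape of the recursion is the same: sizes flow up, positions flow down.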

This is an incredibly powerful paradigm, inspired by how the actual layout engines in Flutter and Jetpack Compose work. A crucial difference is that my layout engine only works with intrinsic sizing and does not support constraints. Practically, this means that a parent cannot grow or shrink its children, which is a key requirement if you want responsive design or fluid layouts. While these weren't requirements for the initial version of this engine, they are things I'd like to revisit, especially after watching this excellent video of how Clay (a layout engine for C) works.

Refining the engine

    class Offset(ui.core.Stage):
        def __init__(self):
            self.back = Button(UII, "<- Back", self.clickon_back) \
                .place(Alignment.TOP_LEFT, offset=[Style.PADDING.LAYOUT_PADDING] * 2)
            self.root = UIContainer(UII, BoxLayout("vertical")).add_elements({
                "start": UIContainer(UII, BoxLayout("horizontal")).add_elements({
                    "label": TextLabel(UII, "Start: "),
                    "ebox": EntryBox(UII, "YYYY-MM-DD"),
                }),
                "end": UIContainer(UII, BoxLayout("horizontal")).add_elements({
                    "label": TextLabel(UII, "End: "),
                    "ebox": EntryBox(UII, "YYYY-MM-DD"),
                }),
                "amount": UIContainer(UII, BoxLayout("horizontal")).add_elements({
                    "label": TextLabel(UII, "GB: "),
                }),
                "buttons": UIContainer(UII, BoxLayout("horizontal")).add_elements({
                    "date_ez": Button(UII, "Smart fill", self.clickon_fill),
                    "go": Button(UII, "Add!", self.clickon_go),
                }),
            })
            UIEngine.add({"back": self.back, "main": self.root})

Code snippet of what a simple form looks like, showcasing the nested box layouts with anchoring support.

With the core component API and layout abstraction nailed down, I finally reached a point where I could start designing components and simple test programs for my own use. I quickly discovered gaps - capabilities I had taken for granted in other UI engines.

  • Asynchronous support: One of the first GUIs I wrote drove a script that had to talk to an API, which would freeze the entire window. My solution was an abstraction over Python's threading library where threads are tracked by the engine and callbacks are invoked on the main thread upon completion. This helps reduce the surface area for race conditions while keeping the program responsive.
  • Event listeners: Sometimes components need access to I/O events that involve more than just the mouse. I added a system to globally emit events that components can subscribe to, similar to the JavaScript APIs in the browser (... and ran into the same memory-leak problems).
  • Performance optimisations: Software rendered UIs can quickly slow down if not optimised correctly. I used flags to mark if a component or a layout was dirty, and made use of Python's context handler API to provide a Pythonic way of updating components while handling the flags behind the scenes. Components are only redrawn and layouts are only recalculated when the respective flag is set, allowing the program to minimise CPU usage to only when it's needed.
  • UI Stages: Most UIs don't consist of a single "stage" of UI elements. Ideally, we want to navigate to various "stages" (or "pages" as they're called in a browser) depending on UI state. I implemented a state machine similar to how mobile applications work, where you can push a stage to a stack and return from it, or clear the entire stack and start fresh for destructive navigation.
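The dirty-flag idea from the performance bullet could look something like this - a sketch only, where the `mutate` name and the flag handling are my guesses at the pattern, not the engine's actual API:

```python
from contextlib import contextmanager

class Component:
    def __init__(self):
        self.dirty = True       # needs a redraw on the next frame
        self.draw_count = 0

    @contextmanager
    def mutate(self):
        # Pythonic update: any state change inside the `with` block
        # marks the component dirty behind the scenes.
        try:
            yield self
        finally:
            self.dirty = True

    def render_if_needed(self):
        # Only redraw when the flag is set, minimising CPU usage
        if self.dirty:
            self.draw()
            self.dirty = False

    def draw(self):
        self.draw_count += 1    # stand-in for actual PyGame drawing

label = Component()
label.render_if_needed()        # first frame: draws once
label.render_if_needed()        # nothing changed: skipped
with label.mutate() as c:
    c.text = "updated"          # change state; flag set on exit
label.render_if_needed()        # redrawn only because it was dirty
```

The same pattern extends to layout: a dirty layout flag on a container can trigger a re-run of the measure/distribute passes only for the affected subtree.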

Beyond the basics

An actual screenshot - featuring the minimal hardcoded stylesheet that ended up inspiring the style of this website.

What I have now works fine for basic / experimental scripts where raw iteration speed is more important than maintenance, but ideally, we'd want to bridge that gap and add more functionality. Here are a couple of more advanced ideas I'd like to explore in the future, inspired by real systems:

  • Declarative API: Can we take the huge improvement in developer experience from moving from manual pixel -> automatic layout, and apply that to UI state? The program becomes a description of what you'd want to see for any given state, instead of a set of instructions to poke at the UI every single time a variable changes. This requires either a fine-tuned reactivity primitive (similar to SolidJS) or an optimised reconciler for diffing our UI tree with an ephemeral one created when state changes (like React.js).
  • Composability: With the current API, my programs consist of big components that do whole tasks at once, render directly to surfaces, and store and manage their state opaquely. This is simple for the engine, but gets hard to manage for the developer. Modern paradigms are adopting a more functional, compositional API where programs consist of many tiny UI primitives that compose to make something larger. Supporting this requires an overhaul of the event-handling system to support event bubbling, and optimisation of almost all aspects of the engine to handle moving the complexity to the UI tree.
  • Custom styling: Right now, the engine relies on a hardcoded stylesheet full of global style declarations that are referenced in the render method for each component. Ideally, we would combine this with a user-configurable styling API. Something similar to TailwindCSS utility classes would fit perfectly with the "minimal" target we're aiming for - but applying directly to the renderer instead of compiling to a file.
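As a taste of the fine-grained reactivity route mentioned above, here's a toy signal/effect primitive in Python, modeled loosely on SolidJS - not part of the engine, just a sketch of the idea that UI code re-runs automatically when state it read changes:

```python
_current_effect = None  # the effect currently being (re)run, if any

def create_signal(value):
    """Returns a (read, write) pair; reads inside an effect subscribe it."""
    subscribers = set()

    def read():
        if _current_effect is not None:
            subscribers.add(_current_effect)
        return value

    def write(new_value):
        nonlocal value
        value = new_value
        for effect in list(subscribers):
            effect()  # re-run every effect that depends on this signal

    return read, write

def create_effect(fn):
    """Runs fn once, tracking which signals it reads; re-runs on writes."""
    global _current_effect

    def runner():
        global _current_effect
        _current_effect = runner
        try:
            fn()
        finally:
            _current_effect = None

    runner()

count, set_count = create_signal(0)
log = []
create_effect(lambda: log.append(f"count is {count()}"))
set_count(1)  # the effect re-runs automatically
```

In a UI engine, the effect body would be the code that updates a component, so a state write repaints exactly the components that read that state - no manual poking, no tree diffing.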

Conclusion

Ironically, this project started because I didn't want a UI. Existing solutions were opaque and required lots of boilerplate that often exceeded the actual scale of my projects. I just wanted clickable surfaces and a way to hack at the layers underneath. As the project grew, I ended up organically discovering how to construct simple abstractions through trying (and sometimes failing) to write my own, and why it's paradoxically anything but simple to do right.

While it’s far from perfect, writing it taught me more about UI systems than I ever would have learned by sticking to established solutions alone.

Read more about the high-performance video mosaic rendering and streaming engine I originally designed this UI library for.



Comments

  • By dazzawazza 2026-02-23 11:34

    > While it’s far from perfect, writing it taught me more about UI systems than I ever would have learned by sticking to established solutions alone.

    This is a great attitude to have. Keep up the great work.

  • By cardanome 2026-02-23 11:38 (4 replies)

    Immediate mode GUI is the way to go.

    Retaining state is a pain and causes bugs. Trying to get fancy a la react and diffing the tree for changes makes no sense. That was a performance hack because changing the DOM in JS used to be slow as hell. You don't need that.

    Just redraw the whole thing every frame. Great performance, simple, less bugs.

    • By tarnith 2026-02-23 12:28 (6 replies)

      This works for simple apps, utilities, and demos/mvps. Not great for actual applications.

      What about when you're embedding your GUI into an existing application? or for use on an already taxed system? (Audio plugins come to mind)

      What if something is costly, that you need to compute dynamically, but not often, makes it into the frame? Do you separately now create a state flag for that one render object?

      • By spiffyk 2026-02-23 13:39

        > What if something is costly, that you need to compute dynamically, but not often, makes it into the frame? Do you separately now create a state flag for that one render object?

        The point of immediate mode UIs is not necessarily that there is no state specific to the UI, but rather that the state is owned by user code. You can (and, in these more complex cases, should) retain state between frames. The main difference is that the state is still managed by your code, rather than the UI system ("library", whatever).

      • By leecommamichael 2026-02-23 19:10

        > What about when you're embedding your GUI into an existing application? or for use on an already taxed system?

        You should check out the gamedev scene. It's soft real-time, and yet dearIMGUI is the choice for tooling. Immediate-mode is an API design, not the implementation details. All immediate-mode GUIs retain some data, and for that reason they each have their own APIs for retaining data in various capacities. Usually something really simple like hashing and component-local state.

        > This works for simple apps, utilities, and demos/mvps. Not great for actual applications.

        Respectfully, I don't think you're informed on this. Probably the most responsive debugger out there is RAD Debugger and it's built with an IMGUI.

      • By cardanome 2026-02-23 13:07

        Immediate mode UI optimizes for the worst case. That is the case you care about most for real time applications.

        Retained mode is more optimal when not much changes but if a lot of stuff changes at once it can be worse. So for real time applications like your audio example or games you want immediate mode. Retained mode is better for saving battery though or can be.

      • By naasking 2026-02-23 14:27

        > Do you separately now create a state flag for that one render object?

        That can be a reasonable choice sometimes. Note that the point is that you introduce state where necessary, rather than stateful UI being the default as with retained mode.

      • By nurettin 2026-02-23 19:07

        > What about when you're embedding your GUI into an existing application? or for use on an already taxed system? (Audio plugins come to mind)

        I've used ImGui in exactly these kinds of projects. Game engines already render graphics, so it is just part of the same pipeline. Rendering the gui is instant, how many fps you want to render is up to you.

      • By BatteryMountain 2026-02-23 13:43

        For interest's sake, have a look at the flutter engine. It does this kind of diff on each build (meaning, each time the ui tree gets modified & triggers a rebuild); they split their objects into stateful & stateless, and then in your own code you have to make sure to not unnecessarily trigger rebuilds for expensive objects. So it kinda forces you to think about & separate cheap & expensive ui objects.

    • By saidinesh5 2026-02-23 11:45 (1 reply)

      That really depends on the kind of user interface no?

      If you just have a lot of text and a few rectangles and no animation, immediate mode would work well...

      But if you have a lot of images, animation etc ... You'd anyway have to track all the textures uploaded to the GPU to not reupload them. Might as well retain as much of the state as possible? (Eg. QtQuick)

      • By flohofwoe 2026-02-23 12:06 (1 reply)

        Isn't it the other way around?

        The more dynamic/animated a UI is, the less difference there is between a retained-mode and an immediate-mode API, since the UI needs to be redrawn each frame anyway. Immediate mode UIs might even be more efficient for highly dynamic UIs because they skip a lot of internal state update code (like creating/destroying/showing/hiding/moving widget objects).

        Immediate-mode UIs can also be implemented to track changes and retain the unchanged parts of the UI in baked textures, it's just usually not worth the hassle.

        The key feature of immediate mode UIs is that the application describes the entire currently visible state of the UI for each frame which allows the UI code to be 'interleaved' with application state changes (e.g. no callbacks required), how this per-frame UI description is translated into pixels on screen is more or less an implementation detail.

        • By saidinesh5 2026-02-23 12:41 (1 reply)

          > The more dynamic/animated a UI is, the less difference there is between a retained-mode and an immediate-mode API, since the UI needs to be redrawn each frame anyway. Immediate mode UIs might even be more efficient for highly dynamic UIs because they skip a lot of internal state update code (like creating/destroying/showing/hiding/moving widget objects).

          That depends on the kind of animations - typically for user interfaces, it's just moving, scaling, playing with opacity etc.. that's just updating the matrices once.

          So you describe the scene graph once (this rectangle here, upload that texture there, this border there) using DOM, QML etc..., and then just update the item properties on it.

          As far as the end user/application developer is concerned , this is retained mode. As far as the GPU is considered it can be redrawing the whole UI every frame..

          • By flohofwoe 2026-02-23 13:01 (1 reply)

            > it's just moving, scaling, playing with opacity etc.. that's just updating the matrices once.

            ...any tiny change like this will trigger a redraw (e.g. the GPU doing work) that's not much different from a redraw in an immediate mode system.

            At most the redraw can be restricted to a part of the visible UI, but here the question is whether such a 'local' redraw is actually any cheaper than just redrawing everything (since figuring out what needs to be redrawn might be more expensive than just rendering everything from scratch - YMMV of course).

            • By saidinesh5 2026-02-23 13:52 (1 reply)

              It's not about what gets redrawn but also how much of the UI state is still retained (by the GPU). Imagine having to reupload all the textures, meshes to the GPU every frame.

              Something like a lot of text ? Probably easier to redraw everything in immediate mode.

              Something like a lot of images just moving, scaling, around? Easier to retain that state in GPU and just update a few values here and there...

              • By flohofwoe 2026-02-23 14:38

                > Easier to retain that state in GPU and just update a few values here and there

                It's really not that trivial to estimate, especially on high-dpi displays.

                Rendering a texture with a 'baked UI' to the framebuffer might be "just about as expensive" as rendering the detailed UI elements directly to the framebuffer.

                Processing a pixel isn't inherently cheaper than processing a vertex, but there are a lot more pixels than vertices in typical UIs (a baked texture might still win when there's a ton of alpha-blended layers though).

                Also, of course you'd also need to aggressively batch draw calls (e.g. Dear ImGui only issues a new render command when the texture or clipping rectangle changes, e.g. a whole window will typically be rendered in one or two draw calls).

    • By amelius 2026-02-23 12:12 (2 replies)

      > Just redraw the whole thing every frame. Great performance, simple, less bugs.

      And in low power applications? Like on a smartphone?

      • By flohofwoe 2026-02-23 13:05 (1 reply)

        When the UI is highly dynamic/animated it needs to be redrawn each frame also in a 'retained mode' UI framework.

        When the UI is static and only needs to change on user input, an immediate mode UI can 'stop' too until there's new input to process.

        For further low-power optimizations, immediate mode UI frameworks could skip describing parts of the UI when the application knows that this part doesn't need to change (contrary to popular belief, immediate mode UI frameworks do track and retain state between frames, just usually less than retained mode UIs - but how much state is retained is an internal implementation detail).

        • By amelius 2026-02-23 13:30 (2 replies)

          The problem is that widgets still need to store state somewhere, and that storage space needs to be reclaimed at some point. How does the system know when that can be done? I suppose the popular approach is to just reclaim space that wasn't referenced during a draw.

          However ...

          When you have a listbox of 10,000 rows and you only draw the visible rows, then the others will lose their state because of this.

          Of course there are ways around that but it becomes messy. Maybe so messy that retained mode becomes attractive.

          • By flohofwoe 2026-02-23 14:29 (1 reply)

            > How does the system know when that can be done?

            At the earliest, in the first frame where the application's UI description code doesn't mention a UI item (that means UI items need a persistent id; in Dear ImGui this is a string hash, usually created from the item's label, which can have a hidden `##` identifier to make it unique, plus a push/pop-id stack for hierarchical namespacing).

            > then the others will lose their state because of this

            Once an item is visible, the state must have been provided by the application's UI description code, when the item is invisible, that state becomes irrelevant.

            • By amelius 2026-02-23 14:49 (1 reply)

              > when the item is invisible, that state becomes irrelevant.

              What happens when the item moves out of view, e.g. because the user scrolls down?

              State should be preserved, because the user might scroll back up.

              • By flohofwoe 2026-02-23 15:04 (1 reply)

                Once the item becomes visible, the application's UI code provides the item's state again.

                E.g. pseudocode:

                    for (firstVisibleItemIndex .. lastVisibleItemIndex) |itemIndex| {
                        ui_list_item(itemIndex, listItemValues[itemIndex]);
                    }
                
                For instance Dear ImGui has the concept of a 'list clipper' which tells the application the currently visible range of a list or table-column and the application only provides the state of the currently visible items to the UI system.

                • By amelius 2026-02-23 15:23 (1 reply)

                  Ok, but now items 1,000 through 10,000 are deleted from the data container.

                  How does the immediate mode system know that the corresponding state can be deleted too?

                  Does the system provide tools for that or does the burden lie on my application code?

                  • By flohofwoe 2026-02-23 15:45

                    Same way as for regular ui items: if the application's ui code no longer "mentions" those items, their state can be deleted (assuming the immediate mode UI tracks hidden items for some reason).

          • By cardanome 2026-02-23 14:00 (2 replies)

            The job of the immediate UI is to just draw the things. Where and how you manage your state is completely up to you.

            It seems you assume some sort of OO model.

            > When you have a listbox of 10,000 rows and you only draw the visible rows, then the others will lose their state because of this.

            Well keep the state then.

            Immediate mode really just means you have your data as an array of things or whatever and the UI library creates the draw calls for you. Drawing and data are separate.

            • By flohofwoe 2026-02-23 14:20

              > The job of the immediate UI is to just draw the things. Where and how you manage your state is completely up to you.

              This is a bit oversimplified. For instance Dear ImGui needs to store at least the window positions between frames since the application code doesn't need to track window positions.

            • By amelius 2026-02-23 14:50 (1 reply)

              Well, I can keep the state, but a retained mode UI model does it for me :)

              • By lioeters 2026-02-23 16:09 (1 reply)

                But then you have state in two places, user code and the retained-mode GUI framework, which need to be synced - that's where complexity creeps in. Immediate mode removes that redundancy and makes things simpler in many situations. It depends on your preference and what you're doing too, which approach suits better.

      • By lelanthran 2026-02-23 13:10

        > And in low power applications? Like on a smartphone?

        Doesn't make a difference. If the page is static, there is no redraw happening. If the page is dynamic, the redraw is happening at the frequency of the change (once per second, or once per frame, or whatever).

        Whether you're doing a diff of the DOM or redrawing the whole DOM, typical pages (i.e. not two-sigmas past the median) aren't going to redraw something on every frame anyway.

    • By lloydatkinson 2026-02-23 14:35 (1 reply)

      What are you even basing this on? I did an experiment a few days ago, where a web page reacts to text typed into an input. Once it hits a certain count, the colour changes, like a validation error.

      In the initial crappy implementation the code was assigning the same class over and over to the text input, rather than only when required. Despite that being an obvious bug, I could literally feel the difference in typing speed and how that was hammering the page.

      Once the bug was fixed, and it only assigned it once correctly, the problem went away.

      "redraw everything the whole frame" and "don't do any diffing" sound insane in this regard.

      • By flohofwoe 2026-02-23 14:53

        > "redraw everything the whole frame" and "don't do any diffing" sound insane in this regard.

        You need to consider that a web browser with its millions of lines of code in the DOM and rendering engine is pretty much the worst case for "redrawing a complex UI each frame", especially since the DOM had been designed for mostly static 'documents' and not highly dynamic graphical UIs.

        Add React on top and the whole contraption might still be busy with figuring out what has changed and needs to be redrawn at the time an immediate mode UI sitting directly on top of a 3D API is already done rendering the entire UI from scratch.

        A native immediate mode UI will easily be several hundred times less code (for instance Dear ImGui is currently just under 50kloc 'orthodox C++').

  • By threetwoonezero 2026-02-23 12:03

    Had a similar itch during my game development with libgdx, and eventually ended up with almost the same architecture.

    I found that I have two different ways to construct a UI layout: from the top down, and from the bottom up. Those can be contradictory, and I wonder how one could solve this - it seems like a common problem in all the frameworks I've seen. Flutter just fails with an error on screen if it can't solve the constraints in such a conflict; others just show gibberish.

HackerNews