Comments

  • By mpweiher 2026-01-04 21:45 | 8 replies

    > Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?

    Because the attempts at segmented or object-oriented address spaces failed miserably.

    > Linear virtual addresses were made to be backwards-compatible with tiny computers with linear physical addresses but without virtual memory.

    That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.

    The 8086 was sort-of segmented to get 20-bit addresses out of a 16-bit machine; it was a stop-gap and a huge success. The 80286 did things "properly" again and went all-in on segments when going to virtual memory...and sucked. Best I remember, it was used almost exclusively as a faster 8086, with the 80286 modes used to page memory in and out and with the "reset and recover" hack to then get back to real mode for real work.

    The 80386 introduced the flat address space and paged virtual memory not because of backwards-compatibility, but because it could and it was clearly The Right Thing™.

    • By SulphurCrested 2026-01-05 2:16 | 1 reply

      The Burroughs large-system architecture of the 1960s and 1970s (B6500, B6700, etc.) did it. Objects were called “arrays” and there was hardware support for allocating and deallocating them in Algol, the native language. These systems were initially aimed at businesses, I believe (for example, Ford was a big customer, for such things as managing what parts were flowing where), but later they managed to support FORTRAN with its unsafe flat model.

      These were large machines (think of a room 20m square). With explicit hardware support for Algol operations, including the array handling and display registers for nested functions, they were complex and power-hungry, with a lot to go wrong. Eventually, with the technology of the day, they became uncompetitive against simpler architectures. By this time, too, people wanted to program in languages like C++ that were not supported.

      With today’s technology, it might be possible.

      • By pjmlp 2026-01-05 14:57

        It still exists today, as Unisys ClearPath MCP.

    • By mpweiher 2026-01-05 11:41 | 2 replies

      > The 80386 introduced the flat address space ...

      This may be misleading: the 80386 introduced the flat address space and paged virtual memory _in the Intel world_, not in general. At the time it was introduced, a linear/flat address space was the norm for 32-bit architectures, with examples such as the VAX, the MC68K, the NS32032 and the new RISC processors. The IBM S/360 was also (mostly) linear.

      So with the 80386, Intel finally abandoned their failed approach of segmented address spaces and joined the linear rest of the world. (Of course the 386 is technically still segmented, but let's ignore that).

      And they made their new CPU conceptually compatible with the linear address space of the big computers of the time, the VAXen and IBM mainframes and Unix workstations. Not the "little" ones.

      • By phkamp 2026-01-05 11:54 | 1 reply

        The important bit here is "their failed approach", just because Intel made a mess of it, doesn't mean that the entire concept is flawed.

        (Intel is objectively the luckiest semiconductor company, in particular if one considers how utterly incompetent their own "green-field" designs have been.

        Think for a moment how lucky a company has to be, to have the major competitor they tried to kill with all means available, legal and illegal, save your company, after you bet the entire farm on Itanic.)

        • By mpweiher 2026-01-05 13:43 | 1 reply

          It isn't 100% proof that the concept is flawed, but the fact that the CPU manufacturer that was for decades the most successful in the world couldn't make segmentation work in multiple attempts is pretty strong evidence that there are at least, er, "issues" that aren't immediately obvious.

          I think it is safe to assume that they applied what they learned from their earlier failures to their later failures.

          Again, we can never be 100% certain of counterfactuals, but certainly the assertion that linear address spaces were only there for backwards compatibility with small machines is simply historically inaccurate.

          Also, Intel weren't the only ones. The first MMU for the Motorola MC68K was the MC68451, a segmented MMU. It was later replaced by the MC68851, a paged MMU. The MC68451, and segmentation with it, was rarely used and then discontinued. The MC68851 was comparatively widely used, and later integrated in simplified form into subsequent CPUs like the MC68030 and its successors.

          So there as well, segmentation was tried first and then later abandoned. Which again, isn't definitive proof that segmentation is flawed, but way more evidence than you give credit for in your article.

          People and companies again and again start out with segmentation, can't make it work and then later abandon it for linear paged memory.

          My interpretation is that segmentation is one of those things that sounds great in theory, but doesn't work nearly as well in practice. Just thinking about it in the abstract, making an object boundary also a physical hardware-enforced protection boundary sounds absolutely perfect to me! For example something like the LOOM object-based virtual memory system for Smalltalk (though that was more software).

          But theory ≠ practice. Another example of something that sounded great in theory was SOAR: Smalltalk on a RISC. They tried implementing a good part of the expensive bits of Smalltalk in silicon in a custom RISC design. It worked, but the benefits turned out to be minimal. What actually helped were larger caches and higher memory bandwidth, so RISC.

          Another example was the Rekursiv, which also had object-capability addressing and a lot of other OO features in hardware. Also didn't go anywhere.

          Again: not everything that sounds good in theory also works out in practice.

          • By phkamp 2026-01-05 14:09 | 1 reply

            All the examples you bring up are from an entirely different time in terms of hardware, a time when one of the major technological limitations was how many pins a chip could have, and two-layer PCBs were the norm.

            Ideas can be good, but fail because they are premature, relative to the technological means we have to implement them. (Electrical vehicles will probably be the future text-book example of this.)

            The interesting detail of the R1000's memory model is that it combines segmentation with pages, removing the need for segments to be contiguous in physical memory. That gets rid of the fragmentation issue, which was a huge problem for the architectures you mention.

            But there will obviously always be a tension between how much info you stick into whatever goes for a "pointer" and how big it becomes (i.e. "fat pointers"). I think we can safely say that CHERI has documented that fat pointers are well worth their cost, and that we are now just discussing what goes in them.

            • By mpweiher 2026-01-07 10:00

              > examples from different time

              Yes, because (a) you asked a question about how we got here and (b) that question actually has a non-rhetorical historical answer. This is how we got here, and that's when it pretty much happened.

              > one of the major technological limitations were how many pins a chip could have and two-layer PCBs

              How is that relevant to the question of segmentation vs. linear address space? In your esteemed opinion? The R1000 is also from that era, the 68451 and 68851 were (almost) contemporaries, and once the MMU was integrated into the CPU, the pin counts would be exactly the same.

              > Ideas can be good, but fail because they are premature

              Sure, and it is certainly possible that in the future, segmentation will make a comeback. I never wrote it couldn't. I answered your question about how we got here, which you answered incorrectly in the article.

              And yes, the actual story of how we got here does indicate that hardware segmentation is problematic, though it doesn't tell us why. It also strongly hints that hardware segmentation is superficially attractive but, less obviously, subtly and deeply flawed.

              CHERI is an interesting approach and seems workable. I don't see how it's been documented to be worth the cost, as we simply haven't had widespread adoption yet. Memory pressure is currently the main performance driver ("computation is what happens in the gaps while the CPU waits for memory"), so doubling pointer sizes is definitely going to be an issue.

              It certainly seems possible that CHERI will push us more strongly away from direct pointer usage than 64 bit already did, towards base + index addressing. Maybe a return to object tables for OO systems? And for large data sets, SOAs. Adjacency tables for graphs. Of course those tend to be safe already.

      • By StillBored 2026-01-07 2:52 | 1 reply

        > So with the 80386, Intel finally abandoned their failed approach of segmented address spaces and joined the linear rest of the world. (Of course the 386 is technically still segmented, but let's ignore that).

        That seems an odd interpretation of how they extended the 286 protected mode on the 386. The 286 converted the fixed base-address + 64k-sized segment registers into 'selectors' into the LDT/GDT, which added permissions etc. to a segment descriptor structure; these were transparently cached, along with the 'base' of the segment, in generally invisible portions of the register. The problem with this approach was the same as with CHERI etc.: it requires a fat pointer comprising segment+offset, which to this day remains problematic with standard C, where certain classes of programmers expect that sizeof(void *) == sizeof(int or long).

        Along comes the 386, which adds a further size field (limit) to the segment descriptor that can be expressed in either bytes or pages.

        And of course it added the ability to back linear addresses with paging, if enabled.

        It's entirely possible to run the 386 in an object=segment-only mode where each data structure exists in its own segment descriptor and the hardware enforces range checking, and heap compaction etc. can happen automatically by simply copying the segment to another linear address and adjusting the base address. By today's standards the number of outstanding segment descriptors is limiting, but remember 1985, when a megabyte of RAM was a pretty reasonable amount...

        The idea that someone would create a couple of descriptors with base=0:limit=4G and set all the segment registers to them, in order to assure that int == void *, is a sort of known possible misuse of the core architecture. Of course this basically requires paging, as the processor then needs to deal with the fact that it likely doesn't actually have 4G of RAM, and the permissions model is then enforced at a 4K granularity. That leaves open all the issues C has with buffer overflows, code + data permission mixing, etc. It's not a better model, just one easier to reason about initially; for actually robust software it starts to fall apart for long-running processes due to address-space fragmentation and a lot of other related problems.

        AKA, it wasn't necessarily the best choice, and we have been dealing with the repercussions of lazy OS/systems programmers for the 40 years since.

        PS: Intel got (and gets) a lot of hate from people wanting to rewrite history by ignoring the release dates of many of these architectural advancements. E.g. the entire segment-register 'fiasco' is a far better solution than the banked memory systems available in most other 8/16-bit machines. The 68000 is fully a year later in time, and makes no real attempt at being backwards-compatible with the 6800, unlike the 8086, which is clearly intended to be a replacement for the 8080.

        • By mpweiher 2026-01-07 12:52 | 1 reply

          You did see that bit you quoted?

          > (Of course the 386 is technically still segmented, but let's ignore that)

          Yes, the 80386 was still technically segmented, but the overwhelming majority of operating systems (95%+) effectively abandoned segmentation for memory protection and organization, except for very broad categories such as kernel vs. user space.

          Instead, they configured the 80386 registers to provide a large linear address space for user processes (and usually for the kernel as well).

          > The idea that someone would create a couple descriptors with base=0:limit=4G and set all the segment register to them, in order to assure that int=void * is sorta a known possible misuse of the core architecture

          The thing that you mischaracterize as a "misuse" of the architecture wasn't just some corner case that was remotely "possible", it was what 95% of the industry did.

          The 8086 wasn't so much a design as a stopgap hail-mary pass following the fiasco of the iAPX 432. And the VAX existed long before the 8086.

          • By StillBored 2026-01-07 14:44 | 1 reply

            I think my point revolves more around what the HW designers were enabling. If they thought that the flat model was the right one, they would have just kept doing what the 286 did, and fixed the segment sizes at 4G.

            • By mpweiher 2026-01-07 15:48 | 1 reply

              Yes. The point is that the hardware designers were wrong in thinking that the segmented model was the right one.

              The hardware designers kept enabling complex segmented models using complex segment machinery. Operating system designers fixed the segments as soon as the hardware made that possible in order to enable a flat (paged) memory model and never looked back.

              • By rep_lodsb 2026-01-08 13:04 | 1 reply

                But were the software people actually right, or did they just follow the well-trodden path of VMS / UNIX, instead of making full use of the x86 hardware?

                Having separate segments for every object is problematic because of pointer size and limited number of selectors, but even 3 segments for code/data/stack would have eliminated many security bugs, especially at the time when there was no page-level NX bit. For single-threaded programs, the data and stack segment could have shared the same address space but with a different limit (and the "expand-down" bit set), so that 32-bit pointers could reach both using DS, while preventing [SS:EBP+x] from accessing anything outside the stack.

                • By mpweiher 2026-01-20 8:19

                  Inasmuch as hardware exists to run software, software is the customer, and the hardware people were wrong by definition, as they created a product that their customers weren't asking for, didn't want and had no use for.

                  Might segmentation have been better if the software had wanted it? Well, it's a counterfactual, so in some sense we can't know. And we can argue why we believe one or the other is better, but the evidence seems to be pretty overwhelming. It's not that there weren't (and aren't) operating systems that use segmentation, but somehow their "better" memory model didn't take the world by storm.

    • By yvdriess 2026-01-05 13:03 | 1 reply

      > > Linear virtual addresses were made to be backwards-compatible with tiny computers with linear physical addresses but without virtual memory.

      > That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.

      That's not refuting the point he's making. The mainframe-on-a-chip iAPX family (and Itanium after it) died and had no heirs. The current popular CPU families are all descendants of the stopgap 8086, evolved from the tiny-computer CPUs, or of ARM's straight-up embedded CPU designs.

      But I do agree with your point that a flat (global) virtual memory space is a lot nicer to program. In practice we've been moving away from that again fast, though; the kernel has to struggle to keep up the illusion: NUCA, NUMA, CXL.mem, various mapped accelerator memories, etc.

      Regarding the iAPX 432, I do want to set the record straight, as I think you are insinuating that it failed because of its object memory design. The iAPX failed mostly because of its abject performance characteristics, but in retrospect [1] that was not inherent to the object directory design. It lacked very simple look-ahead mechanisms, had no instruction or data caches, no registers and not even immediates. Performance did not seem to be a top priority in the design, to paraphrase one of its architects. Additionally, the compiler team was not aligned and failed to deliver on time, which only compounded the performance problem.

        - [1] https://dl.acm.org/doi/10.1145/45059.214411

      • By mpweiher 2026-01-05 13:59 | 2 replies

        The way you selectively quoted: yes, you removed the refutation.

        And regarding the iAPX 432: it was slow in large part due to the failed object-capability model. For one, the model required multiple expensive lookups per instruction. And it required tremendous numbers of transistors, so many that despite forcing a (slow) multi-chip design there still wasn't enough transistor budget left over for performance enhancing features.

        Performance enhancing features that contemporary designs with smaller transistor budgets but no object-capability model did have.

        Opportunity costs matter.

        • By yvdriess 2026-01-05 15:36

          I agree about the opportunity cost, and that given the transistor budgets of the time a simpler design would have served better.

          I fundamentally disagree on putting the majority of the blame on the object memory model. The problem was that they compounded the added complexity of the object model with a slew of other unnecessary complexities. They somehow did find the budget to put the first full IEEE floating-point unit on the execution unit, and implemented a massive [1] decoder and microcode for the bit-aligned 200+ instruction set and for interprocess communication. The expensive lookups per instruction had everything to do with cutting caches and programmable registers, not with any kind of overwhelming complexity in the address translation.

          I strongly recommend checking the "Performance effects of architectural complexity in the Intel 432" paper by Colwell that I linked in the parent.

          [1] die shots: https://oldbytes.space/@kenshirriff/110231910098167742

        • By phkamp 2026-01-05 14:14

          A huge factor in the iAPX 432's utter lack of success was technological restrictions, like pin-count limits, laid down by Intel top brass, which forced stupid and silly limitations on the implementation.

          That's not to say that the iAPX 432 would have succeeded under better management, but only that you cannot point to some random part of the design and say "that obviously does not work".

    • By inkyoto 2026-01-05 0:35 | 3 replies

      > Because the attempts at segmented or object-oriented address spaces failed miserably.

      > That is false. In the Intel World, we first had the iAPX 432, which was an object-capability design. To say it failed miserably is overselling its success by a good margin.

      I would further posit that segmented and object-oriented address spaces have failed, and will continue to fail, for as long as we have a separation into two distinct classes of storage: ephemeral (DRAM) and persistent backing store (disks, flash storage, etc.), as opposed to a single, unified concept of nearly infinite (at least logically, if not physically), always-on memory where everything is, essentially, an object.

      Intel's Optane has given us a brief glimpse into what such a future could look like but, alas, that particular version of the future has not panned out.

      Linear address space makes perfect sense for size-constrained DRAM, and makes little to no sense for the backing store where a file system is instead entrusted with implementing an object-like address space (files, directories are the objects, and the file system is the address space).

      Once a new, successful memory technology emerges, we might see a resurgence of the segmented or object-oriented address space models, but until then, it will remain a pipe dream.

      • By LegionMammal978 2026-01-05 1:34 | 2 replies

        I don't see how any amount of memory technology can overcome the physical realities of locality. The closer you want the data to be to your processor, the less space you'll have to fit it. So there will always be a hierarchy where a smaller amount of data can have less latency, and there will always be an advantage to cramming as much data as you can at the top of the hierarchy.

        • By adgjlsfhk1 2026-01-05 2:27 | 1 reply

          While that's true, CPUs already have automatically managed caches. It's not too much of a stretch to imagine a world in which RAM is automatically managed as well and you don't have a distinction between RAM and persistent storage. In a spinning-rust world that never would have been possible, but with modern NVMe it's plausible.

          • By bluGill 2026-01-05 4:29 | 1 reply

            CPUs manage it, but ensuring your data structures are friendly to how they manage caches is one of the keys to fast programs, which some of us care about.

            • By adgjlsfhk1 2026-01-05 15:20 | 1 reply

              Absolutely! And it certainly is true that for the most performance-optimized code, manual cache management would be beneficial, but on the CPU side, at least, we've given up that power in favor of a simpler programming model.

              • By bluGill 2026-01-05 17:01

                Part of the reason we gave up is that what is correct changes too fast. Attempts to do this manually often got great results for a year and then made things worse on the next generation of CPU, which did things differently. Anyone who needs manual control would thus need to target a specific CPU and be willing to spend hundreds of millions every year to update for the next one - and there is nobody willing to spend that much. The few who would be are better served by putting the important thing into an FPGA, which is going to be faster still for similar costs.

        • By inkyoto 2026-01-05 4:40 | 1 reply

          «Memory technology» as in «a single tech» that blends RAM and disk into just «memory» and obviates the need for the disk as a distinct concept.

          One can conjure up RAM which has become exabytes large and which does not lose data after a system shutdown. In such a unified memory model everything is local, promptly available to and directly addressable by the CPU.

          Please do note that multi-level CPU caches still do have their places in this scenario.

          In fact, this has been successfully done in the AS/400 (or i Series), which I have mentioned elsewhere in the thread. It works well and is highly performant.

          • By jason_oster 2026-01-05 5:48

            > «Memory technology» as in «a single tech» that blends RAM and disk into just «memory» and obviates the need for the disk as a distinct concept.

            That already exists. Swap memory, mmap, disk paging, and so on.

            Virtual memory is mostly fine for what it is, and it has been used in practice for decades. The problem that comes up is latency. Access time is limited by the speed of light [1]. And for that reason, CPU manufacturers continue to increase the capacities of the faster, closer memories (specifically registers and L1 cache).

            [1] https://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html

      • By gpderetta 2026-01-05 15:41 | 1 reply

        I think seamless persistent storage is also bound to fail. There are significant differences in how we treat ephemeral objects in programs and persistent storage. Ephemeral objects are low-value: if something goes wrong we can just restart and recover from storage. Persistent storage is often high-value: we make significant effort to guarantee its consistency and durability even in the presence of crashes.

        • By senderista 2026-01-05 18:29 | 1 reply

          “Persistent memory leaks” will be an interesting new failure mode.

          • By antonvs 2026-01-05 20:51

            Anyone using object storage at scale (e.g. S3 or GCS) is already likely to be familiar with this.

      • By duped 2026-01-05 1:40 | 2 replies

        I shudder to think about the impact of concurrent data structures fsync'ing on every write because the programmer can't reason about whether the data is in memory, where a handful of atomic fences/barriers are enough to reason about the correctness of the operations, or on disk, where those operations simply do not exist.

        Also linear regions make a ton of sense for disk, and not just for performance. WAL-based systems are the cornerstone of many databases and require the ability to reserve linear regions.

        • By inkyoto 2026-01-05 5:11 | 1 reply

          Linear regions are mostly a figment of the imagination in real life, but they are a convenient abstraction and concept.

          Linear regions are nearly impossible to guarantee, unless the underlying hardware has specific, controller-level provisions.

            1) For RAM, the MCU will obscure the physical address of a memory page, which can come from a completely separate memory bank. It is up to the VMM implementation and heuristics to ensure the contiguous allocation, coalesce unrelated free pages into a new, large allocation or map in a free page from a «distant» location.
          
            2) Disks (the spinning-rust variety) are not that different. A freed block can be provided from the start of the disk. However, sophisticated file systems like XFS and ZFS will do their best to allocate a contiguous block.
          
            3) Flash storage (SSDs, NVMe) simply «lies» about the physical blocks and does it for a few reasons (garbage collection and the transparent reallocation of ailing blocks – to name a few). If I understand it correctly, the physical «block» numbers are hidden even from the flash storage controller and firmware themselves.
          
          The only practical way I can think of to ensure the guaranteed contiguous allocation of blocks unfortunately involves a conventional hard drive that has a dedicated partition created just for the WAL. In fact, this is how Oracle installation worked – it required a dedicated raw device to bypass both the VMM and the file system.

          When RAM and disk(s) are logically the same concept, WAL can be treated as an object of the «WAL» type with certain properties specific to this object type only to support WAL peculiarities.

          • By duped 2026-01-05 5:56 | 1 reply

            Ultimately everything is an abstraction. The point I'm making is that linear regions are a useful abstraction for both disk and memory, but that's not enough to unify them. Particularly in that memory cares about the visibility of writes to other processes/threads, whereas disk cares about the durability of those writes. This is an important distinction that programmers need to differentiate between for correctness.

            Perhaps a WAL was a bad example. Ultimately you need the ability to atomically reserve a region of a certain capacity and then commit it durably (or roll back). Perhaps there are other abstractions that can do this, but with linear memory and disk regions it's exceedingly easy.

            Personally I think file I/O should have an atomic CAS operation on a fixed maximum number of bytes (just like shared memory between threads and processes) but afaik there is no standard way to do that.

            • By inkyoto 2026-01-05 7:33

              I do not share the view that the unification of RAM and disk requires or entails linear regions of memory. In fact, the unification reduces the question of «do I have a contiguous block of size N to do X» to a mere «do I have enough memory to do X?», commits and rollbacks inclusive.

              The issue of durability, however, remains a valid concern in either scenario, but the responsibility to ensure durability is delegated to the hardware.

              Furthermore, commits and rollbacks are not sensitive to memory linearity anyway; they are sensitive to the durability of the operation, and they may be sensitive to latency, although that is not a frequently occurring constraint. In the absence of a physical disk, commits/rollbacks can be implemented entirely in RAM today using software transactional memory (STM) – see the relevant Haskell library and the white paper on STM.

              Lastly, when everything is an object in the system, the way the objects communicate with each other also changes from the traditional model of memory sharing to message passing, transactional outboxes, and similar, where the objects encapsulate the internal state without allowing other objects to access it – courtesy of the object-oriented address space protection, which is what the conversation initially started from.

        • By adgjlsfhk1 2026-01-05 2:35

          OTOH, WAL systems are only necessary because storage devices present an interface of linear regions; the WAL machinery could move into the hardware.

    • By shrubble 2026-01-05 5:26 | 1 reply

      For object, you have the IBM iSeries / AS/400 systems, which used an object-capability model (as far as I understand it) - a refinement and simplification of what was pioneered in the less successful System/38.

      For linear, you have the Sun SPARC processor coming out in 1986, the same year the 386 shipped in volume. I think Unix's use of linear addressing made it more popular (the MIPS R2000 came out in January 1986, also).

      • By tliltocatl 2026-01-05 6:21 | 1 reply

        > IBM i Series / AS400

        Isn't the AS/400 closer to the JVM than to an iAPX 432 in its implementation details? Sans IBM's proprietary lingo, TIMI is just a bytecode virtual machine and was designed as such from the beginning. Then again, on a microcoded CPU it's hard to tell the difference.

        > I think the use by Unix of linear made it more popular

        More like the linear address space was the only logical solution since EDVAC. Then in the late 50s the Manchester Atlas invented virtual memory to abstract away the magnetic drums. Some smart minds (Robert S. Barton with his B5000, which was a direct influence on the JVM - but was he the first one?) realised that what we actually want is segment/object addressing. Multics/GE-600 went with segments (couldn't find any evidence they were directly influenced by the B5000, but it seems so).

        System/360, which was the pre-Unix lingua franca, went with a flat address space. I guess the IBM folks wanted to go as conservative as possible. They also wanted S/360 to compete in HPC, so performance was quite important - and segment addressing doesn't give you that. Then VM/370 showed that flat/paged addressing allows you to do things segments can't. And then came the PDP-11 (which was more or less about compressing the S/360 into a mini, sorry DEC fans), RMS/VMS and Unix.

        • By ch_123 2026-01-05 10:33

          > TIMI is just a bytecode virtual machine and was designed as such from the beginning.

          It's a bit more complicated than that. For one, it's an ahead-of-time translation model. The object references are implemented as tagged pointers to a single large address space. The tagged pointers rely on dedicated support in the Power architecture. The Unix compatibility layer (PASE) simulates per-process address spaces by allocating dedicated address space objects for each Unix process (these are called Terraspace objects).

          When I read Frank Soltis' book a few years ago, the description of how the single level store was implemented involved segmentation, although I got the impression that the segments are implemented using pages in the Power architecture. The original CISC architecture (IMPI) may have implemented segmentation directly, although there is very little documentation on that architecture.

          This document describes the S/38 architecture, and many of the high level details (if not the specific implementation) also apply to the AS/400 and IBM i: https://homes.cs.washington.edu/~levy/capabook/Chapter8.pdf

    • By RachelF 2026-01-052:241 reply

      > iAPX 432 Yes, this was a failure, the Itanium of the 1980's

      I also regard Ada as a failure. I worked with it many years ago. Ada would take 30 minutes to compile a program; Turbo C++ compiled equivalent code in a few seconds.

      • By musicale 2026-01-056:081 reply

        Machines are thousands of times faster now, yet C++ compilation is still slow somehow (templates? optimization? disinterest in compiler/linker performance? who knows?) Saving grace is having tons of cores and memory for parallel builds. Linking is still slow, though.

        Of course Pascal compilers (Turbo Pascal etc.) could be blazingly fast since Pascal was designed to be compiled in a single pass, but presumably linking was faster as well. I wonder how Delphi or current Pascal compilers compare? (Pascal also supports bounded strings and array bounds checks IIRC.)

        • By magicalhippo 2026-01-0512:10

          > I wonder how Delphi or current Pascal compilers compare?

          Just did a full build of our main Delphi application on my work laptop, sporting an Intel i7-1260P. It compiled and linked just shy of 1.9 million lines of code in 31 seconds. So, still quite fast.

  • By tliltocatl 2026-01-0419:322 reply

    > Show me somebody who calls the IBM S/360 a RISC design, and I will show you somebody who works with the s390 instruction set today.

    Ahaha so true.

    But to answer the post's main question:

    > Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?

    Because backwards compatibility is more valuable than elegant designs. Because array-crunching performance is more important than safety. Because a fix for a V8 vulnerability can be quickly deployed while a hardware vulnerability fix cannot. Because you can express any object model on top of flat memory, but expressing one object model (or flat memory) in terms of another object model usually costs a lot. Because nobody ever agreed on what the object model should be. But most importantly: because "memory safety" is not worth the costs.

    • By nine_k 2026-01-0420:183 reply

      But we don't have a linear address space, unless you're working with a tiny MCU. For the last 30 years or so we have had virtual address spaces on every mainstream processor, and we can mix and match pages the way we want, insulate processes from one another, add sentinel pages at the ends of large structures to generate a fault, etc. We just structure process heaps as linear memory, but this is not a hard requirement, even on current hardware.

      What we lack is the granularity that something like iAPX432 envisioned. Maybe some hardware breakthrough would allow for such granularity cheaply enough (like it allowed for signed pointers, for instance), so that smart compilers and OSes would offer even more protection without the expense of switching to kernel mode too often. I wonder what research exists in this field.
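      The sentinel-page trick mentioned above can be shown with stock POSIX calls. This is a minimal sketch, not any particular allocator's implementation: one usable page followed by a `PROT_NONE` guard page, with a `SIGSEGV` handler standing in for whatever the runtime would do on the fault.

      ```c
      // Sketch: a "sentinel" (guard) page placed right after a buffer.
      // Assumes POSIX mmap/mprotect; the handler-plus-longjmp recovery is a
      // demo convenience, not production practice.
      #include <stdio.h>
      #include <setjmp.h>
      #include <signal.h>
      #include <unistd.h>
      #include <sys/mman.h>

      static sigjmp_buf env;

      static void on_segv(int sig) {
          (void)sig;
          siglongjmp(env, 1);  // jump back out of the faulting access
      }

      int main(void) {
          long pagesz = sysconf(_SC_PAGESIZE);
          // Reserve two pages: one usable, one guard.
          char *p = mmap(NULL, 2 * pagesz, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED) { perror("mmap"); return 1; }
          // Revoke all access on the second page: any overrun faults at once.
          if (mprotect(p + pagesz, pagesz, PROT_NONE) != 0) { perror("mprotect"); return 1; }

          signal(SIGSEGV, on_segv);
          if (sigsetjmp(env, 1) == 0) {
              p[pagesz - 1] = 'x';              // last byte of the usable page: fine
              printf("in-bounds write ok\n");
              p[pagesz] = 'x';                  // first byte of the guard page: faults
              printf("overrun not caught\n");   // never reached
          } else {
              printf("guard page fault caught\n");
          }
          return 0;
      }
      ```

      The same mechanism underlies stack guard pages in mainstream OSes; the granularity complaint above is exactly that the fault only fires at page boundaries, not at object boundaries.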

      • By imtringued 2026-01-0514:041 reply

        This feels like a pointless form of pedantry.

        Okay, so we went from linear address spaces to partitioned/disaggregated linear address spaces. This is hardly the victory you claim it is, because page sizes are increasing and thus the minimum addressable block of memory keeps increasing. Within a page everything is linear as usual.

        The reason why linear address spaces are everywhere has to do with the fact that they are extremely cost effective and fast to implement in hardware. You can do prefix matching to check if an address is pointing at a specific hardware device and you can use multiplexers to address memory. Addresses can easily be encoded inside a single std_ulogic_vector. It's also possible to implement a Network-on-Chip architecture for your on-chip interconnect. It also makes caching easier, since you can translate the address into a cache entry.
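        The prefix-matching decode described above can be sketched in a few lines of C. The memory map here is entirely made up for illustration; real bus fabrics do the same mask-and-compare in combinational logic.

        ```c
        // Sketch of prefix-matched address decode: each device claims a
        // power-of-two-sized, size-aligned range, so "is this address mine?"
        // is one AND and one compare. Addresses and sizes are invented.
        #include <stdio.h>
        #include <stddef.h>
        #include <stdint.h>

        typedef struct {
            uint32_t base;      // range start, aligned to its size
            uint32_t mask;      // selects the prefix bits
            const char *name;
        } decode_entry;

        static const decode_entry map[] = {
            { 0x00000000, 0xF0000000, "sram"  },  // 256 MiB at 0x0000_0000
            { 0x40000000, 0xFFFF0000, "uart0" },  // 64 KiB at 0x4000_0000
            { 0x40010000, 0xFFFF0000, "timer" },  // 64 KiB at 0x4001_0000
        };

        static const char *decode(uint32_t addr) {
            for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
                if ((addr & map[i].mask) == map[i].base)  // prefix match
                    return map[i].name;
            return "bus-error";
        }

        int main(void) {
            printf("%s\n", decode(0x00001234));
            printf("%s\n", decode(0x40000004));
            printf("%s\n", decode(0x40010008));
            printf("%s\n", decode(0x80000000));
            return 0;
        }
        ```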

        When you add a scan chain to your flip flops, you're implicitly ordering your flip flops and thereby building an implicit linear address space.

        There is also the fact that databases with auto incrementing integers as their primary keys use a logical linear address space, so the most obvious way to obtain a non-linear address space would require you to use randomly generated IDs instead. It seems like a huge amount of effort would have to be spent to get away from the idea of linear address spaces.

        • By nine_k 2026-01-0518:201 reply

          We have a linear address space where we can map physical RAM and memory-mapped devices dynamically. Every core, at any given time, may have its own view of it. The current approach uses pretty coarse granularity, separating execution at the process level. The separation could be more granular.

          The problem is the granularity of trust within the system. Were the MMU much faster, and TLB much larger (say, 128MiB of dedicated SRAM), the granularity might be pretty high, giving each function's stack a separate address space insulated from the rest of RAM. This is possible even now, just would be impractically slow.

          Any hierarchical (tree-based) addressing scheme is equivalent to a linear addressing scheme, pick any tree traversal algorithm. Any locally-hierarchical addressing scheme seemingly can be implemented with (short) offsets in a linear address space; this is how most jumps in x64 and aarch64 are encoded, for instance.

          • By latchup 2026-01-0611:51

            It has very little to do with trust and a lot to do with the realities of hardware implementation. Every interconnect has a minimum linear transfer granularity that properly utilizes its hardware links, dictated primarily by its physical link width and minimum efficient burst length. The larger this minimum granularity, the faster and more efficient moving data becomes. However, below this granularity, bandwidth and energy efficiency crater. Hence, reducing access granularity below this limit has disastrous consequences.

            In fact, virtual memory is already a limiting factor to increasing minimum transfer size, as pages must be an efficient unit of exchange. Traditional 4 KiB pages are already smaller than what would be a good minimum transfer size; this is exactly why hardware designers push for larger pages (with Apple silicon forgoing 4 KiB page support entirely).

            I cannot help but feel that many of these discussions are led astray by the misconceptions of people with an insufficient understanding of modern computer architecture.

      • By convolvatron 2026-01-0420:245 reply

        It's entirely possible to implement segments on top of paging. What you need to do is add the kernel abstractions for implementing call gates that change segment visibility, and write some infrastructure to manage unions-of-a-bunch-of-little-regions. I haven't implemented this myself, but a friend did on a project we were working on together, and as a mechanism it works perfectly well.

        getting userspace to do the right thing without upending everything is what killed that project
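        The mechanism can be imitated in userspace with `mprotect` standing in for the kernel flipping PTEs. This is only a sketch of the idea as described, not the actual project's API; all names are invented.

        ```c
        // Userspace imitation of a "call gate" that toggles the visibility of
        // a segment (here just an mmap'd region) around a call. In the real
        // design this would be a kernel facility editing the VM tables;
        // mprotect stands in for that here.
        #include <stdio.h>
        #include <unistd.h>
        #include <sys/mman.h>

        typedef struct {
            void  *base;
            size_t len;     // page-aligned length
        } segment;

        // "Call gate": map the segment in, run f, map it back out.
        static int gated_call(segment *seg, int (*f)(segment *)) {
            if (mprotect(seg->base, seg->len, PROT_READ | PROT_WRITE) != 0) return -1;
            int r = f(seg);
            // Drop visibility again on return: touching the region outside
            // the gate now faults.
            if (mprotect(seg->base, seg->len, PROT_NONE) != 0) return -1;
            return r;
        }

        static int writer(segment *seg) {
            ((char *)seg->base)[0] = 42;   // legal only inside the gate
            return ((char *)seg->base)[0];
        }

        int main(void) {
            long pagesz = sysconf(_SC_PAGESIZE);
            segment seg = {
                .base = mmap(NULL, (size_t)pagesz, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0),
                .len  = (size_t)pagesz,
            };
            if (seg.base == MAP_FAILED) { perror("mmap"); return 1; }
            printf("gated access returned %d\n", gated_call(&seg, writer));
            return 0;
        }
        ```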

        • By Joker_vD 2026-01-0420:30

          There is also a problem of nested virtualization. If the VM has its own "imaginary" page tables on top of the hypervisor's page tables, then the number of actual physical memory reads goes from 4–6 to 16–36.
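          The back-of-the-envelope arithmetic behind that blow-up: with an n-level guest table and an m-level host table, every guest-level lookup is itself a host walk, so a two-dimensional walk costs roughly (n+1)(m+1) - 1 physical reads. Exact counts vary by architecture and walk caching; this is only the rough version.

          ```c
          // Rough count of physical memory reads for a nested (2D) page walk:
          // n guest levels plus the final guest-physical translation, each of
          // which needs an m-level host walk plus the read itself.
          #include <stdio.h>

          static int twod_walk_reads(int n, int m) {
              return (n + 1) * (m + 1) - 1;
          }

          int main(void) {
              printf("native 4-level walk: %d reads\n", twod_walk_reads(0, 4));
              printf("4-on-4 nested walk:  %d reads\n", twod_walk_reads(4, 4));
              printf("5-on-5 nested walk:  %d reads\n", twod_walk_reads(5, 5));
              return 0;
          }
          ```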

        • By aforwardslash 2026-01-055:261 reply

          If I understood correctly, you're talking about using descriptors to map segments; the issue with this approach is two-fold: it is slow (as a descriptor needs to be created for each segment, and sometimes more than one, if you need write-execute permissions), and there is a practical limit on the number of descriptors you can have: 8192 total, including call gates and whatnot. To extend this, you need to use LDTs, which again require a descriptor in the GDT and are themselves limited to 8192 entries. In a modern desktop system, even the resulting maximum of ~67 million segments (8192 × 8192) would be both quite slow and at the same time quite limited.

          • By convolvatron 2026-01-0519:491 reply

            No, not at all. We weren't using the underlying segmentation support; we just added kernel facilities to support segment IDs and ranges, and augmented the kernel region structure appropriately. A call gate is just a syscall that changes the process's VM tables to include or drop regions (segments) based on the policy of the call.

            • By aforwardslash 2026-01-0716:23

              Hmm, very interesting approach. Is there any publicly available documentation?

        • By nine_k 2026-01-0420:32

          Indeed. Also, TLB as it exists on x64 is not free, nor is very large. A multi-level "TLB", such that a process might pick an upper level of a large stretch of lower-level pages and e.g. allocate a disjoint micro-page for each stack frame, would be cool. But it takes a rather different CPU design.

        • By tliltocatl 2026-01-0422:461 reply

          But that wouldn't protect against out-of-bounds access (which is the whole point of segments), would it?

          • By convolvatron 2026-01-0519:491 reply

            That's enforced by the VM hardware; we just shuffle the PTEs around to match the appropriate segment view.

            • By rep_lodsb 2026-01-069:001 reply

              As long as it's a linear address space, adding/subtracting a large enough value to a pointer (array, stack) could still cross into another "segment".

              • By convolvatron 2026-01-0614:06

                But those wouldn't be mapped unless you have crossed a call gate that enabled them. The kernel call-gate implementation changes the VM map (region visibility) accordingly.

        • By formerly_proven 2026-01-0420:441 reply

          "Please give me a segmented memory model on top of paged memory" - words which have never been uttered

          • By nine_k 2026-01-0421:031 reply

            There is a subtle difference between "give me an option" and "thrust a requirement upon me".

      • By inkyoto 2026-01-050:481 reply

        > But we don't have a linear address space, unless you're working with a tiny MCU.

        We actually do, albeit for a brief duration of time: upon a cold start of the system, before the MMU is active, no address translation is performed, and the entire memory space is treated as a single linear, contiguous block (even if there are physical holes in it).

        When the system is powered on, the CPU runs in privileged mode to allow an operating system kernel to set up the MMU and activate it, which happens early in the boot sequence. But until then, virtual memory is not available.

        • By loeg 2026-01-050:581 reply

          Those holes can be arbitrarily large, though, especially in weirder environments (e.g., memory-mapped optane and similar). Linear address space implies some degree of contiguity, I think.

          • By inkyoto 2026-01-051:161 reply

            Indeed. It can get even weirder in the embedded world, where a ROM, an E(E)PROM or a device may get mapped into an arbitrary slice of the physical address space, anywhere within its bounds. It has become less common, though.

            But devices are still commonly mapped at the top of the physical address space, which is a rather widespread practice.

            • By crote 2026-01-053:34

              And it's not uncommon for devices to be mapped multiple times in the address space! The different aliases provide slightly different ways of accessing it.

              For example, 0x000-0x0ff providing linear access to memory bank A, 0x100-0x1ff linear access to bank B, but 0x200-0x3ff providing striped access across the two banks, with even-addressed words coming from bank A and odd ones from bank B.

              Similarly, 0x000-0x0ff accessing memory through a cache, but 0x100-0x1ff accessing the same memory directly. Or 0x000-0x0ff overwriting data, 0x100-0x1ff setting bits (OR with current content), and 0x200-0x2ff clearing bits.
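              Simulated in software, that last set/clear aliasing looks like this. The window offsets are invented for illustration; on real parts the bus fabric wires this in.

              ```c
              // One backing register exposed through three "address" windows
              // with different write semantics: plain write, bit-set, and
              // bit-clear. Offsets here are made up for the demo.
              #include <stdio.h>
              #include <stdint.h>

              static uint32_t reg;  // the one physical register behind all aliases

              static void bus_write(uint32_t addr, uint32_t val) {
                  switch (addr & 0xF00) {          // which alias window?
                      case 0x000: reg = val;   break;  // plain overwrite
                      case 0x100: reg |= val;  break;  // bit-set alias (OR)
                      case 0x200: reg &= ~val; break;  // bit-clear alias
                  }
              }

              int main(void) {
                  bus_write(0x000, 0xF0);   // reg = 0xF0
                  bus_write(0x100, 0x0F);   // set the low nibble -> 0xFF
                  bus_write(0x200, 0x81);   // clear bits 7 and 0 -> 0x7E
                  printf("reg = 0x%02X\n", reg);
                  return 0;
              }
              ```

              The appeal is that a driver can set or clear individual bits with a single store, with no read-modify-write race against an interrupt handler touching the same register.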


  • By sph 2026-01-058:082 reply

    > Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?

    What a weird question, conflating one thing with the other.

    I’m working on an object-capability system, and trying hard to see if I can make it work using a linear address space so I don’t have to waste two or three pages per “process”. [1][2] I really don’t see how objects have anything to do with virtual memory and memory isolation, as they are a higher abstraction. These objects have to live somewhere, unless the author is proposing a system without the classical model of addressable RAM.

    —-

    1: the reason I prefer a linear address space is that I want to run millions of actors/capabilities on a machine, and the latency and memory usage of switching address space and registers become really onerous. Also, I am really curious to see how ridiculously fast modern CPUs are when you’re not thrashing the TLB every millisecond or so.

    2: in my case I let system processes/capabilities written in C run in linear address space where security isn’t a concern, and user space in a RISC-V VM so they can’t escape. The dream is that CHERI actually goes into production and user space can run on hardware, but that’s a big if.

    The memory management story is still a big question: how do you do allocations in a linear address space? If you give out pages, there’s a lot of wastage. The alternative is a global memory allocator, which I am really not keen on. Still figuring out as I go.

    • By antonvs 2026-01-0520:56

      > What a weird question, conflating one thing with the other.

      I can only imagine he means something different by “object-oriented” than the concept at the programming language level. And if he is referring to that, then I hope no-one ever lets him near anything resembling hardware design.

    • By jdougan 2026-01-0513:071 reply

      Have you looked at the Apple Newton memory architecture?

      http://waltersmith.us/newton/HICSS-92.pdf

      • By sph 2026-01-0614:24

        Thanks, will do

HackerNews