Hacker News

Cursed Linear Types in Rust

2024-11-27 10:14, 106 points, 22 comments, geo-ant.github.io

Inspired by Jack Wrenn’s post on Undroppable Types in Rust, I set out to see if it’s possible to create types that must be used exactly once. From my understanding, those things are called linear…


Let’s see if we can create a struct UseOnce<T> which enforces that an instance is used (or consumed) exactly once. It should be impossible to consume it more than once, and it should produce a compile error if it’s not consumed at all. The first part is trivial with destructive move semantics, the second part is where we ~~steal~~ adapt Jack’s original idea.

Implementation

use core::mem::ManuallyDrop;
use core::mem::MaybeUninit;

pub struct UseOnce<T>(MaybeUninit<T>);

impl<T> UseOnce<T> {
    pub fn new(val: T) -> Self {
        Self(MaybeUninit::new(val))
    }

    pub fn consume<F, R>(self, f: F) -> R
    where
        F: FnOnce(T) -> R,
    {
        // (1)
        let mut this = ManuallyDrop::new(self);
        // (2)
        let mut val = MaybeUninit::uninit();
        std::mem::swap(&mut this.0, &mut val);
        unsafe {
            let val = val.assume_init();
            f(val)
        }
    }
}

impl<T> Drop for UseOnce<T> {
    fn drop(&mut self) {
        const { panic!("UseOnce instance must be consumed!") }
    }
}

fn main() {
    let instance = UseOnce::new(41);
    // (3)
    // comment out this line to get a compile error
    let _result = instance.consume(|v| v + 1);
}

Playground Link. Again, the clever part is Jack Wrenn’s original idea. I was also surprised this works. To my understanding, it relies on the fact that the compiler can reason that the drop implementation does not have to be generated when consume is called due to ①. There’s some additional unsafe trickery in ②, which is not terribly important but it’s actually safe. It allows me to use MaybeUninit<T> instead of Option<T> as the inner type so that there’s no space penalty as there could be if I had used an Option.
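As a side note on the space argument, here is a minimal sketch that compares the two candidate inner types. The helper names `maybe_uninit_overhead` and `option_overhead` are made up for this illustration and are not part of the code above. `MaybeUninit<T>` is guaranteed to have the same size as `T`, while `Option<T>` needs extra room for a discriminant unless `T` has a niche (as references do).

```rust
use core::mem::{size_of, MaybeUninit};

/// Extra bytes `MaybeUninit<T>` needs on top of `T` (hypothetical helper).
fn maybe_uninit_overhead<T>() -> usize {
    size_of::<MaybeUninit<T>>() - size_of::<T>()
}

/// Extra bytes `Option<T>` needs on top of `T` (hypothetical helper).
/// This is zero when `T` has a niche, e.g. a reference that can never be null.
fn option_overhead<T>() -> usize {
    size_of::<Option<T>>() - size_of::<T>()
}

fn main() {
    println!("MaybeUninit<u64>: +{} bytes", maybe_uninit_overhead::<u64>());
    println!("Option<u64>:      +{} bytes", option_overhead::<u64>());
    println!("Option<&u64>:     +{} bytes", option_overhead::<&u64>());
}
```

This is why the text says "could be": the penalty only shows up for types without a niche, such as plain integers.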

As is, the code compiles just fine, but if we comment out the consume below ③, it will fail with a compile error like so:

error[E0080]: evaluation of `<UseOnce<i32> as std::ops::Drop>::drop::{constant#0}` failed
  --> src/main.rs:27:9
   |
27 |         panic!("UseOnce instance must be consumed!")
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'UseOnce instance must be consumed!', src/main.rs:27:9
   |
   = note: this error originates in the macro `$crate::panic::panic_2021` which comes from the expansion of the macro `panic` (in Nightly builds, run with -Z macro-backtrace for more info)

note: erroneous constant encountered
  --> src/main.rs:26:9
   |
26 | /         const {
27 | |         panic!("UseOnce instance must be consumed!")
28 | |         }
   | |_________^

note: the above error was encountered while instantiating `fn <UseOnce<i32> as std::ops::Drop>::drop`
   --> /playground/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:574:1
    |
574 | pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0080`.

Not exactly pretty but it does the trick.

Why It’s Cursed

Unfortunately, the UseOnce<T> is not as useful or powerful as it might seem at first sight. Firstly, since the compiler error is enforced by the Drop implementation, we can just mem::forget the instance and not actually consume it. I don’t feel this is a giant problem because it’s still very explicit and arguably counts as a sort of consumption. But it’s worth noting.

Secondly, the API allows us to “exfiltrate” the inner value of the UseOnce<T> instance by just calling consume with the identity function. That’s a consequence of providing an API that accepts functions with non-unit return values. I also don’t consider this much of a problem, because we can argue that we want the UseOnce<T> instance itself to be consumed exactly once, not necessarily the inner value. However, reasonable people may disagree.
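The first two caveats can be sketched as follows. The UseOnce definition is repeated from the implementation section so that the snippet stands alone. The function names `forget_instead_of_consume` and `exfiltrate` are hypothetical and only serve this illustration. Note that neither function places a call between creating the instance and giving it away.

```rust
use core::mem::{ManuallyDrop, MaybeUninit};

pub struct UseOnce<T>(MaybeUninit<T>);

impl<T> UseOnce<T> {
    pub fn new(val: T) -> Self {
        Self(MaybeUninit::new(val))
    }

    pub fn consume<F, R>(self, f: F) -> R
    where
        F: FnOnce(T) -> R,
    {
        let mut this = ManuallyDrop::new(self);
        let mut val = MaybeUninit::uninit();
        core::mem::swap(&mut this.0, &mut val);
        // SAFETY: `this.0` was initialized in `new`; after the swap that value is in `val`.
        unsafe {
            let val = val.assume_init();
            f(val)
        }
    }
}

impl<T> Drop for UseOnce<T> {
    fn drop(&mut self) {
        const { panic!("UseOnce instance must be consumed!") }
    }
}

/// Caveat 1 (hypothetical helper): the instance is never consumed, but
/// `mem::forget` takes ownership without running `Drop`, so this compiles.
fn forget_instead_of_consume() {
    let instance = UseOnce::new(41);
    core::mem::forget(instance);
}

/// Caveat 2 (hypothetical helper): consuming with an identity closure
/// hands the inner value back to the caller.
fn exfiltrate() -> i32 {
    let instance = UseOnce::new(41);
    instance.consume(|v| v)
}

fn main() {
    forget_instead_of_consume();
    let inner = exfiltrate();
    println!("exfiltrated value: {inner}");
}
```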

Thirdly, as was pointed out by u/SkiFire13 in the Reddit thread, this trick relies on the compiler’s ability to reason without optimizations that the type will not be dropped. Thus, simply sticking a function call between the creation and consumption of the instance will make this code fail²:

fn foo() {}

fn main() {
    let instance = UseOnce::new(41);
    foo();
    let _result = instance.consume(|v| v + 1);
}

This code does not compile despite the value being consumed. You can see how this severely limits the applicability of UseOnce. There is an even more cursed remedy for that, which is using the idea of the prevent_drop crate. In that crate, a non-existing external function is linked in the Drop implementation, which moves the error to link time. That will make it work for this case but it also makes the error even uglier³.

Endnotes

If you like my work and want to support it, please share it. If you want to do even more than that, consider buying me a coffee ☕.


todsacerdoti


Comments

By MrMcCall 2024-12-01 19:17, 7 replies

The problem is that programming languages have always focused on the definition side of types, which is absolutely necessary and good, but the problem is that only limiting use by, e.g., "protected, private, friend, internal, ..." on class members, as well as the complicated ways we can limit inheritance, are barely useful.

We need a way to define how to "use" the types we define. That definitional structure is going to bleed from creation of instances into how they live out their lifetimes. It appears that Rust's design addresses some aspects of this dimension, and it also appears to be a fairly major point of contention among y'all, or at least require a steepish learning curve. I don't know, as I prefer to work in ubiquitous environments that are already featureful on 5-10yo distros.

One usage pattern many of us have found useful in our software for years is the "set once and only once" (singleton-ish), whether it's for a static class member or a static function var, or even a db table's row(s). I don't know of any programming environment that facilitates properly specifying calculating something even that basic in the init phase of running the system, but I don't explore new languages so much anymore, none of them being mature enough to rely upon. Zig's comptime stuff looks promising, but I'm not ready to jump onto that boat just yet. I am, however, open to suggestions.

The real solution will ultimately require a more "wholistic" (malapropism intended) approach to constraining all dimensions of our software systems while we are building them out.

By pornel 2024-12-01 19:39

Rust's exclusive ownership can be used for things that can be called at most once (a method can require a non-copyable object as an argument, and takes ownership of it away from you, so you can't call the method again).

Rust also has wrapper types like OnceCell that have `get_or_init()` which will init the thing only once, and then it's guaranteed to be immutable. Unlike singletons, these don't have to be globally accessible.
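A minimal sketch of both points, assuming a made-up `LaunchKey` token and a hypothetical `read_twice` helper (neither comes from the comment): a function that takes a non-copyable value can be called at most once per value, and `OnceCell::get_or_init` runs its initializer only on the first call.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical token type. It is neither `Copy` nor `Clone`.
struct LaunchKey;

/// Takes the key by value, so ownership moves into the function and
/// the caller cannot reuse the same key for a second call.
fn launch(_key: LaunchKey) -> &'static str {
    "launched"
}

/// Reads a `OnceCell` twice and returns both reads plus the number of
/// times an initializer closure actually ran.
fn read_twice() -> (u32, u32, u32) {
    let cell: OnceCell<u32> = OnceCell::new();
    let runs = Cell::new(0);

    let first = *cell.get_or_init(|| {
        runs.set(runs.get() + 1);
        42
    });
    // The cell is already set, so this closure is never called.
    let second = *cell.get_or_init(|| {
        runs.set(runs.get() + 1);
        0
    });

    (first, second, runs.get())
}

fn main() {
    let key = LaunchKey;
    println!("{}", launch(key));
    // launch(key); // would not compile: `key` was moved by the call above

    println!("{:?}", read_twice());
}
```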

By PittleyDunkin 2024-12-02 0:16, 1 reply

> I don't know of any programming environment that facilitates properly specifying calculating something even that basic in the init phase of running the system,

The JVM has well-defined class loading semantics, including class initialization, that allow limited initialization capabilities before main is even run. Of course it has other problems too (defining the order in which these fire can be frustrating) but it always struck me as straightforward to work with.

By MrMcCall 2024-12-02 9:33, 1 reply

Order is the soul of dataflow integration systems, which is programming. And this is because order is the sole purpose of the universe, itself, as it spreads change across time and space. And there are most definitely rules to what changes and how.

As above : so below :: the universe : our dataflow integration systems. The challenge is to make ours perfectly clockwork, too.

By porridgeraisin 2024-12-02 19:59, 1 reply

What are you talking about man

By MrMcCall 2024-12-02 21:51

If you can't know the order of your static class init routines' execution, you can't create some kinds of software systems, where such information must be known beforehand. Failing that we have to contrive our own init workflow launched from a single point of entry, instead of using something intrinsic for the guarantees we need. What that means is that this advanced feature of the programming environment should be avoided in certain cases, making it useless and (IMO) to be avoided.

I was asking about programming environments that facilitate certain kinds of clean init semantics.

Like a clock. Or something that is operationally clocklike, deterministic, like we should design and implement our software to be in terms of reliability.

By sunshowers 2024-12-01 23:51

"Set once and only once" is achieved well via Rust's OnceLock [1], and it's a pattern I use quite heavily in my Rust code. Especially because OnceLock only requires a shared reference and not a mutable one. It's a really good fit for cached results computed on-demand, scoped to whatever level is reasonable (individual type, thread, or whole process).

[1] https://doc.rust-lang.org/beta/std/sync/struct.OnceLock.html
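A minimal sketch of that pattern, assuming a hypothetical `Report` type (not from the comment): the cached total is filled in through `&self` on first access and reused afterwards. The `computations` counter exists only to make the at-most-once behavior observable.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::OnceLock;

/// Hypothetical type whose total is computed on demand and then cached.
struct Report {
    values: Vec<u64>,
    total: OnceLock<u64>,
    /// Demo-only counter of how often the total was actually computed.
    computations: AtomicU32,
}

impl Report {
    fn new(values: Vec<u64>) -> Self {
        Self {
            values,
            total: OnceLock::new(),
            computations: AtomicU32::new(0),
        }
    }

    /// Only needs a shared reference: `OnceLock` handles the one-time write.
    fn total(&self) -> u64 {
        *self.total.get_or_init(|| {
            self.computations.fetch_add(1, Ordering::Relaxed);
            self.values.iter().sum::<u64>()
        })
    }

    fn computations(&self) -> u32 {
        self.computations.load(Ordering::Relaxed)
    }
}

fn main() {
    let report = Report::new(vec![1, 2, 3]);
    println!("total = {}, again = {}", report.total(), report.total());
    println!("computed {} time(s)", report.computations());
}
```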

By chipdart 2024-12-01 22:24

> The problem is that programming languages have always focused on the definition side of types, which is absolutely necessary and good, but the problem is that only limiting use by, e.g., "protected, private, friend, internal, ..." on class members, as well as the complicated ways we can limit inheritance, are barely useful.

Virtually all software ever developed managed just fine with that alone.

> I don't know of any programming environment that facilitates properly specifying calculating something even that basic in the init phase of running the system, (...)

I don't know what I'm missing, but it sounds like you're describing the constructor of a static object whose class only provides const/getter methods.

> or even a db table's row(s).

I don't think you're describing programming language constructs. This sounds like a framework feature that can be implemented with basic inversion of control.

By logicchains 2024-12-01 19:27, 3 replies

>One usage pattern many of us have found useful in our software for years is the "set once and only once" (singleton-ish)

C# has this: https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...

By chipdart 2024-12-01 22:32, 1 reply

> C# has this:

This is only syntactic sugar to allow using object initializers to initialize specific member variables of a class instance instead of simply using a constructor and/or setting member variables in follow-up statements. It's hardly the feature OP was describing.

By neonsunset 2024-12-01 22:55, 1 reply

While mutation of init-only properties is sometimes done by e.g. serializers through private reflection or unsafe accessors, it otherwise can lead to unsound behavior if class implementation does not expect this. You cannot bypass this through normal language means.

Same applies to readonly instance fields.

Where does "syntax sugar" end and "true features" begin?

By bee_rider 2024-12-01 23:11, 1 reply

Anyway, trying to actually prevent a program from modifying its own memory is really hopeless, right? So any promises beyond “syntactic sugar” would be resting on a poor foundation, perhaps even dangerously misleading.

By neonsunset 2024-12-01 23:19

You can always mmap a memory range, place a struct there, then mprotect it. And on the language side you cannot overwrite readonly structs observed by readonly refs (aside from unsafe, which will trigger segfault in this case).

There are ways to do it. What matters is the user ergonomics, otherwise, by this logic, most higher-level languages would have even less of a claim to immutability, and yet somehow it's not an issue?

If there are exotic requirements - there are exotic tools for it. FWIW static readonlys are blocked from modification even through reflection and modifying their memory with unsafe is huge UB.

By MrMcCall 2024-12-01 20:40

That's new since I've been tramping around VS.NET (F# 2 or 3 & C# from 10ya).

My immediate purposes require that I avoid depending on unique programming language/environment constructions, but it helps to learn, so thanks for levelling me up.

By cempaka 2024-12-01 20:20, 2 replies

And Java will soon have StableValue: https://openjdk.org/jeps/8312611

By neonsunset 2024-12-01 21:29, 1 reply

Reading the JEP, isn’t StableValue about JIT constants?

In .NET it’s just 'static readonly'.

By cempaka 2024-12-01 23:28

The basic feature set is at-most-once thread-safe initialization of values, which can then enjoy the same constant folding and other JIT optimizations one would get from `static final`, but can be initialized at any point during the program's run.

By mrkeen 2024-12-01 21:47, 1 reply

  Non-goals
  It is not a goal to provide Java language support for declaring stable values.

Hmmm

By jcrites 2024-12-01 22:13, 1 reply

From context, I would infer that this means they are not changing the Java language itself. It’s a feature expressed through the existing language, like as a library feature. I could be wrong though.

By cempaka 2024-12-01 23:29

Yes you're correct, the idea is that they're not adding any keywords or bytecodes, just some new standard lib APIs that will get some special treatment by the JVM.

By cardanome 2024-12-01 23:33

The readonly property in PHP would fit the bill quite well as it can be set once and only once, no?

Plus the new PHP 8.4 version actually has asymmetric visibility for properties so you can have public properties that cannot be mutated from the outside but still allow controlled mutation on the inside. The feature was borrowed from Swift. I am super excited about it.

https://wiki.php.net/rfc/asymmetric-visibility-v2

By sali0 2024-12-02 4:45

I am not sure if this is the same, but this reminds me of the Abilities system on the Move language[0]. It allows you to create linear, affine or other type constraints in a pretty flexible way.

[0] https://move-book.com/move-basics/abilities-introduction.htm...