(a blog post based on a market based on a blog post based on an earlier blog post)
Here’s our story so far:
Markets are a good way to know what people really think. When India and Pakistan started firing missiles at each other on May 7, I was concerned, what with them both having nuclear weapons. But then I looked at world market prices:
See how it crashes on May 7? Me neither. I found that reassuring.
But we care about lots of stuff that isn’t always reflected in stock prices, e.g. the outcomes of elections or drug trials. So why not create markets for those, too? If you create contracts that pay out $1 only if some drug trial succeeds, then the prices will reflect what people “really” think.
In fact, why don’t we use markets to make decisions? Say you’ve invented two new drugs, but only have enough money to run one trial. Why don’t you create markets for both drugs, then run the trial on the drug that gets a higher price? Contracts for the “winning” drug are resolved based on the trial, while contracts in the other market are cancelled so everyone gets their money back. That’s the idea of Futarchy, which Robin Hanson proposed in 2007.
Why don’t we? Well, maybe it won’t work. In 2022, I wrote a post arguing that when you cancel one of the markets, you screw up the incentives for how people should bid, meaning prices won’t reflect the causal impact of different choices. I suggested prices reflect “correlation” rather than causation, for basically the same reason this happens with observational statistics. This post, it was magnificent.
It didn’t convince anyone.
Years went by. I spent a lot of time reading Bourdieu and worrying about why I buy certain kinds of beer. Gradually I discovered that essentially the same point about futarchy had been made earlier by, e.g., Anders_H in 2015, abramdemski in 2017, and Luzka in 2021.
In early 2025, I went to a conference and got into a bunch of (friendly) debates about this. I was astonished to find that verbally repeating the arguments from my post did not convince anyone. I even immodestly asked one person to read my post on the spot. (Bloggers: Do not do that.) That sort of worked.
So, I decided to try again. I wrote another post called ”Futarky’s Futarchy’s fundamental flaw”. It made the same argument with more aggression, with clearer examples, and with a new impossibility theorem that showed there doesn’t even exist any alternate payout function that would incentivize people to bid according to their causal beliefs.
That post… also didn’t convince anyone. In the discussion on LessWrong, many of my comments are upvoted for quality but downvoted for accuracy, which I think means, “nice try champ; have a head pat; nah.” Robin Hanson wrote a response, albeit without outward evidence of reading beyond the first paragraph. Even the people who agreed with me often seemed to interpret me as arguing that futarchy satisfies evidential decision theory rather than causal decision theory. Which was weird, given that I never mentioned either of those, don’t accept the premise the futarchy satisfies either of them, and don’t find the distinction helpful in this context.
In my darkest moments, I started to wonder if I might fail to achieve worldwide consensus that futarchy doesn’t estimate causal effects. I figured I’d wait a few years and then launch another salvo.
But then, legendary human Bolton Bailey decided to stop theorizing and take one of my thought experiments and turn it into an actual experiment. Thus, Futarchy’s fundamental flaw — the market was born. (You are now reading a blog post about that market.)
I gave a thought experiment where there are two coins and the market is trying to pick the one that’s more likely to land heads. For one coin, the bias is known, while for the other coin there’s uncertainty. I claimed futarchy would select the worse / wrong coin, due to this extra uncertainty.
Bolton formalized this as follows:
There are two markets, one for coin A and one for coin B.
Coin A is a normal coin that lands heads 60% of the time.
Coin B is a trick coin that either always lands heads or always lands tails, we just don’t know which. There’s a 59% it’s an always-heads coin.
Twenty-four hours before markets close, the true nature of coin B is revealed.
After the markets closes, whichever coin has a higher price is flipped and contracts pay out $1 for heads and $0 for tails. The other market is cancelled so everyone gets their money back.
Get that? Everyone knows that there’s a 60% chance coin A will land heads and a 59% chance coin B will land heads. But for coin A, that represents true “aleatoric” uncertainty, while for coin B that represents “epistemic” uncertainty due to a lack of knowledge. (See Bayes is not a phase for more on “aleatoric” vs. “epistemic” uncertainty.)
Bolton created that market independently. At the time, we’d never communicated about this or anything else. To this day, I have no idea what he thinks about my argument or what he expected to happen.
In the forum for the market, there was a lot of debate about “whalebait”. Here’s the concern: Say you’ve bought a lot of contracts for coin B, but it emerges that coin B is always-tails. If you have a lot of money, then you might go in at the last second and buy a ton of contracts on coin A to try to force the market price above coin B, so the coin B market is cancelled and you get your money back.
The conversation seemed to converge towards the idea that this was whalebait. Though notice that if you’re buying contracts for coin A at any price above $0.60, you’re basically giving away free money. It could still work, but it’s dangerous and everyone else has an incentive to stop you. If I was betting in this market, I’d think that this was at least unlikely.
Bolton posted about the market. When I first saw the rules, I thought it wasn’t a valid test of my theory and wasted a huge amount of Bolton’s time trying to propose other experiments that would “fix” it. Bolton was very patient, but I eventually realized that it was completely fine and there was nothing to fix.
At the time, this is what the prices looked like:

That is, at the time, both coins were priced at $0.60, which is not what I had predicted. Nevertheless, I publicly agreed that this was a valid test of my claims.
I think this is a great test and look forward to seeing the results.
Let me reiterate why I thought the markets were wrong and coin B deserved a higher price. There’s a 59% chance coin B would turns out to be all-heads. If that happened, then (absent whales being baited) I thought the coin B market would activate, so contracts are worth $1. So thats 59% × $1 = $0.59 of value. But if coin B turns out to be all-tails, I thought there is a good chance prices for coin B would drop below coin A, so the market is cancelled and you get your money back. So I thought a contract had to be worth more than $0.59.
If you buy a contract for coin B for $0.70, then I think that’s worth
P[all-heads] × P[market activates | all-heads] × $1
+ P[all-tails] × P[market cancelled | all-tails] × $0.70
= 0.59 × $1
+ 0.41 × P[market cancelled | all-tails] × $0.70
= $0.59
+ $0.287 × P[market cancelled | all-tails]
Surely P[market cancelled | all-tails] isn’t that low. So surely this is worth more than $0.59.
More generally, say you buy a YES contract for coin B for $M. Then that contract would be worth
P[all-heads] × $1 × P[market activates | all-heads]
+ P[all-tails] × $M × P[market cancelled | all-tails]
= $0.59
+ 0.41 × $M × P[market cancelled | all-tails]
It’s not hard to show that the breakeven price is
M = $0.59 / (1 - 0.41 × P[market cancelled | all-tails]).
Even if you thought P[market cancelled | all-tails] was only 50%, then the breakeven price would still be $0.7421.
Within a few hours, a few people bought contracts on coin B, driving up the price.

Then, Quroe proposed creating derivative markets.
In theory, if there was a market asking if coin A was going to resolve YES, NO, or N/A, supposedly people could arbitrage their bets accordingly and make this market calibrated.
Same for a similar market on coin B.
Thus, Futarchy’s Fundamental Fix - Coin A and Futarchy’s Fundamental Fix - Coin B came to be. These were markets in which people could bid on the probability that each coin would resolve YES, meaning the coin was flipped and landed heads, NO, meaning the coin was flipped and landed tails, or N/A, meaning the market was cancelled.
Honestly, I didn’t understand this. I saw no reason that these derivative markets would make people bid their true beliefs. If they did, then my whole theory that markets reflect correlation rather than causation would be invalidated.
Prices for coin B went up and down, but mostly up.

Eventually, a few people created large limit orders, which caused things to stabilize.
Here was the derivative market for coin A.

I was right. Everyone knew coin A had a higher chance of being heads than coin B, but everyone bid the price of coin B way above coin A anyway.
In the previous math box, we saw that the breakeven price should satisfy
M = $0.59 / (1 - 0.41 × P[market cancelled | all-tails]).
If you invert this and plug in M=$0.90, then you get
P[market cancelled | all-tails]
= (1 - $0.59 / M) / 0.41
= (1 - $0.59 / $0.9) / 0.41
= 84.01%
I’ll now open the floor for questions.
Isn’t this market unrealistic?
Yes, but that’s kind of the point. I created the thought experiment because I wanted to make the problem maximally obvious, because it’s subtle and everyone is determined to deny that it exists.
Isn’t this just a weird probability thing? Why does this show futarchy is flawed?
The fact that this is possible is concerning. If this can happen, then futarchy does not work in general. If you want to claim that futarchy works, then you need to spell out exactly what extra assumptions you’re adding to guarantee that this kind of thing won’t happen.
But prices did reflect causality when the market closed! Doesn’t that mean this isn’t a valid test?
No. That’s just a quirk of the implementation. You can easily create situations that would have the same issue all the way through market close. Here’s one way you could do that:
On average, this market will run for 30 days. (The length follows a geometric distribution). Half the time, the market will close without the nature of coin B being revealed. Even when that happens, I claim the price for coin B will still be above coin A.
If futarchy is flawed, shouldn’t you be able to show that without this weird step of “revealing” coin B?
Yes. You should be able to do that, and I think you can. Here’s one way:
10100011001100000001. Let coin B be heads with probability (49+N)% where N is the number of 1 bits. do not reveal these bits publicly.First, have users generate public keys by running this command:
openssl genrsa 1024 > private.pem
openssl rsa -in private.pem -pubout > public.pem
Second, they should post the contents of the public_key.pem when asking for their bit. For example:
Hi, can you please send me a bit? Here's my public key:
-----BEGIN PUBLIC KEY-----
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDOlesWS+mnvHJOD2osUkbrxE+Y
PMqAUYqwemOwML0LlWLq5RobZRSeyssQhg0i3g2GsMZFMsvjindz6mxccdyP4M8N
mQVCK1Ovs1Z4+DxwmLf/y8vaGC3vfZBOhJDdaNdpRyUiQFaBW99We4cafVnmirRN
Py2lRe+CFgP3kSp4dQIDAQAB
-----END PUBLIC KEY-----
Third, whoever is running the market should save that key as public.pem, pick a pit, and encrypt it like this:
% echo "your secret bit is 1" | openssl pkeyutl -encrypt -pubin -inkey public.pem | base64
OuHt25Jwc1xYq63Ub8gOLKaZEJwwGHWDL0UGfydmvBapQNKf3l6Akol2Z2XHtCAC8G/lPJsCjb1dN878tU0aCMjbO5EvpMUTuohb0OczaCqAMld8uFL+j+uEZsIjKFT3Q52VumdVqMntJYG6Br6QeUs1vAL2HA6Nvych+Ao2e8M=
Users can then decrypt like this:
% echo "OuHt25Jwc1xYq63Ub8gOLKaZEJwwGHWDL0UGfydmvBapQNKf3l6Akol2Z2XHtCAC8G/lPJsCjb1dN878tU0aCMjbO5EvpMUTuohb0OczaCqAMld8uFL+j+uEZsIjKFT3Q52VumdVqMntJYG6Br6QeUs1vAL2HA6Nvych+Ao2e8M=" | base64 -d | openssl pkeyutl -decrypt -inkey private.pem
your secret bit is 1
Or you could use email…
I think this market captures a dynamic that’s present in basically any use of futarchy: You have some information, but you know other information is out there.
I claim that this market—will be weird. Say it just opened. If you didn’t get a bit, then as far as you know, the bias for coin B could be anywhere between 49% and 69%, with a mean of 59%. If you did get a bit, then it turns out that the posterior mean is 58.5% if you got a 0 and 59.5% if you got a 1. So either way, your best guess is very close to 59%.
However, the information for the true bias of coin B is out there! Surely coin B is more likely to end up with a higher price in situations where there are lots of 1 bits. This means you should bid at least a little higher than your true belief, for the same reason as the main experiment—the market activating is correlated with the true bias of coin B.
Of course, after the markets open, people will see each other’s bids and… something will happen. Initially, I think prices will be strongly biased for the above reasons. But as you get closer to market close, there’s less time for information to spread. If you are the last person to trade, and you know you’re the last person to trade, then you should do so based on your true beliefs.
Except, everyone knows that there’s less time for information to spread. So while you are waiting till the last minute to reveal your true beliefs, everyone else will do the same thing. So maybe people sort of rush in at the last second? (It would be easier to think about this if implemented with batched auctions rather than a real-time market.)
Anyway, while the game theory is vexing, I think there’s a mix of (1) people bidding higher than their true beliefs due to correlations between the final price and the true bias of coin B and (2) people “racing” to make the final bid before the markets close. Both of these seem in conflict with the idea of prediction markets making people share information and measuring collective beliefs.
Why do you hate futarchy?
I like futarchy. I think society doesn’t make decisions very well, and I think we should give much more attention to new ideas like futarchy that might help us do better. I just think we should be aware of its imperfections and consider variants (e.g. commiting to randomization) that would resolve them.
If I claim futarchy does reflect causal effects, and I reject this experiment as invalid, should I specify what restrictions I want to place on “valid” experiments (and thus make explicit the assumptions under which I claim futarchy works) since otherwise my claims are unfalsifiable?
Possibly?