
Today Veritasium published a video about Newcomb's paradox: you walk into a room with one transparent box containing $1000, and one opaque box. You're allowed to either take both boxes or just take the opaque box. Before you walk in, a supercomputer predicted which choice you'd make, and put $1000000 in the opaque box if it predicted you'd take just the one, or $0 if it predicted you'd take both. Thousands of people have gone through the room, and the computer has always been right. What should you do?
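For concreteness, the payoffs implied by that setup look like this:

                              predicted one-box    predicted two-box
    take only the opaque box      $1,000,000               $0
    take both boxes               $1,001,000             $1,000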
I'm a fan of this paradox and I like Veritasium. But the way they describe the setup is wrong:
> Don't worry about how the supercomputer is making its prediction. Instead of a computer, you could think of it as a superintelligent alien, a cunning demon, or even a team of the world's best psychologists. It really doesn't matter who or what is making the prediction.
It actually matters a lot.
If the predictor is Laplace's demon and genuinely knows the location and momentum of every particle in the universe, then sure, I buy that it can accurately predict what I'll choose.
If the predictor is a computer or a team of psychologists, they can still predict a lot. The main discussion in the video is about how the predictor knows what thought process you're gonna use to decide whether to one-box or two-box, and I agree that a computer or a psychologist can predict that by knowing about your history and personality.
But come on. You know some smartasses are gonna walk in there and flip a coin. No supercomputer on Earth can predict that consistently.
So, my complaint about the Veritasium video is that they present it as if it doesn't matter whether there's anything supernatural going on.
Technically, I didn't present the problem in exactly the same way that Veritasium did. They only said that the computer has almost always been correct, not that it has always been correct. And that of course also completely changes the problem: if the computer has made mistakes in the past, then it's possible for it to make mistakes in the future.
In the video, Gregor gives an argument for one-boxing based on probabilities. He starts by assuming that the probability the computer guesses your answer correctly is C, and then derives that you should one-box as long as C > 0.5005. But this is also flawed. There's a hidden assumption that C is independent of whether you decide to one-box or two-box. But there's no a priori reason to believe that. If the computer has been wrong before, it'd be very surprising for it to be wrong just as often for one-boxers and two-boxers. And so there's no reason to expect that the computer would continue to be accurate if you try to trick it via some clever decision-making process that other people rarely use.
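For reference, here is the expected-value calculation behind that threshold, sketched under the very independence assumption I'm questioning, with the dollar amounts from the setup:

$$E[\text{one-box}] = 1{,}000{,}000 \cdot C$$

$$E[\text{two-box}] = 1{,}000{,}000 \cdot (1 - C) + 1{,}000$$

One-boxing wins when $1{,}000{,}000 \cdot C > 1{,}000{,}000 \cdot (1 - C) + 1{,}000$, which rearranges to $C > 1{,}001{,}000 / 2{,}000{,}000 = 0.5005$.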
Anyways, I think that the problem is far less interesting if the predictor has been wrong before. Typically, the predictor is presented as having never been wrong, which makes the problem actually interesting from a decision-theoretic standpoint.
Obviously I'm not the first one to think of this. Before I wrote this post, I Googled "newcomb's paradox flip a coin" and found these results just on the front page:
In Nozick's original 1969 paper, the predictor has never been wrong before, and also has one additional twist to its behavior: if it predicts that "you will consciously randomize your choice," then it puts $0 in the opaque box. I guess that's one possible answer to my complaint, but in my opinion it just pushes the problem back further: what exactly constitutes "consciously randomizing your choice"? Flipping a coin isn't truly random, it's just chaotic. Are not my typical brain processes also chaotic? Where do we draw the line?
A 2010 paper titled "A Study of Quantum Strategies for Newcomb's Paradox". I haven't read the whole paper, but it's a lot more rigorous than what I've laid out here, exploring the idea of not just flipping a coin, but producing genuine randomness using some quantum shenanigans.
A 2021 Hacker News thread in which someone proposes flipping a coin and another person responds with an interesting connection to the halting problem.
A 2023 Medium comment asserting that the setup is invalid because an AI cannot predict a coin flip.
A 2023 blog post in which the amount of money in the opaque box is reduced so that a random strategy actually gives you a higher expected value than just always picking the opaque box by itself.
Such a fun paradox! I suspect that these arguments will continue indefinitely. :)
I've been presented with this thought experiment before and I always feel like I'm missing something when other people talk about it. Why would you ever take both boxes?
The premise is that the predictor is always right. So whether you take one or both boxes, the predictor would have predicted that choice. We know from the setup that if the predictor predicted you would take just the opaque box, that box will have a million dollars in it. Therefore, if you take the one box, it will have a million dollars in it (because whatever you choose is what the predictor predicted).
As an aside, I think whatever this says about free will, or whether you're actually making a "choice", is irrelevant with regard to whether the million dollars is in the box. The way I see both choices is this:
You "decide" to take both boxes -> the perfect predictor predicted this -> the opaque box has zero dollars -> you get a thousand dollars
You "decide" to take the opaque (one) box -> the perfect predictor predicted this -> the opaque box has a million dollars -> you get a million dollars
If you want to consider the version of this where the predictor is almost perfect instead of truly perfect, I don't think that changes anything. Say it's 99% accurate or even 90% accurate.
You take the opaque box -> the predictor has a 90% chance of predicting this -> it follows that there's a 90% chance that the box has a million dollars -> you have a 90% chance of getting a million dollars
Had you picked both boxes, you have a 90% chance of not getting the million.
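To put rough numbers on that (a sketch assuming the 90% figure applies whichever way you choose):

$$E[\text{one-box}] = 0.9 \cdot 1{,}000{,}000 = 900{,}000$$

$$E[\text{two-box}] = 0.1 \cdot 1{,}000{,}000 + 1{,}000 = 101{,}000$$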
As a one-boxer myself, this is the way I see it. The problem is presented as a value proposition: how do I walk out of this room with the most money? This typically prompts people to think of it as a statistics problem, and you genuinely can use statistics to 'solve' it, but that has nothing to do with what seems to be the problem's underlying premise, which is: do you actually believe that the predictor can even make that prediction? I think most people who take things logically at face value (like Veritasium watchers) would accept the premise as it's set up. I think the two-boxers have some doubt about the predictor's actual abilities, or they think they can 'outsmart' the predictor. The problem is set up in such a way, however, that it's fundamentally impossible to outsmart, which is I think what makes it so interesting.
> Why would you ever take both boxes?
As near as I can tell, it boils down to this: no matter what the predictor has chosen, once you walk into that room, there's more money in both boxes than there is in the opaque box alone.
But it feels like half an analysis—focusing solely on what you decide, while ignoring the fact that the other side is deciding based on what you think they'll decide.
Maybe that's me being unfair, because I'm a solid one boxer.
I also disagree with the linked article: I don't think it matters at all how the predictor makes its decision, because the conclusion doesn't change whether it's 100% accurate or 99% accurate. Or even, like, 80% accurate. There's no magic required for the experiment to work.
Even if it's 50% accurate, the benefit of picking both is marginal. At 50% accuracy, two-boxing gets you $1,000 half the time and $1,001,000 the other half, versus $0 or $1,000,000 for one-boxing: an expected gain of just $1,000, or about 0.1%. So unless you know that the predictor is worse than chance, you at worst suffer a meaningless loss if you pick one box.
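A quick sketch to check that arithmetic (hypothetical helper name, and the same independence assumption the argument above relies on):

```python
# Expected value of each strategy as a function of predictor accuracy c,
# assuming c is the same whether you one-box or two-box.
def expected_values(c, opaque=1_000_000, transparent=1_000):
    one_box = c * opaque                      # opaque box is full iff the predictor guessed one-box
    two_box = (1 - c) * opaque + transparent  # opaque box is full iff the predictor guessed wrong
    return one_box, two_box

for c in (0.5, 0.5005, 0.9, 0.99):
    one, two = expected_values(c)
    print(f"accuracy {c}: one-box ${one:,.0f}, two-box ${two:,.0f}")
```

At c = 0.5 two-boxing comes out $1,000 ahead; at c = 0.5005 the strategies tie exactly; above that, one-boxing pulls away.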
(To two-boxers) it's not half an analysis, because once you walk into the room your choice is causally independent of the demon's/AI's/alien's prediction.
There's something vaguely similar in the fallacy of proposed (Cooperate, Cooperate) solutions to the Prisoner's Dilemma. The argument goes as follows: (1) if we're both rational agents with the same information and the same payoffs, we will make the same choice; (2) therefore, (Cooperate, Defect) and (Defect, Cooperate) are out of the question; (3) therefore, the only options are (Defect, Defect) and (Cooperate, Cooperate); (4) so I should Cooperate, since it gives the better payoff. It seems to follow logically, but (1) and (2) are problematic: you can't assume symmetrical solutions and thus eliminate asymmetrical outcomes, because that is essentially the same as saying "what I choose causally affects what my opponent chooses".
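For concreteness, here's a standard Prisoner's Dilemma payoff table (illustrative numbers, higher is better, shown as my payoff / opponent's payoff):

                       opponent cooperates    opponent defects
    I cooperate               3 / 3                 0 / 5
    I defect                  5 / 0                 1 / 1

Whichever column the opponent lands in, Defect pays me more; the argument above avoids that conclusion only by striking out the asymmetric cells, which is exactly where the hidden causal assumption sneaks in.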
In the same way, one-boxing is irrational (for this argument, anyway; I'm undecided myself) because the prediction has already been made, and so your choice to one-box or two-box cannot have any causal relevance to the contents of the boxes. Even a perfect predictor cannot invert the flow of causality.
> that is essentially the same as saying "what I choose causally affects what my opponent chooses".
No, it’s the same as saying “what my opponent thinks I will choose causally affects what my opponent chooses”, which is obviously true. Also, “what my opponent thinks I will choose is positively correlated with what I do choose”, unless my opponent isn’t very good.
The thing is, you cannot choose once you are in the room. The fact that an [almost] perfect predictor exists implies that your choice must already be fixed at the point in time the predictor makes its prediction. Or the other way around: if you could still choose either option once you are in the room, say by basing your choice on some truly random and therefore unpredictable event like a radioactive decay (at least as far as we know, truly unpredictable), then the predictor could not [almost] perfectly predict your choice, i.e. an [almost] perfect predictor cannot exist.
So what you really want is to be in a state that will make you choose one box, and you want to already be in that state at the time the predictor makes its prediction, because the predictor will see this and place the million dollars into the second box. And as we have already said, you cannot choose to take two boxes afterwards, as that would contradict the existence of the predictor.
> So what you really want is to be in a state that will make you choose one box, and you want to already be in that state at the time the predictor makes its prediction, because the predictor will see this and place the million dollars into the second box.
Here’s the thing: no, I don’t. I’d much rather walk away with the easy million instead of risking it all for an extra thousand.
You can only get a thousand (take both boxes) or a million (take only the second box); zero and one million one thousand are not possible, or at least unlikely, because either would require a misprediction, and we assume an [almost] perfect predictor.
Then I for one choose the million.
The existence of a flawless predictor means that you do not have a choice after the predictor made its prediction; the decision must already be baked into the state of the universe accessible to the predictor. It also precludes any true randomness affecting the choice, as that could not be predicted ahead of time.
I do not think that allowing some prediction error fundamentally changes this; it only means that sometimes the choice may depend on unpredictable true randomness, or sometimes the predictor did not measure the relevant state of the universe exactly enough, or the prediction algorithm is not flawless. But if the predictor still arrives at the correct prediction most of the time, then most of the time you do not have a choice, and most of the time the choice does not depend on true randomness.

Which also renders the entire paradox somewhat moot, because there is no choice for you to make. The existence of a good predictor and the ability to make a choice after the prediction are incompatible. Up to wild time-travel scenarios and things like that.
> Which also renders the entire paradox somewhat moot, because there is no choice for you to make.
Not quite. You did choose your decision-making methods at some point in your life, and you could have changed them multiple times before you came to the setup of Newcomb's paradox. If we look at your past life as a variable in the problem, then changing this variable changes the outcome: it changes the prediction made by the predictor.
> The existence of a flawless predictor means that you do not have a choice after the predictor made its prediction
I believe that if your definition of choice stops working when we assume a deterministic Universe, then you need a better definition of choice. In a deterministic Universe it becomes glaringly obvious that the whole framework of free will and choice is just an abstraction, one that abstracts away things that are not really needed to make a decision.
Moreover, I think I can hint at how to deal with it: relativity. Different observers cannot agree on whether an observed agent has free will or not. Accept it fundamentally, like relativity accepts that universal time doesn't exist, and all the logical paradoxes will go away.
> I believe that if your definition of choice stops working when we assume a deterministic Universe, then you need a better definition of choice. In a deterministic Universe it becomes glaringly obvious that the whole framework of free will and choice is just an abstraction, one that abstracts away things that are not really needed to make a decision.
Indeed, I think of concepts like "agency", "choice", "free will", etc. as aspects of a particular sort of scientific model. That sort of model can make good predictions about people, organisations, etc. which would be intractable to many other approaches. It can also be useful in situations that we have more sophisticated models for, e.g. treating a physical system as "wanting" to minimise its energy can give a reasonable prediction of its behaviour very quickly.
That sort of model has also been applied to systems where its predictive powers aren't very good; e.g. modelling weather, agriculture, etc. as being determined by some "will of the gods", and attempting to infer the desires of those gods based on their observed "choices".
It baffles me that some people think a model of this sort might have any relevance at a fundamental level.
This is a compatibilist view. However, we can tell that most people don't adhere to a compatibilist view of free will, because it tends to make people very upset if you suggest they have no "genuine" free will or agency, and the moral implications behind the assumption of genuine agency are baked into everything from our welfare systems to our justice systems, which assume people have an actual choice in what they do.
For that reason I strongly disagree with the compatibilist view - language is defined by use, and most people act in ways that clearly signal a non-compatibilist view of free will.
I personally don't see any problems with this. For example, in most cases we can agree on whether someone had free will when they committed a crime. Even if I prefer the compatibilist view and you do not, we'll agree in most cases. In the cases where we may disagree, the reasons will not stem from this fundamental disagreement, but from things like whether we should treat a state of affect as a state in which a human has no free will.
So a compatibilist view is not incompatible with the world we live in; moreover, it is needed to keep our world functioning. The world we live in is mostly artificially constructed. Welfare and justice systems are not "genuine"; they are artificial constructs. They play a role in our society, and the ideas of "free will" and "guilt" are constructed too, tweaked to make our systems work better. If you assume that free will and guilt are "genuine" or God-given, then you can't tune them to better match their purposes. You lose agency this way, losing part of your own free will: you can't consciously and reasonably discuss whether a state of affect should be an exception to the rule "every person has free will". You'll be forced either to skip the discussion or to resort to some kind of theological argument.

But if you accept that "free will" is a social construct, then you can easily identify the affected variables: it is all about punishing crimes and rewarding people for their pro-social deeds. You can think about how "a state of affect inhibits free will" influences all these goals; you can think about the possibility of people simulating a state of affect (or even nurturing personal traits that increase the probability of entering one) to avoid punishment. You can think rationally and logically, and pick the solution that benefits society best - that very society with the idea of free will baked in. Or you can choose to believe "free will" is God-given, because of an irrelevant linguistic argument, and lose the ability to make our world better.
> most people act in ways that clearly signal a non-compatibilist view of free will.
Of course; we are not living in quantum mechanics, we live in a world that is constructed by people. I mean, all of this is built on top of QM, but QM laws do not manifest themselves directly for us; we have other explanatory structures to deal with everyday physics. But even physics doesn't matter that much: I turn the switch and voila, I have light, and to heck with conservation of energy. I can talk to you even though we reside on different continents; 1000s of km don't matter. If I want to eat, I do not try to kill some animal to eat it, nor do I gather seeds and roots in the wild; I go to work and do something, get my salary, and buy food in a local store. We are living in an artificial world with artificial rules. Free will is part of this world. Of course we talk about it like it exists; we talk about it like it is a universal truth. The relativity I mentioned above doesn't show itself most of the time, because the world is constructed in such a way that we can agree about someone having free will. Situations where this is not the case are very strange and can even be punished: manipulation (which comes close to taking people's agency away from them) is deemed amoral.

The world is constructed so we can ignore that free will is just an illusion; moreover, it is constructed to be thought about in terms of free will, so you'll have issues thinking about it in other terms - just as you'd have a lot of issues trying to calculate the aerodynamics of a plane from the equations of quantum mechanics.
A compatibilist view, to me, is usually immoral, because it seems to maintain the pretence of agency while admitting it's an illusion, and so persists in treating people as if they have agency.

People who at least genuinely believe in free will and agency have an excuse if they e.g. support punishment that is not strictly aimed at minimising harm, including harm to the perpetrator. A compatibilist has no such excuse.

It is of course possible to hold a compatibilist view and still argue we should restructure society to treat people as if they do not have agency, but then the point of holding onto the illusion drops to near zero.
A flawless predictor would indicate you're in a simulation; then again, we cannot even simulate multiple cells at the most fine-grained level of physics.
But also you’re right that even a pretty good (but not perfect) predictor doesn’t change the scenario.
What I find interesting is to change the amounts. If the open box has $0.01 instead of $1000, you're not thinking "at least I got something", and you just one-box.
But if both boxes contain equal amounts, or you swap the amounts in each box, two-boxing is always better.
All that to say, the idea that the right strategy here is to "be the kind of person who one-boxes" isn't a universal virtue. If the amounts change, the virtues change.
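One way to make that precise (my notation, not the commenter's: t for the transparent amount, M for the opaque amount, and an accuracy C assumed independent of your choice): one-boxing has the higher expected value exactly when

$$C \cdot M > (1 - C) \cdot M + t \iff C > \frac{1}{2} + \frac{t}{2M}.$$

With t = 1,000 and M = 1,000,000 this gives the 0.5005 threshold from the post; with t = 0.01 the threshold sits barely above one half; and as t approaches M the threshold reaches 1, so two-boxing is never worse.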
> A flawless predictor would indicate you're in a simulation [...]
No, it does not. Replace the human with a computer entering the room: the predictor analyzes the computer and the software running on it as it enters. If the decision program does not query a hardware random source and no stray cosmic particle changes the choice, the predictor can perfectly predict the choice just by emulating the computer accurately enough. If the program makes any use of external inputs, say the image from an attached webcam, the predictor also needs to know those inputs well enough. The same could, at least in principle, work for humans.
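A minimal sketch of that idea (all names hypothetical; the point is only that a deterministic program can be predicted by running an identical copy of it on the same inputs):

```python
def agent(observation: str) -> str:
    # A deterministic decision procedure: same input, same choice,
    # as long as it consults no hardware randomness or stray inputs.
    return "one-box" if len(observation) % 2 == 0 else "two-box"

def predictor(program, observation: str) -> str:
    # The predictor "emulates" the agent by running an identical copy
    # on the inputs it expects the agent to receive.
    return program(observation)

observation = "a room with two boxes"
prediction = predictor(agent, observation)
opaque_box = 1_000_000 if prediction == "one-box" else 0

# The prediction necessarily matches the actual choice.
assert agent(observation) == prediction
```

Let agent consult a hardware random source and the assert can fail, which is the coin-flip objection from the original post.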
I agree with you that it doesn't require that you are in a simulation, but a flawless predictor would be a strong indication that a simulation is possible, and that should raise our assumed probability that we're in a simulation.
I would think that the existence of a flawless predictor is probably more likely to indicate that memories of predictions, and any associated records, have been modified to make the predictor appear flawless.
I would say that if we presume the memories of everyone involved have been modified, that is an equally strong indication that we are in a simulation.
Where does this obsession with the simulation hypothesis come from? It has become so widespread in recent years. It is more or less pointless to think about; it will not get you anywhere. You only know this universe, to some extent, but you have no idea what a real universe looks like and no idea what a simulated universe looks like, so you will never be able to tell which kind ours is.
But what if we discover that our universe is made from tiny voxels or something like that, that will be undeniable evidence, right? Wrong! Who says that real universes are not made of tiny voxels? It could be [1] the other way around, maybe real universes are discrete but their universe simulations are continuous, in which case the lack of tiny voxels in our universe would be the smoking gun evidence for being in a simulation.
[1] This is meant as an example, I have no idea if one can actually come up with a discrete universe that admits continuous simulations, which probably should also be efficient in some sense.
I don't know about y'all, but this paradox was resolved to my complete satisfaction in a blog post some years ago, I believe by Scott Aaronson, though I can't find the link. If the predictor has such a good success rate, then it must be simulating people's brains, but since it's not always right, the simulation isn't perfect. The best strategy for playing this game therefore is to look for indications as to whether I'm the real me or the simulation when the question is posed to me, and choose accordingly. Am I floating in a sensory deprivation tank being asked my choice by a disembodied voice with no recollection of how I got there and no memory of my childhood? In that case maybe I'm the simulation, so my answer is that I'll choose just one box. Is it an ordinary day of my life and a plausible setting with all of my faculties and recollections intact? Then I'll assume simulated me had my back and take both boxes.