The changing goalposts of AGI and timelines

2026-03-08 17:16 · mlumiste.com

Based on its own charter, OpenAI should surrender the race

3 minute read

Back in 2018, OpenAI published a charter, which includes a self-sacrifice clause:

We are concerned about late-stage AGI development becoming a competitive race without time for adequate safety precautions. Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project. We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”

Interestingly, this is still hosted at https://openai.com/charter/, meaning it remains the official company policy.

At the same time, Sam Altman's explicitly stated AGI timelines are as follows:

| Date | Predicted AGI Year | Diff (years) | Quote / Claim | Source |
|---|---|---|---|---|
| May 22, 2023 | ~2033 | ~10 | “Within the next ten years, AI systems will exceed expert skill level in most domains” | OpenAI Blog — Governance of Superintelligence |
| Dec 2023 | ~2030 | ~6 | “By the time the end of this decade rolls around, the world will be in an unbelievably better place” | TIME |
| Nov 4, 2024 | ~2029 | ~5 | “I think in 5 years […] people are like, man, the AGI moment came and went” | 20VC Podcast |
| Nov 8, 2024 | 2025 | ~1 | “What are you excited about in 2025? - AGI” | Futurism |
| Jan 2025 | ~2029 | ~4 | “AGI will probably get developed during Trump’s term” | Bloomberg |
| Sep 25, 2025 | 2030 | ~4 | “By 2030, if we don’t have extraordinarily capable models that do things we can’t, I’d be very surprised” | TechSpot |
| Oct 28, 2025 | 2028 | ~2 | “Automated AI research intern by Sep 2026, full AI researcher by Mar 2028” | OfficeChai |
| Dec 18, 2025 | 2025 | 0 | “AGI kinda went whooshing by… okay fine, we built AGIs” | Windows Central |
| Feb 3, 2026 | 2025 | ~-1 | “We basically have built AGI” (later: “a spiritual statement, not a literal one”) | ALM Corp |

We can see that the predicted AGI timeline (let’s assume this is the timeline for a better-than-even chance) has steadily shortened, and the median prediction made since 2025 is around 2 years out. Notably, in the latest interviews it is claimed that AGI has already been achieved and that we are now racing towards ASI.
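
A quick sanity check of that median, reading the “Diff (years)” column off the table above for the five predictions made since January 2025:

    import statistics

    # "Diff (years)" values from the table above for predictions made since
    # Jan 2025 (Bloomberg, TechSpot, OfficeChai, Windows Central, ALM Corp).
    diffs = [4, 4, 2, 0, -1]

    print(statistics.median(diffs))  # -> 2 years until AGI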

Finally, here’s a snapshot of the current overall Arena ranking of the top 10 models.

| Model | Overall | Expert | Hard Prompts | Coding | Math | Creative Writing | Instruction Following | Longer Query |
|---|---|---|---|---|---|---|---|---|
| claude-opus-4-6 | 1 | 1 | 2 | 2 | 3 | 4 | 2 | 2 |
| claude-opus-4-6-thinking | 2 | 2 | 1 | 1 | 2 | 1 | 1 | 1 |
| gemini-3.1-pro-preview | 3 | 3 | 3 | 3 | 1 | 5 | 3 | 3 |
| grok-4.20-beta1 | 4 | 14 | 4 | 5 | 20 | 2 | 8 | 12 |
| gemini-3-pro | 5 | 7 | 5 | 9 | 5 | 3 | 9 | 5 |
| gpt-5.4-high | 6 | 4 | 10 | 11 | 8 | 6 | 5 | 11 |
| gpt-5.2-chat-latest | 7 | 10 | 7 | 6 | 4 | 9 | 10 | 15 |
| gemini-3-flash | 8 | 9 | 9 | 18 | 7 | 8 | 13 | 13 |
| grok-4.1-thinking | 9 | 17 | 13 | 19 | 21 | 19 | 28 | 27 |
| claude-opus-4-5-202… | 10 | 6 | 6 | 4 | 13 | 7 | 4 | 4 |

Based on these, the flagship GPT-5.4 model is clearly trailing the competition. At least Anthropic’s and Google’s models are clearly safety-conscious, and probably value-aligned (whatever that means; since the models are drop-in replacements for GPT, it should hold).

It can be debated whether arena.ai is a suitable metric for AGI; a strong case can probably be made that it is not. However, that’s irrelevant: the spirit of the self-sacrifice clause is to avoid an arms race, and we are clearly in one.

Therefore, one can only conclude that we currently meet the stated example triggering condition of “a better-than-even chance of success in the next two years”. As per its charter, OpenAI should stop competing with the likes of Anthropic and Google, and join forces, however that might look.

While this will never happen, I think it illustrates some points worth pondering:

  • The impotence of naive idealism in the face of economic incentives.
  • The discrepancy between marketing points and practical actions.
  • The changing goalposts of AGI and timelines. Notably, it’s common to now talk about ASI instead, implying we may have already achieved AGI, almost without noticing.

Categories: general


Comments

  • By djoldman 2026-03-08 18:52 (9 replies)

    Anytime I see "Artificial General Intelligence," "AGI," "ASI," etc., I mentally replace it with "something no one has defined meaningfully."

    Or the long version: "something about which no conclusions can be drawn because the proposed definitions lack sufficient precision and completeness."

    Or the short versions: "Skippetyboop," "plipnikop," and "zingybang."

    • By chrysoprace 2026-03-08 23:51 (3 replies)

      I've largely avoided using the term "AI" to refer to the current LLM and generative technology because it's loaded with too much ambiguity and glosses over the problems with those technologies in the context of conversations around it.

      • By datsci_est_2015 2026-03-09 11:47

        Same, but I have used the term “generative AI” to describe generative models. Never the naked “AI” though (except in conversations with friends where the difference is pedantic because they’re not subject matter experts).

      • By parliament32 2026-03-09 16:12

        "AI" implies intelligence, which is nowhere to be found. "Text generators" is the best descriptive term.

      • By xyzal 2026-03-09 08:52

        "applied statistics"

    • By logicchains 2026-03-08 19:14 (4 replies)

      >Anytime I see "Artificial General Intelligence," "AGI," "ASI," etc., I mentally replace it with "something no one has defined meaningfully."

      There are lots of meaningful definitions; the people saying we haven't reached AGI just don't use them. For most of the last half-century, people would have agreed that machines that can pass the Turing test and win Math Olympiad gold are AGI.

      • By sebastos 2026-03-08 19:37 (1 reply)

        Firstly, the models that pass the Math Olympiad aren’t the same models as the ones you’re saying “pass the Turing test”. Secondly, nothing actually passes the Turing test. They pass a vibes check of “hey that’s pretty good!” but if your life depended on it, you could easily find ways to sniff out an LLM agent. Thirdly, none of these models learn in real time, which is an obviously essential feature.

        We’ll know AGI when we see it, and this ain’t it. This complaining about changing goalposts is so transparently sour grapes from people over-invested in hyping the current LLM paradigm.

        • By ufmace 2026-03-08 19:52 (6 replies)

          > nothing actually passes the Turing test

          Says who? I had already found this study, published almost a year ago, saying that they do: https://arxiv.org/abs/2503.23674

          There doesn't seem to be a super-rigorous definition of the Turing Test, but I don't think it's reasonable to require it to fool an expert whose life depends on the correct choice. It already seems to be decently able to fool a person of average intelligence who has a basic knowledge of LLMs.

          I agree that we don't really have AGI yet, but I'd hope we can come up with a better definition of what it is than "we'll know it when we see it". I think it is a legitimate point that we've moved the goalposts some.

          • By sebastos 2026-03-09 05:51

            The real answer is that once LLMs passed a "casual" application of the Turing test, it just made us realize that the "casual Turing test" is not particularly interesting. It turns out to be too easy to ape human behavior over short time frames for it to be a good indicator of human-like intelligence.

            Now, you could argue that this right here is the aforementioned moving of the goalposts. After all, we're deciding that the casual Turing test wasn't interesting precisely after having seen that LLMs could pass it.

            However, in my view, the Turing test _always_ implied the "rigorous" Turing test, and it's only now that we're actually flirting with passing it that it had to be clarified what counts as a true Turing test. As I see it, the Turing test can still be salvaged as a criterion for general intelligence, but only if you allow it to be a no-holds-barred, life-depends-on-it test to exhaustion. This would involve allowing arbitrarily long questioning periods, for instance. I think this is more in the spirit of the original formulation, because the whole idea is to pit a machine against all of human intelligence, proving it has a similar arsenal of adaptability at its disposal. If it only has to passingly fool a human for brief periods, well... I'm afraid that just doesn't prove much. All sorts of stuff briefly fools humans. What requires intelligence is to consistently anticipate and adapt to all lines of questioning in a sustained manner until the human runs out of ideas for how to differentiate.

          • By zeknife 2026-03-08 21:39

            ELIZA fooled plenty of people (both originally and in the study you just linked), but I still wouldn't say ELIZA passed/passes the Turing test in general. It just shows that occasionally or even frequently fooling people is not a sufficient proxy for general intelligence. Of course there isn't a standardized definition, but one thing I would personally include in a "strict" Turing test is that the human interviewee ought to be incentivized to cooperate and to make their humanity as clear as possible. And the interrogator should similarly be incentivized to find the right answer.

          • By edanm 2026-03-08 20:28

            Turing gave a pretty rigorous definition of the Turing Test IMO. Well, as rigorous as something that is inherently "anecdotal" can be, which is part of the philosophical point of the Turing Test.

          • By claysmithr 2026-03-08 23:49

            The Turing test is kind of a useless metric: either the machine is too dumb, or it is too quick and intelligent.

          • By runarberg 2026-03-08 21:33 (3 replies)

            First off, the Turing test has a rigorous definition. Secondly, it has been debunked for almost half a century at this point by Searle’s Chinese room thought experiment. Thirdly, intelligence itself is a scientifically fraught term with ever-changing meaning as we discover more and more “intelligent” behavior in nature (by animals, plants, and more). And to make matters worse, general intelligence is even worse, as the term was used almost exclusively for racist pseudo-science, as a way to operationally define a metric which would prove white supremacy.

            Artificial General Intelligence will exist when the grifters who profit from it claim it exists. The meaning of it will shift to benefit certain entrepreneurs. It will never actually be a useful term in science nor philosophy.

          • By Copyrightest 2026-03-08 20:40

            [dead]

      • By tokai 2026-03-08 19:24 (4 replies)

        The Turing test is generally misunderstood; much like Schrödinger's cat, it has devolved into a pop-cultural meme. The test is to evaluate if a machine can think, not if it is intelligent, not if it is human-like. It's dismissed as a useful measure by most experts in philosophy of mind, AI, language, etc.

        Thinking is cool and all, but not that extraordinary. Even plants do it.

        • By debugnik 2026-03-09 07:50

          > The test is to evaluate if a machine can think.

          The test is to showcase that the question of whether machines can think is meaningless. The point of Turing's thesis is that passing his test just proves the machine has the capability to pass such a test, which is actually meaningful.

        • By runarberg 2026-03-08 21:45 (3 replies)

          I like the analogy with Schrödinger’s cat. Like Schrödinger’s cat it is actually not a good thought experiment. Both have been debunked. Schrödinger’s cat is applying quantum behavior (of a single interaction) to a macro system (with trillions of interactions). While the Turing test can be explained away with Searle’s Chinese room thought experiment.

          I would argue that Schrödinger’s cat has done more damage to the general understanding of quantum physics than it has done good. In contrast, though, I don't think the same about the Turing test. I think it has resulted in a net positive for the theory of mind as long as people take Searle’s rebuttal into account. Without it (as is sadly common in popular philosophy) the Turing test is simply just wrong, and offers no good insight for either philosophy or science.

          • By djoldman 2026-03-08 23:58 (2 replies)

            The Turing test and Searle's "rebuttal" are both pretty inconsequential. There's no real definition of "thinking," therefore neither proves/disproves or says much.

            Turing's imitation game is about making it difficult for a human to tell whether they are communicating with a computer or not. If a computer can trick the human, then... what? The computer is "thinking" ?

            I think most people would say that's an insufficient act to prove thinking. Even though no one has a rigorous definition of thinking either.

            All this stuff goes around in circles and like most philosophy makes little progress.

          • By gosub100 2026-03-09 06:49 (1 reply)

            What do you mean by Schrodinger's cat experiment being "debunked"? The only way I can think to debunk it is to say there are ways to determine if the cat is alive such as heartbeat or temperature, which are impossible to isolate at a quantum level. I don't think anyone claimed the animal was in a superposition.

          • By runarberg 2026-03-09 00:56

            Note: I said “theory of mind” when I (obviously) meant “philosophy of mind”.

        • By namrog84 2026-03-08 19:37

          I thought that was part of the issue: the poor understanding of whether the test evaluates if it can think, or only if we think it can think. And even that is generalizable, since there are different categories of thinking and concepts of the mind.

        • By hunterpayne 2026-03-09 07:54

          "Thinking cool and all but not that extraordinary. Even plants does it."

          Are you involved in politics somehow?

      • By orbital-decay 2026-03-08 20:17

        The most pragmatic definition I know is OpenAI's own: "highly autonomous systems that outperform humans at most economically valuable work". Which is still something between skippetyboop and zingybang, as it leaves a ton of room for OAI to decide if that moment is reached, and also economically valuable work is a moving target.

      • By b00ty4breakfast 2026-03-08 21:41

        If fooling people and doing math good are the criteria, we've had AGI for longer than we've had the modern internet.

    • By erichocean 2026-03-09 10:34

      > "something about which no conclusions can be drawn because the proposed definitions lack sufficient precision and completeness."

      The same problem exists defining human intelligence, it's a problem with "intelligence" in general, artificial or not.

    • By politelemon 2026-03-08 20:34 (1 reply)

      The enskibidification of AI

      • By atomicnumber3 2026-03-08 20:54 (2 replies)

        Honestly, not enough of a joke.

        I was thinking something similar - this isn't AI, and none of "those people" care if it is or isn't. They don't care philosophically, or even pragmatically.

        They're selling a product. That product is the IDEA of replacement of the majority of human labor with what's basically slave labor but with substantially disregardable ethical quandaries.

        It's honestly a genius product. I'm not surprised it's selling so well. I'm vaguely surprised so many people who don't stand to benefit in any way shape or form, or who will even potentially starve if it works out, are so keen on it. But there are always bootlickers.

        The most unfortunate part is that when the party ends, it's none of "those people" who will suffer even in the slightest. I'm not even optimistic their egos will suffer, as Musk seems to show they are utterly immune even as their companies collapse under them.

        • By ryandrake 2026-03-08 22:02 (1 reply)

          AI is already "an employee who can't say no to questionable assignments." We should all be reflective about the real value and inevitable consequences of this work.

          • By hunterpayne 2026-03-09 00:31

            There are also direct and very negative consequences we are having from AI right now. AI is the largest source of fake video propaganda and has largely destroyed the confidence people have in video evidence. I can imagine someone being held liable for these negative consequences, perhaps even extra-judicially.

        • By palmotea 2026-03-09 16:02 (1 reply)

          > I'm vaguely surprised so many people who don't stand to benefit in any way shape or form, or who will even potentially starve if it works out, are so keen on it. But there are always bootlickers.

          I've been getting more and more disappointed by software engineers (in aggregate) as the years go by. They don't even have to be bootlickers to do what you describe, I think a lot of it is pride in their "intelligence," which they express by believing and regurgitating the propaganda they've consumed. They prove their smarts by (among other things) having opinions that align with a zeitgeist of some group of powerful elites. They're too-easily manipulated.

          And it's not just AI, it's also things like libertarianism. You've got workers identifying as capitalist tycoons, because they read a book and have some shares in a 401k.

          • By wolvesechoes 2026-03-10 09:57

            > I've been getting more and more disappointed by software engineers (in aggregate) as the years go by

            Sometimes I am dismayed by the lack of political and social consciousness in this group. A decade or two of digital boom, coupled with handsome paychecks, was enough to convince them that their position is different from what it really is.

    • By wise_blood 2026-03-09 07:17 (1 reply)

      the ARC definition is the one I like the best, something like:

      "it is AGI when we can no longer come up with tasks easy for humans to solve but hard for computers"

    • By shepherdjerred 2026-03-08 21:54 (7 replies)

      They define AGI in their charter

      > artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work

      • By djoldman 2026-03-08 22:09 (2 replies)

        That definition is as I said: "something about which no conclusions can be drawn because the proposed definitions lack sufficient precision and completeness."

        "Highly autonomous systems" and "most economically valuable work" aren't precise enough to be useful.

        "Highly" implies that there is a continuum, so where does directed end and autonomy begin?

        "Most economically valuable work"... each word in that has wiggle room, not to mention that any reasonable interpretation of it is a shifting goalpost as the work done by humans over history has shifted a great deal.

        The point is that none of this is defined in a way so that people can agree that something has AGI/ASI/etc. or not. If people can't agree then there's no point in talking about it.

        EDIT: interestingly, the OpenAI definition of AGI specifically means that a subset of humans do not have AGI.

        • By daxfohl 2026-03-09 00:32 (2 replies)

          I think you can say if human engineers still exist, it's hard to claim we have AGI. If human engineers have been entirely replaced, then it's hard to claim we don't have AGI.

          • By kgwgk 2026-03-09 00:38 (1 reply)

            Because they are doing most of the economically valuable work?

          • By xmcqdpt2 2026-03-09 11:51

            By the definition above, it is possible to have AGI that is also much more expensive to run than human engineers.

        • By nomel 2026-03-08 22:35 (3 replies)

          It's a definition based on practical results. That's a good definition, because it doesn't require we already know the exact implementation. It doesn't require guessing, in a literal "put your money where your mouth is" way.

          If it can do things as well as or better than humans, then either the AI has a type of general intelligence or the human does not.

          Defining capabilities based on outcome rather than implementation should be very familiar to an engineer, of any kind, because that's how every unsolved implementation must start.

          • By godelski 2026-03-09 02:39 (1 reply)

              > If it can do things as well as or better than humans, then either the AI has a type of general intelligence or the human does not.
            
            I don't buy that.

            By your definition every machine has a type of general intelligence. Not just a bog standard calculator, but also my broom. It doesn't matter if you slap "smart" on the side, I'm not going to call my washing machine "intelligent". Especially considering it's over a decade old.

            I don't think these definitions make anything any clearer. If anything, they make them less. They equate humans to mindless automata. They create AGI by sly definition and let the proposer declare success arbitrarily.

          • By tbrownaw 2026-03-08 23:34 (1 reply)

            What is the as-of date on what work is economically valuable and how much is available?

          • By irishcoffee 2026-03-08 23:19 (2 replies)

            Do you know how an LLM works? Can you describe it?

      • By pinkmuffinere 2026-03-09 00:00

        This definition is not very precise though. For example, I think it can be argued from this definition that we had already reached AGI by the year 2010 (or earlier!). By 2010, computers were integrated into >50% of economically valuable work, to the point that humans had mostly forgotten how to do the work without computers. Drafting blueprints by hand was already a thing of the past, slide rules were archaic, paper spreadsheets were long gone. You can debate whether these count as 'highly autonomous', but I don't think it's a clear slam-dunk either way. Not to mention dishwashers, textile weaving machines, CNC machines, assembly lines where >50% is automated, chemical/mineral refining operations, etc.

        The definition reminds me of the common quip about robotics, "it's robotics when it doesn't work, once it works it's a machine".

      • By maplethorpe 2026-03-08 22:47

        In my experience, AGI always seemed to be the stand-in phrase for "human like" intelligence, after AI was co-opted to mean simpler things like markov-chain chat bots and state machines that control agent behaviour in video games.

        If the definition has shifted once again to mean "a computer program that does a task pretty well for us", then what's the new term we're using to define human-level artificial intelligence?

      • By catlifeonmars 2026-03-08 22:56

        > economically valuable work

        Is doing a ton of heavy lifting. What is considered economically valuable work is going to change from decade to decade, if not from year to year. What’s considered economically valuable is also going to differ widely across individuals and nations within the exact same time frame, too.

      • By chrsw 2026-03-08 22:08

        I take "outperform" to mean "can replace".

      • By marcus_holmes 2026-03-09 04:04 (1 reply)

        y'see, I would not define a system as "highly autonomous" if it only responds to requests.

        And I get that there are workarounds; effectively a cron job every second prompting "do the next thing".

        But in my personal definition of "highly autonomous" it would not need prompting at all. It would be thinking all the time, independently of requests.

        • By dragonwriter 2026-03-09 04:24 (1 reply)

          The model is not the system. The model is a component of the system. The "cron job" (or other means by which a continuous action loop is implemented) and the necessary prompting for it to gather input (including subsequent user input or other external data) and to pursue a set of objectives which evolves based on input are all also parts of the system.
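
          A minimal sketch of this system-vs-model distinction, in Python; gather_input and call_model are hypothetical stand-ins for an environment and a model API, not any real library:

              import time

              def gather_input():
                  """Hypothetical: collect new user messages, sensor readings, etc."""
                  return []

              def call_model(objectives, observations):
                  """Hypothetical stand-in for an LLM API call: the model only
                  ever responds to the prompt it is given."""
                  return {"actions": [], "objectives": objectives}

              def run_system(objectives):
                  # The "system" is this loop, not the model: it keeps gathering
                  # input, prompting, acting, and revising objectives on its own.
                  while objectives:
                      step = call_model(objectives, gather_input())
                      for action in step["actions"]:
                          pass  # execute tool calls, send messages, etc.
                      objectives = step["objectives"]
                      time.sleep(1)  # the "cron job every second" from above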

      • By xmcqdpt2 2026-03-09 12:08

        Is "most" measured as 50% of individual jobs, or as producing 50% of the output, dollar for dollar?

        And what does "economically" mean here? Would it cover teaching? Child care? Healthcare? Etc.

    • By ozgung 2026-03-08 22:02 (2 replies)

      That’s the problem with the discussions on AI. No one defines the terms they use.

      If we define AGI as an AI not doing a preset task but can be used for general purpose, then we already have that. If we define it as human level intelligence at _every_ task, then some humans fail to be an AGI. If we define AGI as a magic algorithm that does every task autonomously and successfully then that thing may not exist at all, even inside our brains.

      When the AGI term was first coined they probably meant something like HAL 9000. We have that now (and HAL gaining self-awareness or refusing commands are just for dramatic effect and not necessary). Goalposts are not stable in this game.

      • By VorpalWay 2026-03-08 22:34 (3 replies)

        It is not just AGI that is poorly defined. Plain "AI" has moving goalposts too. When the A* search algorithm was introduced in the late 60s, that was considered AI; when SVMs (support vector machines) and KNN (k-nearest neighbors) were new, they were AI. And so on.

        These days it is neural networks and transformer models for language in particular that people mean when they say unqualified AI.

        It is very hard to have a meaningful discussion when different parties mean different things with the same words.

        • By dataflow 2026-03-09 00:02 (1 reply)

          I think the Turing test ought to be fine, but we need to be less generous to the AI when executing it. If there exists any human that can consistently tell your AI apart from humans without insider knowledge, then I don't think you can claim to have AGI. Even if 99.9% of humans can't tell you apart.

          So I'm very curious if any AI we have today would pass the Turing test under all circumstances, for example if: the examiner was allowed to continue as long as they wanted (even days/weeks), the examiner could be anybody (not just random selections of humans), observations other than the text itself were fair game (say, typing/response speed, exhaustion, time of day, the examiner themselves taking a break and asking to continue later), both subjects were allowed and expected to search on the internet, etc.
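
          The conditions above translate into a fairly simple protocol sketch. A minimal rendering in Python, where examiner, human, and machine are hypothetical objects supplied by whoever runs the test:

              import random
              import time

              def strict_turing_test(examiner, human, machine):
                  """Sketch of the stricter protocol described above: unbounded
                  rounds, and side channels such as response latency are fair game."""
                  # Randomly assign the machine to channel "A" or "B".
                  channels = {"A": human, "B": machine}
                  if random.random() < 0.5:
                      channels = {"A": machine, "B": human}

                  transcript = []
                  while True:
                      question = examiner.ask(transcript)
                      if question is None:  # examiner has run out of ideas
                          break
                      for label, subject in channels.items():
                          start = time.monotonic()
                          answer = subject.answer(question)
                          latency = time.monotonic() - start  # observable side channel
                          transcript.append((label, question, answer, latency))

                  # The machine "passes" only if the examiner guesses wrong.
                  return channels[examiner.guess_machine(transcript)] is not machine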

        • By Wowfunhappy 2026-03-08 22:38 (2 replies)

          I really wish I could wave a magic wand and make everyone stop using the term "AI". It means everything and nothing. Say "machine learning" if that's what you mean.

        • By marcus_holmes 2026-03-09 04:01

          Agree. I talk about LLMs when discussing them, and avoid the term "AI" unless I'm talking about the entire industry as a whole. I find it really helps to be specific in this case.

      • By dalmo3 2026-03-09 03:41

        > some humans fail to be an AGI

        All humans fail to be AGI, by definition.

    • By random3 2026-03-08 22:54

      I call these "romantic definitions" or "gesticulations". For private use (personal or even internal to teams) they can be great placeholders, assuming the goal is to refine vocabulary.

    • By TacticalCoder 2026-03-08 23:40 (1 reply)

      That's no argument: the exact same can be said for what "AI" is: "Skippetyboop," "plipnikop," and "zingybang."

      • By djoldman 2026-03-08 23:51

        > the exact same can be said for what "AI" is: "Skippetyboop," "plipnikop," and "zingybang."

        Yes.

  • By choult 2026-03-08 17:24 (1 reply)

    The writing was on the wall as soon as it went all-in on commercializing the tech.

    This will never happen; LLMs are already being used very unsafely, and if this HN headline stays where it is, OpenAI will quietly remove the charter from its website.

    • By d0able 2026-03-09 07:06

      It's just lip service at this point.

  • By sulam 2026-03-08 18:54 (4 replies)

    The reality is that current models are simply nowhere near AGI. Next token prediction has been pushed very far, and proven to have applicability far beyond the original domain it was designed for (reasoning models are an application I would not have predicted) but it is fundamentally not AGI. It has no real world model, no ability to learn in any but superficial ways, and without extensive scaffolding this is all very obvious when you use them.

    • By onlyrealcuzzo 2026-03-08 23:12 (2 replies)

      How many months has it been since we were told there would be zero software engineers left in the world in 12 months?

      • By SpicyLemonZest 2026-03-09 16:08

        What Dario Amodei said 12 months ago is that AI would be "writing essentially all of the code", and the job of software engineers would become guiding and reviewing the code generation process. That's come true at a number of companies.

        The important context I think people may miss is, this does not require AI to be 10x or 5x or even 1x as good as a human programmer. Claude is worse than me in meaningful ways at the kind of code I need to write, but it’s still doing almost all my coding because after 4.6 it’s smart enough to understand when I explain what program it should have written.

      • By xyzal 2026-03-09 08:54

        >12

    • By ACCount37 2026-03-08 21:22 (3 replies)

      Given the mechanistic interpretability findings? I'm not sure how people still say shit like "no real world model" seriously.

      • By famouswaffles 2026-03-08 21:54 (1 reply)

        People just overstate their understanding and knowledge, the usual human stuff. The same user has a comment in this thread that contains:

        'If you actually know what models are doing under the hood to produce output that...'

        Anyone who tells you they know 'what models are doing under the hood' simply has no idea what they're talking about, and it's amazing how common this is.

        • By sulam 2026-03-08 23:28 (5 replies)

          Fair, I should define what I mean by under the hood. By “under the hood” I mean that models are still just being fed a stream of text (or other tokens in the case of video and audio models), being asked to predict the next token, and then doing that again. There is no technique that anyone has discovered that is different than that, at least not that is in production. If you think there is, and people are just keeping it secret, well, you clearly don’t know how these places work. The elaborations that make this more interesting than the original GPT/Attention stuff are 1) there is more than one model in the mix now, even though you may only be told you’re interacting with “GPT 5.4”, and 2) there’s a significant amount of fine-tuning with RLHF in specific domains that each lab feels is important to be good at because of benchmarks, strategy, or just conviction (DeepMind, we see you). There’s also a lot of work being put into speeding up inference, as well as making it cheaper to operate. I probably shouldn’t forget tool use for that matter, since that’s the only reason they can count the r’s in strawberry these days.

          None of that changes the concept that a model is just fundamentally very good at predicting what the next element in the stream should be, modulo injected randomness in the form of a temperature. Why does that actually end up looking like intelligence? Well, because we see the model’s ability to be plausibly correct over a wide range of topics and we get excited.

          Btw, don’t take this reductionist approach as being synonymous with thinking these models aren’t incredibly useful and transformative for multiple industries. They’re a very big deal. But OpenAI shouldn’t give up because Opus 4.whatever is doing better on a bunch of benchmarks that are either saturated or in the training data, or have been RLHF’d to hell and back. This is not AGI.
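
          That decode loop ("predict the next element, modulo injected randomness in the form of a temperature") is compact enough to sketch. A minimal version in Python/numpy, where logits_for is a hypothetical stand-in for a trained network's forward pass:

              import numpy as np

              VOCAB_SIZE = 50_000

              def logits_for(tokens):
                  """Hypothetical stand-in for a model's forward pass: one score
                  per vocabulary entry, given the context so far."""
                  return np.random.randn(VOCAB_SIZE)

              def generate(prompt_tokens, n_new, temperature=0.8):
                  tokens = list(prompt_tokens)
                  for _ in range(n_new):
                      logits = logits_for(tokens) / temperature  # temperature = injected randomness
                      probs = np.exp(logits - logits.max())      # numerically stable softmax
                      probs /= probs.sum()
                      tokens.append(int(np.random.choice(VOCAB_SIZE, p=probs)))
                  return tokens  # predict the next element, then do it again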

      • By sulam 2026-03-08 21:29 (1 reply)

        They have a _text_ model. There is some correlation between the text model and the world, but it’s loose and only because there’s a lot of text about the world. And of course robotics researchers are having to build world models, but these are far from general. If they had a real world model, I could tell them I want to play a game of chess and they would be able to remember where the pieces are from move to move.

        • By ACCount37 2026-03-08 22:16 (3 replies)

          What makes you think that text is inherently a worse reflection of the world than light is?

          All world models are lossy as fuck, by the way. I could give you a list of chess moves and force you to recover the complete board state from it, and you wouldn't fare that much better than an off the shelf LLM would. An LLM trained for it would kick ass though.
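
          The chess aside is easy to make concrete: with an explicit world model, recovering the full board state from a move list is deterministic and lossless. A minimal sketch, assuming the third-party python-chess package:

              import chess  # third-party: pip install python-chess

              # Replaying SAN moves recovers the exact board state; an LLM asked to
              # do the same from raw text has to approximate it and may drop pieces.
              moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6"]

              board = chess.Board()
              for move in moves:
                  board.push_san(move)

              print(board.fen())  # the complete, unambiguous position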

      • By 10xDev 2026-03-08 22:06

        People are finding it hard to grasp that emergent properties can appear at very large scales and dimensions.

    • By hintymad 2026-03-09 01:36 (1 reply)

      > It has no real world model, no ability to learn in any but superficial ways

      I also think so, and at the same time I have to admit a lot of people don't learn deeply either. Take math, for example: how many STEM students from elite universities truly understood the definition of a limit, let alone calculus beyond simple calculation? Or how many data scientists can really, intuitively understand Bayesian statistics? Yet millions of them were doing their jobs in a kinda fine way with the help of the StackExchange family, and now with the help of AI.

      • By Spivak 2026-03-09 03:23 (1 reply)

        Well part of that is because STE folks aren't typically required to take any kind of theoretical maths. It's $Math for Engineers and it eschews theoretical underpinnings for application. I don't think it's any kind of failing, it's just different. My statistics class was a dense treatise in measure theory. Anyone who took the regular stats class is almost surely way better than me at designing an experiment, but I can talk your ear off about Lebesgue measure to basically zero practical end.

        • By hintymad 2026-03-09 06:34

          I was not talking about theoretical foundations like analysis or measure theory, but just the basics of a college-level math class. There can be other examples. The point is that many people don't have an intuitive understanding of what they use every day; in a way they are like AI, only slower and knowing less than AI.

HackerNews