Why I don't think AGI is imminent

dlants.me, 2026-02-15 23:34


February 14, 2026

The CEOs of OpenAI and Anthropic have both claimed that human-level AI is just around the corner — and at times, that it's already here. These claims have generated enormous public attention. There has been some technical scrutiny of these claims, but critiques rarely reach the public discourse. This piece is a sketch of my own thinking about the boundary between transformer-based large language models and human-level cognition. I have an MS in Machine Learning from over a decade ago, and I don't currently work in the field of AI, but I am well-read on the underlying research. If you know more than I do about these topics, please reach out and let me know; I would love to develop my thinking on this further.

Research in evolutionary neuroscience has identified a set of cognitive primitives that are hardwired into vertebrate brains: some of these are a sense of number, object permanence, causality, spatial navigation, and the ability to distinguish animate from inanimate motion. These capacities are shared across vertebrates, from fish to ungulates to primates, pointing to a common evolutionary origin hundreds of millions of years old.

Language evolved on top of these primitives — a tool for communication where both speaker and listener share the same cognitive foundation. Because both sides have always had these primitives, language takes them for granted and does not state them explicitly.

Consider the sentence "Mary held a ball." To understand it, you need to know that Mary is an animate entity capable of intentional action, that the ball is a separate, bounded, inanimate object with continuous existence through time, that Mary is roughly human-sized and upright while the ball is small enough to fit in her hand, that her hand exerts an upward force counteracting gravity, that the ball cannot pass through her palm, that releasing her grip would cause the ball to fall, and that there is one Mary and one ball, each persisting as the same entity from moment to moment, each occupying a distinct region of three-dimensional space. All of that is what a human understands from four words, and none of it is in the text. Modern LLMs are now trying to reverse-engineer this cognitive foundation from language, which is an extremely difficult task.

I find this framing useful for understanding many of the observed limitations of current LLM architectures. For example, transformer-based language models can't reliably do multi-digit arithmetic because they have no number sense, only statistical patterns over digit tokens. They can't generalize simple logical relationships — a model trained on "A is B" can't infer "B is A" — because they lack compositional, symbolic machinery.

One might object: modern AIs are now being trained on video, not just text. And it's true that video prediction can teach something like object permanence. If you want to predict the next frame, you need to model what happens when an object passes behind an occluder, which is something like a representation of persistence. But I think the reality is more nuanced. Consider a shell game: a marble is placed under one of three cups, and the cups are shuffled. A video prediction model might learn the statistical regularity that "when a cup is lifted, a marble is usually there." But actually tracking the marble through the shuffling requires something deeper — a commitment to the marble as a persistent entity with a continuous trajectory through space. That's not merely a visual pattern.
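The distinction can be made concrete. Here is a toy sketch (my own illustration, not taken from any model discussed here) of what "committing to the marble as a persistent entity" means computationally: a single piece of state carried deterministically through every swap, rather than a statistical association between lifted cups and marbles.

```python
def track_marble(start_cup: int, swaps: list[tuple[int, int]]) -> int:
    """Follow the marble through a sequence of cup swaps.

    The tracker carries one piece of state -- the marble's current
    location -- and updates it at every swap. No frame-level
    statistics are involved; the marble is a persistent entity
    with a continuous trajectory.
    """
    location = start_cup
    for a, b in swaps:
        if location == a:
            location = b
        elif location == b:
            location = a
    return location

# Marble starts under cup 0; cups 0/2, 1/2, then 0/1 are swapped.
print(track_marble(0, [(0, 2), (1, 2), (0, 1)]))  # 0
```

A video model that has only learned "a marble usually appears under the lifted cup" has no analogue of the `location` variable; that variable is the commitment to persistence.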

The shortcomings of visual models align with this framing. Early GPT-based vision models failed at even basic spatial reasoning. Much of the recent progress has come from generating large swaths of synthetic training data. But even in this, we are trying to learn the physical and logical constraints of the real world from visual data. The results, predictably, are fragile. A model trained on synthetic shell game data could probably learn to track the marble. But I suspect that learning would not generalize to other situations and relations — it would be shell game tracking, not object permanence.

Developmental psychologist Elizabeth Spelke's research on "core knowledge" has shown that infants — including blind infants — represent objects as bounded, cohesive, spatiotemporally continuous entities. This isn't a learned visual skill. It appears to be something deeper: a fundamental category of representation that the brain uses to organize all sensory input. Objects have identity. They persist. They can't teleport or merge. This "object-ness" likely predates vision itself — it's rooted in hundreds of millions of years of organisms needing to interact with things in the physical world, and I think this aspect of our evolutionary "training environment" is key to our robust cognitive primitives. Organisms don't merely observe reality to predict what happens next. They perceive in order to act, and they act in order to perceive. Object permanence allows you to track prey behind an obstacle. Number sense lets you estimate whether you're outnumbered. Logical composition enables tool construction and use. Spatial navigation helps you find your way home. Every cognitive primitive is directly linked to action in a rich, multisensory, physical world.

As Rodney Brooks has pointed out, even human dexterity is a tight coupling of fine motor control and rich sensory feedback. Modern robots do not have nearly as rich sensory information available to them. While LLMs have benefited from the vast quantities of text, video, and audio available on the internet, we simply don't have large-scale datasets of rich, multisensory perception coupled to intentional action. Collecting or generating such data is extremely challenging.

What if we built simulated environments where AIs could gather embodied experience? Could we create learning scenarios where agents learn some of these cognitive primitives, and could that generalize to improve LLMs? I found a few papers that poke in this direction.

Google DeepMind's SIMA 2 is one. Despite the "embodied agent" branding, SIMA 2 is primarily trained through behavioral cloning: it watches videos of human gameplay and learns to predict the actions the players took. The reasoning and planning come from its base model (Gemini Flash-Lite), which was pretrained on internet text and images — not from embodied experience. There is an RL self-improvement stage where the agent does interact with environments, but this is secondary; the core intelligence is borrowed from language pretraining. SIMA 2 reaches near-human performance on many game tasks, but what it's really demonstrating is that a powerful language model can be taught to output keyboard actions.

Can insights from world-model training actually transfer to and improve language understanding? DeepMind's researchers explicitly frame this as a trade-off between two competing objectives: "embodied competence" (acting effectively in 3D worlds) and "general reasoning" (the language and math abilities from pretraining). They found that baseline Gemini models, despite being powerful language models, achieved only 3-7% success rates on embodied tasks — demonstrating that embodied competence is not something that emerges from language pretraining. After fine-tuning on gameplay data, SIMA 2 achieved near-human performance on embodied tasks while showing "only minor regression" on language and math benchmarks. But notice the framing: the best case is that embodied training doesn't hurt language ability too much. There's no evidence that it improves it. The two capabilities sit in separate regions of the model's parameter space, coexisting but not meaningfully interacting. LLMs have billions of parameters, and there is plenty of room in those weights to predict language and to model a physical world separately. Bridging that gap — using physical understanding to actually improve language reasoning — remains undemonstrated.

DeepMind's Dreamer 4 also hints at this direction. Rather than borrowing intelligence from a language model, Dreamer 4 learns a world model from gameplay footage, then trains an RL agent within that world model through simulated rollouts where the agent takes actions, observes consequences provided by the world model, and updates its policy. This is genuinely closer to perception-action coupling: the agent learns through acting. However, the goal of this research is not general intelligence — it's sample-efficient control for robotics. The agent is trained and evaluated on predefined task milestones (get wood, craft pickaxe, find diamond), scored by a learned reward model. Nobody has tested whether the representations learned through this sort of training generalize to reasoning, language, or anything beyond the specific control tasks they were trained on. The gap between "an agent that learns to get diamonds in Minecraft through simulated practice" and "embodied experience that produces transferable cognitive primitives" is enormous and entirely unexplored.
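The "simulated rollouts" loop is worth making concrete. Below is a toy sketch of training in imagination (my own illustration under invented assumptions, not DeepMind's implementation): a tabular policy is improved entirely inside a stand-in world model of a tiny 1-D environment, never touching the "real" environment during learning.

```python
import random

GOAL = 5  # hypothetical 1-D world: states 0..10, reward for reaching 5

def world_model(state, action):
    """Stand-in for a learned dynamics model: predicts next state and reward."""
    next_state = max(0, min(10, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Tabular action values for moving left (-1) or right (+1) in each state.
q = {(s, a): 0.0 for s in range(11) for a in (-1, 1)}

def imagine_rollout(state, steps=10, eps=0.2, lr=0.5, gamma=0.9):
    """One imagined rollout: act, observe consequences supplied by the
    world model, and update the policy -- the Dreamer-style loop."""
    for _ in range(steps):
        if random.random() < eps:
            action = random.choice((-1, 1))        # explore
        else:
            action = max((-1, 1), key=lambda a: q[(state, a)])  # exploit
        nxt, r = world_model(state, action)
        target = r + gamma * max(q[(nxt, -1)], q[(nxt, 1)])
        q[(state, action)] += lr * (target - q[(state, action)])
        state = nxt

random.seed(0)
for _ in range(500):
    imagine_rollout(random.randrange(11))

# After imagined practice, moving toward the goal is valued more highly.
print(q[(4, 1)] > q[(4, -1)])
```

The open question the paragraph above raises is whether anything learned this way (here, a value table for one control task) transfers beyond the specific task the rollouts optimized.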

As far as I understand, we don't know how to:

  • embed an agent in a perception-action coupled training environment

  • create an objective and training process that leads it to learn cognitive primitives like spatial reasoning or object permanence

  • leverage this to improve language models or move closer to general artificial intelligence

Recent benchmarking work underscores how far we are. Stanford's ENACT benchmark (2025) tested whether frontier vision-language models exhibit signs of embodied cognition — things like affordance recognition, action-effect reasoning, and long-horizon memory. The results were stark: current models lag significantly behind humans, and the gap widens as tasks require longer interaction horizons.

In short: world models are a genuinely exciting direction, and they could be the path to learning foundational primitives like object permanence, causality, and affordance. But this work is still in its earliest stages. Transformers were an incredible leap forward, and it is partly thanks to them that we now have benchmarks like ENACT that better illustrate the boundaries of cognition. I think this area is really promising, but research in this space could easily take decades.

I will also mention that the most prominent "world model" comes from Yann LeCun, who recently left Meta to start AMI Labs. His Joint Embedding Predictive Architecture (JEPA) is a representation learning method: it trains a Vision Transformer on video data, masking parts of the input and predicting their abstract representations rather than their raw pixels. The innovation is predicting in representation space rather than input space, which lets the model focus on high-level structure and ignore unpredictable low-level details. This is a genuine improvement over generative approaches for learning useful embeddings. But despite the "world model" branding, JEPA's actual implementations (I-JEPA, V-JEPA, V-JEPA 2) are still training on passively observed video — not on agents embedded in physics simulations. There is no perception-action coupling, no closed-loop interaction with an environment. JEPA is a more sophisticated way to learn from observation, but by the logic of the argument above, observation alone is unlikely to yield the cognitive primitives that emerge from acting in the world.

The ARC-AGI benchmark offers an important illustration of where these primitives show up. ARC tasks are grid-based visual puzzles that test abstract reasoning: spatial composition, symmetry, relational abstraction, and few-shot generalization. They require no world knowledge or language — just the ability to infer abstract rules from a handful of examples and apply them to novel cases. Humans solve these tasks trivially, usually in under two attempts. When ARC-AGI-2 launched in March 2025, pure LLMs scored 0% and frontier reasoning systems achieved only single-digit percentages. By the end of the year, refinement-loop systems — scaffolding that wraps a model in iterative generate-verify-refine cycles — pushed scores to 54% on the semi-private eval and as high as 75% on the public eval using GPT-5.2, surpassing the 60% human average. But the nature of this progress matters as much as the numbers.

The top standalone model without refinement scaffolding — Claude Opus 4.5 — scores 37.6%. It takes a refinement harness running dozens of iterative generate-verify-refine cycles at $30/task to push that to 54%, and a combination of GPT-5.2's strongest reasoning mode plus such a harness to reach 75%. This is not behavior that comes out of the core transformer architecture — it is scaffolded brute-force search, with each percentage point requiring substantially more compute. The ARC Prize Grand Prize at 85% remains unclaimed.
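To make "refinement scaffolding" concrete, here is a minimal sketch of a generate-verify-refine loop. The proposer and verifier below are toy stand-ins of my own (real systems use LLM calls and learned or programmatic verifiers); the point is that the iteration lives in the harness around the model, not in the model.

```python
def refinement_loop(generate, verify, task, max_iters=32):
    """Iterative generate-verify-refine: the scaffold, not the
    proposer, supplies the search."""
    history = []                        # all prior (candidate, feedback)
    for attempt in range(1, max_iters + 1):
        candidate = generate(history)   # stand-in for an LLM call
        ok, feedback = verify(task, candidate)
        if ok:
            return candidate, attempt   # verified solution
        history.append((candidate, feedback))
    return None, max_iters              # budget exhausted

# Toy stand-ins: the "task" is finding a hidden integer in [0, 100].
# The proposer narrows its guess from verifier feedback; neither
# function alone performs the search.
def toy_generate(history):
    lo, hi = 0, 100
    for guess, verdict in history:
        if verdict == "too low":
            lo = guess + 1
        else:
            hi = guess - 1
    return (lo + hi) // 2

def toy_verify(target, guess):
    if guess == target:
        return True, None
    return False, ("too low" if guess < target else "too high")

print(refinement_loop(toy_generate, toy_verify, 37))  # (37, 3)
```

Each extra attempt buys accuracy at the cost of another full inference pass, which is why scores scale with dollars per task rather than with the base model alone.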

ARC is important because it illustrates the kind of abstract reasoning that seems central to intelligence. For humans, these capabilities arose from embodied experience. It's conceivable that training methods operating in purely abstract or logical spaces could teach an agent similar primitives without embodiment. We simply don't know yet. Research in this direction is just beginning, catalyzed by benchmarks like ARC that are sharpening our understanding of the boundary between what LLMs do and what intelligence actually requires. Notably, the benchmark itself is evolving in this direction: ARC-AGI-3 introduces interactive reasoning challenges requiring exploration, planning, memory, and goal acquisition — moving closer to the perception-action coupling that I argue is central to intelligence.

It's worth addressing a common counterargument here: AI models have saturated many benchmarks in recent years, and we have to keep introducing new ones. Isn't this just moving the goalposts? I don't think so. Benchmark saturation is exactly how we learn what a benchmark was actually measuring, and creating different benchmarks in response is not goalpost-moving — it's the normal process of refining our instruments and our understanding. The "G" in AGI stands for "general": truly general intelligence should transfer from one reasoning task to another. If a model had genuinely learned abstract reasoning by saturating one benchmark, the next benchmark testing similar capabilities should be easy, not devastating. The fact that each new generation of benchmarks consistently exposes fundamental failures is itself evidence about the nature of the gap. The ARC benchmark series illustrates this well: the progression from ARC-AGI-1 to ARC-AGI-3 didn't require heroic effort to find tasks that stump AI while remaining easy for humans; it just required refining our understanding of where the boundary lies. Tasks that are trivially easy for humans but impossible for current models are abundant (see multi-digit arithmetic, above). The benchmark designers aren't hunting for exotic edge cases; they're mapping a vast territory of basic cognitive capability that AI simply doesn't have.

While I was writing this piece, Google DeepMind released Gemini 3 Deep Think (February 12, 2026), which scored 84.6% on ARC-AGI-2 — just shy of the 85% Grand Prize threshold. For context, the base Gemini 3 Pro model scores 31.1%. The entire 53-point gap is inference-time compute: extended reasoning chains, parallel hypothesis exploration, and search.

This result is significant. While I wasn't able to find details about the architecture behind this particular model, the ARC Prize team's earlier analysis of 2025 submissions identifies "refinement loops" — iterative generate-verify-refine cycles — as the central theme driving progress. The intelligence is coming from scaffolding rather than from the base model having learned general abstract reasoning. As the ARC Prize team put it:

For the ARC-AGI-1/2 format, we believe the Grand Prize accuracy gap is now primarily bottlenecked by engineering while the efficiency gap remains bottlenecked by science and ideas. ARC Prize stands for open AGI progress, and, as we've previously committed, we will continue to run the ARC-AGI-2 Grand Prize competition in 2026 to track progress towards a fully open and reproducible solution.

As good as AI reasoning systems are, they still lack many of the capabilities necessary for AGI. We still need new ideas, like how to separate knowledge and reasoning, among others. And we'll need new benchmarks to highlight when those new ideas arrive.

I am now really curious about how agents will fare on ARC-AGI-3, which comes out in March 2026. Are refinement loops, search, and extended CoT chains effective at general reasoning? My guess is that these techniques are specifically fitted to the geometric pattern format of ARC-AGI-1 and -2, and that we'll see a big drop-off in performance on ARC-AGI-3, which will be recovered over time as teams adjust their scaffolding to the new challenges.

The transformer architectures powering current LLMs are strictly feed-forward. Information flows from tokens through successive layers to the output, and from earlier tokens to later ones, but never backward. This is partly because backpropagation — the method used to train neural networks — requires acyclic computation graphs. But there's also a hard practical constraint: these models have hundreds of billions of parameters and are trained on trillions of tokens, and rely heavily on reusing computation. When processing token N+1, an LLM reuses all the computation from tokens 1 through N (a technique called KV caching). This is what makes training and inference tractable at scale. But it also means the architecture is locked into a one-directional flow — processing a new token can never revisit or revise the representations of earlier ones. Any architecture that allowed backward flow would compromise this caching, requiring novel computational techniques to make it tractable at scale.
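The append-only nature of KV caching is easy to see in miniature. Below is a toy single-head attention step (my own illustration with made-up shapes, nothing like a real model): keys and values for earlier tokens are computed once and frozen, and a new token can only read them, never revise them.

```python
import math

D = 4  # hypothetical embedding size for the toy example

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

K_cache, V_cache = [], []  # grow-only: earlier entries are never revised

def step(q, k, v):
    """Process one new token: append its key/value to the cache, then
    attend the query over ALL cached entries. Because the cache is
    append-only, a later token can read earlier representations but
    can never rewrite them -- the one-directional flow in miniature."""
    K_cache.append(k)
    V_cache.append(v)
    scores = [dot(q, kk) / math.sqrt(D) for kk in K_cache]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w / total * vv[i] for w, vv in zip(weights, V_cache))
            for i in range(D)]

# Feed three toy one-hot "tokens"; each step reuses all prior K/V work.
for t in range(3):
    vec = [float(t == i) for i in range(D)]
    step(vec, vec, vec)
print(len(K_cache))  # 3 cached keys, each computed exactly once
```

Allowing backward flow would mean invalidating and recomputing entries in `K_cache` whenever a later token changed its interpretation of an earlier one, which is exactly the cost the caching scheme exists to avoid.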

Human brains function in a fundamentally different way. The brain is not a feed-forward pipeline. Activations reverberate through recurrent, bidirectional connections, eventually settling into stable patterns. For every feedforward connection in the visual cortex, there is a reciprocal feedback connection carrying contextual information back to earlier processing stages. When you recognize a face, it's not the output of a single forward pass — it's the result of distributed activity that echoes back and forth between regions until the system converges on an interpretation.

This is not to say that the human brain architecture is necessary to reach general intelligence. But the contrast helps contextualize just how constrained current LLM architectures are. There's a growing body of peer-reviewed theoretical work formalizing these constraints. Merrill and Sabharwal have shown that fixed-depth transformers with realistic (log-precision) arithmetic fall within the complexity class TC⁰ — which means that, under standard complexity-theoretic assumptions, they cannot recognize certain regular languages or determine whether two nodes in a graph are connected. These are formally simple problems, well within the reach of basic algorithms, that a transformer cannot solve in a single forward pass. This isn't an engineering limitation to be overcome with more data or compute — it's a mathematical property of the architecture itself. And Merrill and Sabharwal go further, arguing that this is a consequence of the transformer's high parallelizability: any architecture that is as parallelizable — and therefore as scalable — will hit similar walls.
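To make "formally simple" concrete, consider one canonical example (my own illustration, not drawn from the paper): the word problem of the permutation group S5, i.e. deciding whether a sequence of permutations composes to the identity. It is a regular language (a DFA with 5! = 120 states recognizes it), yet by Barrington's theorem it is NC¹-complete, placing it outside TC⁰ unless TC⁰ = NC¹. A constant-memory left-to-right scan solves it trivially:

```python
IDENTITY = (0, 1, 2, 3, 4)

def compose(p, q):
    """Apply permutation p, then permutation q (both on 5 elements)."""
    return tuple(q[p[i]] for i in range(5))

def composes_to_identity(word):
    """One left-to-right scan with constant memory -- the kind of
    inherently sequential computation that a fixed-depth, highly
    parallel circuit is believed unable to express."""
    state = IDENTITY
    for perm in word:
        state = compose(state, perm)
    return state == IDENTITY

swap01 = (1, 0, 2, 3, 4)   # transposition of elements 0 and 1
cycle = (1, 2, 3, 4, 0)    # a 5-cycle

print(composes_to_identity([swap01, swap01]))  # True: swap undoes itself
print(composes_to_identity([cycle, cycle]))    # False
```

The loop is trivial for any programmer, but the sequential dependency between steps is exactly what a single bounded-depth forward pass cannot replicate for arbitrarily long inputs.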

What might alternative architectures look like? Gary Marcus has long advocated for other approaches, like neurosymbolic AI — hybrid systems that combine neural networks with explicit symbolic reasoning modules for logic, compositionality, and variable binding. I think that neural architectures with feedback connections — networks that are not strictly feed-forward but allow information to flow backward and settle into stable states — could learn to represent cognitive primitives. The challenge, as discussed above, is that such architectures break the computational shortcuts that make current transformers trainable and deployable at scale. In either case, getting neurosymbolic, recurrent or bidirectional neural networks to work at the scale of modern LLMs is an open engineering and research problem.

A reader pointed out that chain of thought effectively invalidates the feed-forward argument, since we are never doing a single feed-forward pass, but instead repeated passes where preceding tokens are fed back into the network. As such, the transformer can use its own context window as a working space to solve a more complex class of problems. After this, I found a follow-up paper by the same authors (Merrill & Sabharwal, ICLR 2024) that confirms this. While a single forward pass through a transformer is limited to TC⁰, allowing the model to generate intermediate "chain of thought" tokens — where each token is the output of a new forward pass conditioned on all previous tokens — fundamentally extends its computational power. Specifically, with a polynomial number of CoT steps, a transformer can solve any problem in P.
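The scratchpad idea is easy to illustrate with the multi-digit arithmetic example from earlier. Carrying requires an inherently sequential chain of dependencies, but if each intermediate result is written out as a "token," every individual step becomes bounded local work; the sequencing happens outside the fixed-depth machinery. A toy sketch of my own:

```python
def add_with_scratchpad(a: str, b: str) -> tuple[str, list[str]]:
    """Add two decimal numbers digit by digit, emitting each carry
    step as an intermediate 'token'. Each step depends only on the
    previous carry -- bounded local work, sequenced externally."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
    carry, digits, trace = 0, [], []
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry
        carry, digit = divmod(total, 10)
        digits.append(str(digit))
        trace.append(f"{x}+{y}+carry -> digit {digit}, carry {carry}")
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), trace

result, steps = add_with_scratchpad("957", "486")
print(result)  # 1443
```

Each entry in `trace` is the analogue of one emitted CoT token: a new "forward pass" conditioned on everything written so far. Strung together, the bounded steps compute something no single step could.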

This matters because modern "reasoning" models (OpenAI's o-series, Anthropic's Claude with extended thinking, DeepSeek R1) do exactly this: they generate long chains of intermediate reasoning tokens before producing an answer. The theoretical result says that this approach, in principle, overcomes the TC⁰ barrier I described above.

I'll admit I was a victim of anti-AI media hype on this point. I was sold on the architecture argument after reading a Wired article and an accompanying paper that brushed off CoT's impact on complexity, arguing that the base operation still carries the limited complexity and that token budgets are too small. In hindsight, that doesn't really address the formal result.

That said, there are important caveats. First, the theoretical result is about expressive power — what a transformer with CoT could compute with the right weights — not about what models actually learn to do. As the authors themselves note: "our lower bounds do not directly imply transformers can learn to use intermediate steps effectively." Whether current training methods (including reinforcement learning) can actually teach models to exploit this theoretical capacity is an open question.

Second, the P result works by showing that a transformer can encode the transitions of any specific Turing machine, with the CoT tokens serving as the tape. But AGI would require something more demanding: the feed-forward network would need to encode a universal Turing machine — one capable of reading a novel problem, constructing a solution strategy, and executing it. (Some smart) humans can do this. Whether a fixed-depth transformer can learn to do this through CoT, even in principle, is a much stronger claim than "CoT reaches P."

Furthermore, the systems achieving the highest scores on ARC-AGI-2 — like Gemini 3 Deep Think at 84.6% — go beyond simple sequential chain of thought. They use parallel hypothesis exploration, search over candidate solutions, and iterative refinement loops. This is a genuine extension to the feed-forward architecture: the transformer is no longer operating alone but is embedded in a broader program that orchestrates multiple inference passes, evaluates their outputs, and steers the search. In the original version of this piece, I suggested that alternative architectures with feedback connections might be needed. What's actually emerging is something different — the feedback is happening outside the model, in scaffolding that wraps the transformer in a loop. Whether this external scaffolding can ultimately substitute for the kind of internal recurrence I was imagining remains to be seen, but the progress is harder to dismiss than I initially thought.

So the architecture argument is weaker than I originally stated, but it isn't entirely gone. The theoretical ceiling has been raised from TC⁰ to P, which is a significant expansion. Whether models can actually reach that ceiling through current training methods, and whether P is sufficient for the kind of flexible, general reasoning that characterizes intelligence, remain open questions.

Most people encounter AGI through CEO proclamations. Sam Altman claims that OpenAI knows how to build superintelligent AI. Dario Amodei writes that AI could be "smarter than a Nobel Prize winner across most relevant fields" by 2026. These are marketing statements from people whose companies depend on continued investment in the premise that AGI is imminent. They are not technical arguments.

Meanwhile, the actual research community tells a different story. A 2025 survey by the Association for the Advancement of Artificial Intelligence (AAAI), surveying 475 AI researchers, found that 76% believe scaling up current AI approaches to achieve AGI is "unlikely" or "very unlikely" to succeed. The researchers cited specific limitations: difficulties in long-term planning and reasoning, generalization beyond training data, causal and counterfactual reasoning, and embodiment and real-world interaction. This is an extraordinary disconnect.

Consider the AI 2027 scenario, perhaps the most widely discussed AGI forecast of 2025. The underlying model's first step is automating coding, which is based entirely on an extrapolation of the METR study on coding time horizons. The METR study collects coding tasks that an AI can complete with a 50% success rate, and tracks how the duration of those tasks grows over time. But task duration is not a measure of task complexity. As the ARC-AGI benchmarks illustrate, there are classes of problems that take humans only seconds to solve but that require AI systems thousands of dollars of compute and dozens of iterative refinement cycles to approach — and even then, the 85% Grand Prize threshold remains unmet. The focus on common coding tasks strongly emphasizes within-distribution work, which is well-represented in AI training sets. The 50% success threshold also allows one to ignore precisely the tricky, out-of-distribution, short tasks on which agents may not be making any progress at all. The second step in the AI 2027 model is agents developing "research taste". My take is that research taste relies heavily on the short-duration cognitive primitives that ARC highlights but the METR metric does not capture.

I'd encourage anyone interested in this topic to seek out technical depth. Understand what these systems actually can and can't do. The real story is fascinating: it's about the fundamental nature of intelligence, and how far we still have to go to understand it.

Betting against AI is difficult currently, due to the sheer amount of capital being thrown at it. One thing I've spent a lot of time thinking about is — what if there's a lab somewhere out there that's about to crack this? Maybe there are labs — even within OpenAI and Anthropic themselves — that are already working on all of these problems and keeping them secret?

But the open questions described above are not the kind of problem a secret lab can solve. They are long-standing problems that span multiple different fields — embodied cognition, evolutionary neuroscience, architecture design and complexity theory, training methodology and generalizability. Solving problems like this requires a global research community working across disciplines over many years, with plenty of dead ends along the way. This is high-risk, low-probability-of-reward, researchers-tinkering-in-a-lab kind of work. It's not a sprint towards a finish line.

This also helps us frame what AI companies are actually doing. They're buying up GPUs, building data centers, expanding product surface area, securing more funding. They are scaling up the current paradigm, which has little bearing on the fundamental research needed to make progress on the problems highlighted above.

I'm not saying that AGI is impossible, or even that it won't come within our lifetime. I fully believe neural networks, with appropriate architectures and training methods, can represent cognitive primitives and reach superhuman intelligence. They can probably do this without repeating our long evolutionary history, by training in simulated logical or symbolic environments that have little to do with the physical world. I am also not saying that LLMs aren't useful. Even the current technology is fundamentally transforming our society (see AI is not mid, a response to Dr. Cottom's NYT Op-Ed).

We have to remember, though, that neural networks have their origins in the 1950s. Modern backpropagation was popularized in 1986. Many of the advances that made modern GPTs possible were discovered gradually over the following decades:

  • Long Short-Term Memory (LSTM) networks, which solved the vanishing gradient problem for sequence modeling — Hochreiter and Schmidhuber, 1997

  • Attention mechanisms, which allowed models to dynamically focus on relevant parts of their input — Bahdanau et al., 2014

  • Residual connections (skip layers), which made it possible to train networks hundreds of layers deep — He et al., 2015

  • The transformer architecture itself, which combined attention with parallelizable training to replace recurrent networks entirely — Vaswani et al., 2017

Transformers have fundamental limitations. They are very powerful, and they have taught us a lot about what general intelligence is. We are gaining an increasingly crisp understanding of where the boundaries lie. But solving these problems will require research, which is a non-linear process full of dead ends and plateaus. It could take decades, and even then we might discover new and more nuanced issues.



Comments

  • By hi_hi, 2026-02-16 2:16 (19 replies)

    Here's a thought. Lets all arbitrarily agree AGI is here. I can't even be bothered discussing what the definition of AGI is. It's just here, accept it. Or vice versa.

    Now what....? Whats happening right now that should make me care that AGI is here (or not). Whats the magic thing thats happening with AGI that wasn't happening before?

    <looks out of window> <checks news websites> <checks social media...briefly> <asks wife>

    Right, so, not much has changed from 1-2 years ago that I can tell. The job markets a bit shit if you're in software...is that what we get for billions of dollars spent?

    • By hackyhacky, 2026-02-16 4:43 (5 replies)

      Cultural changes take time. It took decades for the internet to move from nerdy curiosity to an essential part of everyone's life.

      The writing is on the wall. Even if there's no new advances in technology, the current state is upending jobs, education, media, etc

      • By themafia, 2026-02-16 5:48 (2 replies)

        > It took decades

        It took one September. Then as soon as you could take payments on the internet the rest was inevitable and in _clear_ demand. People got on long waiting lists just to get the technology in their homes.

        > no new advances in technology

        The reason the internet became so accessible is because Moore was generally correct. There was two corresponding exponential processes that vastly changed the available rate of adoption. This wasn't at all like cars being introduced into society. This was a monumental shift.

        I see no advances in LLMs that suggest any form of the same exponential processes exist. In fact the inverse is true. They're not reducing power budgets fast enough to even imagine that they're anywhere near AGI, and even if they were, that they'd ever be able to sustainably power it.

        > the current state is upending jobs

        The difference is companies fought _against_ the internet because it was so disruptive to their business model. This is quite the opposite. We don't have a labor crisis, we have a retention crisis, because companies do not want to pay fair value for labor. We can wax on and off about technology, and perceptrons, and training techniques, or power budgets, but this fundamental fact seems the hardest to ignore.

        If they're wrong this all collapses. If I'm wrong I can learn how to write prompts in a week.

        • By hackyhacky 2026-02-166:45

          > It took one September.

          It's the classic "slowly, then suddenly" paradigm. It took decades to get to that one September. Then years more before we all had internet in our pocket.

          > The reason the internet became so accessible is because Moore was generally correct.

          Can you explain how Moore's law is relevant to the rise of the internet? People didn't start buying couches online because their home computer lacked sufficient compute power.

          > I see no advances in LLMs that suggest any form of the same exponential processes exist.

          LLMs have seen enormous growth in power over the last 3 years. Nothing else comes close. I think they'll continue to get better, but critically: even if LLMs stay exactly as powerful as they are today, it's enough to disrupt society. IMHO we're already at AGI.

          > The difference is companies fought _against_ the internet

          Some did, some didn't. As in any cultural shift, there were winners and losers. In this shift, too, there will be winners and losers. The panicked spending on data centers right now is a symptom of the desire to be on the right side of that.

          > because companies do not want to pay fair value for labor.

          Companies have never wanted to pay fair value for labor. That's a fundamental attribute of companies, arising as a consequence of the system of incentives provided in capitalism. In the past, there have been opportunities for labor to fight back: government regulation, unions. This time that won't help.

          > If I'm wrong I can learn how to write prompts in a week.

          Why would you think that anyone would want you to write prompts?

        • By nubg 2026-02-171:251 reply

          what September?

          • By hackyhacky 2026-02-172:20

            This is an allusion to the old days, before the internet became a popular phenomenon. It used to be that every September a bunch of "newbies" (college students who had just gotten access to an internet connection for the first time) would log in and make a mess of things. Then, in the late nineties when it really took off, everybody logged in and made a mess of things. This is the "Eternal September." [1]

            [1] https://en.wikipedia.org/wiki/Eternal_September

      • By materielle 2026-02-164:515 reply

        I really think corporations are overplaying their hand if they think they can transform society once again in the next 10 years.

        Rapid de-industrialization, followed by the internet and social media, almost broke our society.

        Also, I don’t think people necessarily realize how close we were to the cliff in 2007.

        I think another transformation now would rip society apart rather than take us to the great beyond.

        • By foo42 2026-02-169:171 reply

          I worry that if the reality lives up to investors dreams it will be massively disruptive for society which will lead us down dark paths. On the other hand if it _doesn't_ live up to their dreams, then there is so much invested in that dream financially that it will lead to massive societal disruption when the public is left holding the bag, which will also lead us down dark paths.

          • By pydry 2026-02-1611:501 reply

            It's already made it impossible to trust half of the content I read online.

            Whenever I use search terms to ask a specific question these days, there's usually a page of slop dedicated to the answer which appears top for relevancy.

            Once I realize it is slop, I realize the relevant information could be hallucinated, so I can't trust it.

            At the same time I'm seeing a huge upswing in probable human-created content being accused of being slop.

            We're seeing a tragedy of the information commons play out on an enormous scale at hyperspeed.

            • By Induane 2026-02-1615:57

              You trust nearly half??!!??

        • By hackyhacky 2026-02-165:30

          I think corporations can definitely transform society in the near future. I don't think it will be a positive transformation, but it will be a transformation.

          Most of all, AI will exacerbate the lack of trust in people and institutions that was kicked into high gear by the internet. It will be easy and cheap to convince large numbers of people about almost anything.

        • By BobbyJo 2026-02-165:013 reply

          As a young adult in 2007, what cliff were we close to?

          The GFC was a big recession, but I never thought society was near collapse.

          • By edmundsauto 2026-02-165:31

            We were pretty close to a collapse of the existing financial system. Maybe we’d be better off now if it happened, but the interim devastation would have been costly.

          • By zeroonetwothree 2026-02-165:29

            It felt like the entire global financial system had a chance of collapsing.

          • By verzali 2026-02-1611:14

            We weren't that far away from ATMs refusing to hand out cash, banks limiting withdrawals from accounts (if your bank hadn't already gone under), and a subsequent complete collapse of the financial system. The only thing that saved us from that was an extraordinary intervention by governments, something I am not sure they would be capable of doing today.

        • By the1st 2026-02-1616:57

          I'm still not buying that AI will change society anywhere near as much as the internet, or smart phones for that matter.

          The internet made it so that you can share and access information in a few minutes, if not seconds.

          Smart phones built on the internet by making this sharing and access of information possible from anywhere and by anyone.

          AI seems to occupy the same space as Google in the broader internet ecosystem. I don't know what AI provides me that a few hours of Google searches couldn't. It makes information retrieval faster, but that was never the hard part. The hard part was understanding the information, so that you're able to apply it to your particular situation.

          Being able to write to-do apps X1000 faster is not innovation!

        • By graemep 2026-02-1613:241 reply

          You are assuming that the change can only happen in the west.

          The rest of the world has mostly been experiencing industrialisation, and was only indirectly affected by the great crash.

          If there is a transformation in the rest of the world the west cannot escape it.

          A lot of people in the west seem to have their heads in the sand, very much like when Japan and China tried to ignore the west.

          China is the world's second biggest economy by nominal GDP, India the fourth. We have a globalised economy where everything is interlinked.

          • By expedition32 2026-02-1614:322 reply

            When I look at my own country it has proven to be open to change. There are people alive today who remember Christianity now we swear in a gay prime minister.

            In that sense Western countries have proven that they are intellectually very nimble.

            • By graemep 2026-02-1615:31

              Three of the best-known Christians I have known in my life are gay. Two are priests (one Anglican, one Catholic). Obviously the Catholic priest had taken a vow of celibacy anyway, so it's entirely immaterial. I did read an interview with a celeb friend of his (also now a priest!) which said he (the priest I knew) thought people did not know he was gay; we all knew, just did not make a fuss about it.

              Even if you accept the idea that gay sex is a sin, the entire basis of Christianity is that we are all sinners. Possessing wealth is a failure to follow Jesus's commands for instance. You should be complaining a lot more if the prime minister is rich. Adultery is clearly a more serious sin than having the wrong sort of sex, and I bet your country has had adulterous prime ministers (the UK certainly has had many!).

              I think Christians who are obsessed with homosexuality as somehow making people worse than the rest of us are both failing to understand Christ's message and saying more about themselves than about gays.

              If you look at when sodomy laws were abolished, countries with a Christian heritage led this. There are reasons for this in the Christian ethos of choice and redemption.

            • By hackyhacky 2026-02-1614:36

              > people alive today who remember Christianity now we swear in a gay prime minister

              Why would that be a contradiction? Gay people can't be Christian?

      • By otabdeveloper4 2026-02-169:531 reply

        > Cultural changes take time. It took decades for the internet to move from nerdy curiosity to an essential part of everyone's life.

        99% of people only ever use proprietary networks from FAANG corporations. That's not "the internet", that's an evolution of CompuServe and AOL.

        We got TCP/IP and the "web-browser" as a standard UI toolkit stack out of it, but the idea of the world wide web is completely dead.

        • By rglover 2026-02-1615:33

          Shocking how few realize this. It's a series of mega cities interconnected by ghost towns out here.

      • By hi_hi 2026-02-165:03

        Yeah, this is a good point; transition and transformation to new technologies takes time. I'm not sure I agree the current state is upending things, though. It's forcing some adaptation for sure, but the status quo remains.

      • By webdoodle 2026-02-165:001 reply

        It also took years for the Internet to be usable by most folks. It was hard, expensive, and impractical for decades.

        Just about the time it hit the mainstream, coincidentally, is when the enshittification began to go exponential. Be careful what you wish for.

        • By hackyhacky 2026-02-165:321 reply

          Allow me to clarify: I'm not wishing for change. I am an AI pessimist. I think our society is not prepared to deal with what's about to happen. You're right: AI is the key to the enshittification of everything, most of all trust.

          • By bulbar 2026-02-165:45

            Governments and companies have been pushing for identity management that connects your real-life identity with your digital one for quite some time. With AI, I believe that's not only a bad thing; it may be unavoidable now.

    • By tim333 2026-02-1615:192 reply

      What's happening with AGI depends on what you mean by AGI so "can't even be bothered discussing what the definition" means you can't say what's happening.

      My usual way of thinking about it is AGI means can do all the stuff humans do which means you'd probably after a while look out the window and see robots building houses and the like. I don't think that's happening for a while yet.

      • By danaris 2026-02-1618:151 reply

        Indeed: particularly given that—just as a nonexhaustive "for instance"—one of the fairly common things expected in AGI is that it's sapient. Meaning, essentially, that we have created a new life form, that should be given its own rights.

        Now, I do not in the least believe that we have created AGI, nor that we are actually close. But you're absolutely right that we can't just handwave away the definitions. They are crucial both to what it means to have AGI, and to whether we do (or soon will) or not.

        • By tim333 2026-02-1620:52

          I'm not sure how the rights thing will go. Humans have proved quite able not to give many rights to animals or other groups of humans even if they are quite smart. Then again there was that post yesterday with a lady accusing OpenAI of murdering her AI boyfriend by turning off 4o so no doubt there will be lots of arguments over that stuff. (https://news.ycombinator.com/item?id=47020525)

      • By kjkjadksj 2026-02-1618:252 reply

        Who would the robots build houses for? No one has a job and no one is having kids in that future.

        • By elfly 2026-02-1620:55

          Where are the robots going to sleep? Outside in the rain?

        • By therobots927 2026-02-1619:56

          The billionaire elite. Isn’t it obvious? They want to get rid of us

    • By CamperBob2 2026-02-165:02

      Before enlightenment^WAGI: chop wood, fetch water, prepare food

      After enlightenment^WAGI: chop wood, fetch water, prepare food

    • By keernan 2026-02-1616:11

      One of the most impactful books I ever read was Alvin Toffler's Future Shock.

      Its core thesis was: Every era doubled the amount of technological change of the prior era in one half the time.

      At the time he wrote the book in 1970, he was making the point that the pace of technological change had, for the first time in human history, rendered the knowledge of society's elders - previously the holders of all valuable information - irrelevant.

      The pace of change has continued to steadily increase in the ensuing 55 years.

      Edit: grammar

    • By jwilliams 2026-02-165:111 reply

      > Here's a thought. Lets all arbitrarily agree AGI is here.

      A slightly different angle on this - perhaps AGI doesn't matter (or perhaps not in the ways that we think).

      LLMs have changed a lot in software in the last 1-2 years (indeed, the last 1-2 months); I don't think it's a wild extrapolation to see that'll come to many domains very soon.

      • By nradov 2026-02-1615:391 reply

        Which domains? Will we see a lot of changes in plumbing?

        • By joquarky 2026-02-1617:33

          If most of your work involves working with a monitor and keyboard, you're in one of the domains.

          Even if it doesn't, you will be indirectly affected. People will flock to trades if knowledge work is no longer a source of viable income.

    • By rstuart4133 2026-02-171:091 reply

      > Lets all arbitrarily agree AGI is here. I can't even be bothered discussing what the definition of AGI is.

      There is a definition of AGI the AI companies are using to justify their valuation. It's not what most people would call AGI but it does that job well enough, and you will care when it arrives.

      They define it as an AI that can develop other AIs faster than the best team of human engineers. Once they build one of those in house, they outpace the competition and become the winner that takes all. Personally I think it's more likely they will all achieve it at around the same time. That would mean the race continues, accelerating as fast as they can build data centres and power plants to feed them.

      It will impact everyone, because the already dizzying pace of the current advances will accelerate. I don't know about you, but I'm having trouble figuring out what my job will be next year as it is.

      An AI that just develops other AIs could hardly be called "general" in my book, but my opinion doesn't count for much.

      • By hi_hi 2026-02-175:111 reply

        May I ask, what experiences are you personally having with LLMs right now that is leading you to the conclusion that they will become "intelligent" enough to identify, organise, and build advancing improvements to themselves, without any human interaction in the near future (1 - 2 years lets say)?

        • By rstuart4133 2026-02-1722:321 reply

          > May I ask, what experiences are you personally having with LLMs right now that is leading you to the conclusion that they will become "intelligent" enough to identify, organise, and build advancing improvements to themselves, without any human interaction in the near future (1 - 2 years lets say)?

          None, as I don't develop LLMs.

          I wasn't saying I think they will succeed, but I think it is worth noting their AGI ambitions are not as grand as the term implies. Nonetheless, if they achieve them, the world will change.

          • By hi_hi 2026-02-181:431 reply

            I mis-read. Thanks for clarifying :-)

            • By rstuart4133 2026-02-189:37

              Re-reading, it's entirely my fault. I should have said:

              > and you will care if/when it arrives.

    • By hshdhdhj4444 2026-02-1612:254 reply

      If AGI were already here, actions would be so greatly accelerated that humans wouldn't have time to respond.

      Remember that weather balloon the US found a few years ago that for days was on the news as a Chinese spy balloon?

      Whether it was a spy balloon or a weather balloon, the first hint of its existence could have triggered a nuclear war that would have been the end of the world as we know it, because AGI will almost certainly be deployed to control U.S. and Chinese military systems, and it would have acted well before any human had time to intercept its actions.

      That’s the apocalyptic nuclear winter scenario.

      There are many other scenarios.

      An AGI which has been infused with a tremendous amount of ethics, so the above doesn't happen, may also lead to terrible outcomes for humans. An AGI would essentially be a different species (although a non-biological one). If it replicated human ethics, even as inconsistently as we apply them, it would learn that treating other species brutally is acceptable (we breed, enslave, imprison, torture, and then kill over 80 billion land animals annually in animal agriculture, and possibly trillions of water animals). There's no reason it wouldn't do the same to us.

      Finally, if we infuse it with our ethics but it's smart enough to apply them consistently (even a basic application of our ethics would have us end animal agriculture immediately), so that it realizes humans are wrong but doesn't do the same thing to us, it might still create an existential crisis for humans, as our entire identity is based on thinking we are smarter and intellectually superior to all other species, which wouldn't be true anymore. Further, it would erode beliefs in gods and other supernatural BS, which might at the very least lead humans to stop reproducing due to the existential despair this might cause.

      • By armoredkitten 2026-02-1614:42

        You're talking about superintelligence. AGI is just...an AI that's roughly on par with humans on most things. There's no inherent reason why AGI will lead to ASI.

      • By nradov 2026-02-1615:251 reply

        What a silly comment. You're literally describing the plot of several sci-fi movies. Nuclear command and control systems are not taken so lightly.

        And as for the Chinese spy balloon, there was never any risk of a war (at least not from that specific cause). The US, China, Russia, and other countries routinely spy on each other through a variety of unarmed technical means. Occasionally it gets exposed and turns into a diplomatic incident but that's about it. Everyone knows how the game is played.

      • By deafpolygon 2026-02-1613:141 reply

        AGI is not a death sentence for humanity. It all depends on who leverages the tool. And in any case, AGI won’t be here for decades to come.

        • By mapt 2026-02-1614:30

          Your sentence seems to imply that we will delegate all AI decisions to one person who can decide how he wants to use it - to build or destroy.

          Strong agentic AIs are a death sentence memo pad (or a malevolent djinn lamp if you like) that anyone can write on, because the tools will be freely available to leverage. A plutonium breeder reactor in every backyard. Try not to think of paperclips.

      • By koakuma-chan 2026-02-1613:21

        Sounds fun let's do it.

    • By snapplebobapple 2026-02-230:52

      Depends on the cost to run it. Say it costs 5k to do a year's worth of something intellectual with it. That means the price ceiling on 90% of lawyer/accountant/radiologist/low-to-middle-management work is 5k now. It will be epic and temporarily terrible when it happens, as long as reasonably competent models are open source. I also don't think we are near that at all, though.

    • By generallyjosh 2026-02-2114:09

      I do strongly agree on the framing, but I'd argue with the conclusion

      Yeah, it really doesn't matter if AGI has happened, is going to happen, will never happen, whatever. No matter what sort of definition we make for it, someone's always going to disagree anyway. For a looong time, we thought the Turing test was the standard, and that only a truly intelligent computer could beat it. It's been blown out of the water for years now, and now we're all arguing about new definitions for AGI.

      At the end of the day, like you say, it doesn't matter a bit how we define terms. We can label it whatever we want, but the label doesn't change what it can DO

      What it can DO is the important part. I think a lot of software devs are coming to terms with the idea that AI will be able to replace vast chunks of our jobs in the very near future.

      If you use these things heavily, you can see the trajectory.

      6 months ago I'd only trust them for boiler plate code generation and writing/reviewing short in-line documentation.

      Today, with the latest models and tools, I'm trusting them with short/low impact tasks (go implement this UI fix, then redeploy the app locally, navigate to it, and verify the fix looks correct).

      6 months from now, my best guess is that they'll continue to become more capable of handling longer + more complex tasks on their own.

      5 years from now, I'm seeing a real possibility that they'll be handling all the code, end to end.

      Doesn't matter if we call that AGI or not. It very much will matter whose jobs get cut, because one person with AI can do the work of 20 developers

    • By copx 2026-02-167:09

      AGI would render humans obsolete and eradicate us sooner or later.

    • By Havoc 2026-02-168:31

      Pretty sure marketing teams are already working on AGI v2

    • By tsukurimashou 2026-02-1613:391 reply

      AGI is a pipe dream and will never exist

      • By joquarky 2026-02-1617:27

        Odd to see someone so adamantly insist that we have souls on a forum like HN.

    • By munchler 2026-02-166:041 reply

      I think you are missing the point: If we assume that AGI is *not* yet here, but may be here soon, what will change when it arrives? Those changes could be big enough to affect you.

      • By hi_hi 2026-02-167:251 reply

        I'm missing the point? I literally asked the same thing you did.

        >Now what....? Whats happening right now that should make me care that AGI is here (or not).

        Do you have any insight into what those changes might concretely be? Or are you just trying to instil fear in people who lack critical thinking skills?

        • By MadcapJake 2026-02-1618:14

          You did not ask the same thing. You framed the question such that readers are supposed to look at their current lives and realize nothing is different, ergo AGI is lame. Your approach leans on the availability bias and the argument-from-consequences fallacy.

          I think what you are trying to say is: can we define AGI so that we can have an intelligent conversation about what it will mean for our daily lives? But you oddly introduced your argument by stating you didn't want to explore this definition...

    • By m463 2026-02-1622:52

      people are taking actions based on its advice.

    • By dyauspitr 2026-02-1615:011 reply

      The economy is shit if you’re anything except a nurse or providing care to old people.

      • By nradov 2026-02-1615:40

        Electricians are also doing pretty well. Someone has to wire up those new data centers.

    • By otabdeveloper4 2026-02-169:50

      > The job markets a bit shit if you're in software

      That's Trump's economy, not LLMs.

    • By skeptic_ai 2026-02-165:461 reply

      Many devs don’t write code anymore. Can really deliver a lot more per dev.

      Many people slowly losing jobs and can’t find new ones. You’ll see effects in a few years

      • By reactordev 2026-02-165:482 reply

        Deliver a lot more tech debt

        • By qingcharles 2026-02-171:12

          My LLMs do create non-zero amounts of tech debt, but they are also massively decreasing human-made tech debt by finding mountains of code that can be removed or refactored when using the newest frameworks.

        • By dainiusse 2026-02-166:441 reply

          That tech debt will be cleaned up with a model in 2 years. Not that humans don't make tech debt.

          • By shaky-carrousel 2026-02-168:321 reply

            What that model is going to do in 2 years is replace tech debt with more complicated tech debt.

            • By geoelectric 2026-02-169:361 reply

              One could argue that's a cynically accurate definition of most iterative development anyway.

              But I don't know that I accept the core assertion. If the engineer is screening the output and using the LLM to generate tests, chances are pretty good it's not going to be worse than human-generated tech debt. If there's more accumulated, it's because there's more output in general.

              • By krethh 2026-02-1614:371 reply

                Only if you accept the premise that the code generated by LLMs is identical to the developer's output in quality, just higher in volume. In my lived professional experience, that's not the case.

                It seems to me that prompting agents and reviewing the output just doesn't.... trigger the same neural pathways for people? I constantly see people submit agent generated code with mistakes they would have never made themselves when "handwriting" code.

                Until now, the average PR had one author and a couple reviewers. From now on, most PRs will have no authors and only reviewers. We simply have no data about how this will impact both code quality AND people's cognitive abilities over time. If my intuition is correct, it will affect both negatively over time. It remains to be seen. It's definitely not something that the AI hyperenthusiasts think at all about.

                • By joquarky 2026-02-1617:381 reply

                  > In my lived professional experience, that's not the case.

                  In mine it is the case. Anecdata.

                  But for me, this was over two decades in an underpaid job at an S&P500 writing government software, so maybe you had better peers.

                  • By krethh 2026-02-170:451 reply

                    I stated plainly: "we have no data about this". Vibes is all we have.

                    It's not just me though. Loads of people subjectively perceiving a decrease in quality of engineering when relying on agents. You'll find thousands of examples on this site alone.

                    • By reactordev 2026-02-1715:51

                      I have yet to find an agent that writes as succinctly as I do. That said, I have found agents more than capable of doing something.

    • By znnajdla 2026-02-166:015 reply

      I've been writing code for 20 years. AI has completely changed my life and the way I write code and run my business. Nothing is the same anymore, and I feel I will be saying that again by the end of 2026. My productive output as a programmer in software and business has expanded 3x *compounding monthly*.

      • By myegorov 2026-02-166:154 reply

        >My productive output as a programmer in software and business have expanded 3x compounding monthly.

        In what units?

        • By znnajdla 2026-02-1612:56

          Tasks completed in my todo list software; I've been measuring my output for 5 years. Time saved, because I built one-off tools to automate many common workflows. And yes, even dollars earned.

          I don't mean 3x compounding monthly every month; I mean 3x total since I started using Claude Code about 6 months ago, but the benefits keep compounding.

        • By freshbreath 2026-02-169:021 reply

          GWh

          • By tmtvl 2026-02-1612:55

            Going from gigajoules to terajoules.

        • By merek 2026-02-168:51

          Vibes

      • By hi_hi 2026-02-166:461 reply

        Going from punch cards to terminals also "completely changed my life and the way I write code and run my business"

        Firefox introducing their dev debugger many years ago "completely changed my life and the way I write code and run my business"

        You get the idea. Yes, the day to day job of software engineering has changed. The world at large cares not one jot.

        • By brynnbee 2026-02-1617:34

          I mean 2025 had the weakest job creation growth numbers outside of recession periods since at least 2003. The world seems to care in a pretty tangible way. There are other big influencing factors for that, too, of course.

      • By UncleMeat 2026-02-166:101 reply

        Okay. So software engineers are vastly more efficient. Good I guess. "Revolutionize the entire world such that we rethink society down to its very basics like money and ownership" doesn't follow from that.

        • By pennomi 2026-02-166:202 reply

          Man you guys are impatient. It takes decades even for earth shattering technologies to mature and take root.

          • By UncleMeat 2026-02-1615:01

            If people want to make the "this will be AGI after two decades and will totally revolutionize the entire world" that's fine. If people want to make the "wow this is an incredibly useful tool for many jobs that will make work more efficient" that's fine. We can have those discussions.

            What I don't buy is the "in two years there will be no more concept of money or poverty because AI has solved everything" argument using the evidence that these tools are really good at coding.

          • By hi_hi 2026-02-166:551 reply

            Damn right I'm impatient. My eye starts twitching when a web page takes more than 2 seconds to load :-)

            In the meantime, I've had to continuously hear talk about AI, both in real life (like at the local pub) AND virtually (tv/radio/news/whatever) and how it's going to change the world in unimaginable ways for the last...2/3 years. Billions upon billions of dollars are being spent. The only tangible thing we have to show is software development, and some other fairly niche jobs, have changed _a bit_.

            So yeah, excuse my impatience for the bubble to burst, I can stop having to hear about this shit every day, and I can go about my job using the new tools we have been gifted, while still doing all the other jobs that sadly do not benefit in any similar way.

            • By otabdeveloper4 2026-02-169:57

              > The only tangible thing we have to show is software development, and some other fairly niche jobs, have changed _a bit_.

              There is zero evidence that LLMs have changed software development efficiency.

              We get an earth-shattering developer productivity gamechanger every five years. All of them make wild claims, none of them ever have any data to back those claims up.

              LLMs are just another in a long, long list. This too will pass. (Give it five years for the next gamechanger.)

      • By waterTanuki 2026-02-166:181 reply

        Are you working 3x less time, compounding monthly?

        Are you making 3x the money, compounding monthly?

        No?

        Then what's the point?

        • By znnajdla 2026-02-16 12:54 (1 reply)

          Yes and yes.

          • By timeattack 2026-02-16 13:27 (3 replies)

            Okay, teach me how, then? I would also like to work 3× less and make 3× more.

            • By joquarky 2026-02-16 17:49

              People keep impatiently expecting proof from builders with no moat. It's like that Upton Sinclair quote.

            • By brynnbee 2026-02-16 17:34

              Start a software business, presumably.

            • By kbelder 2026-02-16 16:44 (1 reply)

              Ten more months in 2026, so you should be about 60,000x better by the end of the year.

              • By znnajdla 2026-02-16 17:12

                You say that as if it’s impossible but there are several indie makers that have gone from $10 MRR to $600k MRR over the past 8 months.

      • By hackable_sand 2026-02-16 14:10

        It's weird that you guys keep posting the same comments with the exact same formatting

        You're not fooling anyone

    • By xhcuvuvyc 2026-02-16 7:17 (1 reply)

      I actually think it is here. The Singularity happened. We're just playing catch-up at this point.

      Has it run away yet? Not sure, but is it currently in the process of increasing its intelligence with little input from us? Yes.

      Exponential graphs always have a slow curve in the beginning.

      • By hi_hi 2026-02-16 7:32 (1 reply)

        Didn't you get the memo? Tuesday. Tuesday is when the Singularity happens.

        Will there still be ice cream after Tuesday? General societal collapse would be hard to bear without ice cream.

        • By joquarky 2026-02-16 17:44

          Tuesday at 4 p.m., to be specific.

  • By NiloCK 2026-02-16 2:05 (3 replies)

    > The transformer architectures powering current LLMs are strictly feed-forward.

    This is true in a specific contextual sense (each token that an LLM produces comes from a feed-forward pass). But it has been untrue for more than a year with reasoning models, which feed their produced tokens back as inputs, and whose tuning effectively rewards them for doing this skillfully.

    Heck, it was untrue before that as well, any time an LLM responded with more than one token.

    > A [March] 2025 survey by the Association for the Advancement of Artificial Intelligence (AAAI), surveying 475 AI researchers, found that 76% believe scaling up current AI approaches to achieve AGI is "unlikely" or "very unlikely" to succeed.

    I dunno. This survey publication was from nearly a year ago, so the survey itself is probably more than a year old. That puts us at Sonnet 3.7. The gap between that and present day is tremendous.

    I am not skilled enough to say this tactfully, but: expert opinions can be the slowest to update on the news that their specific domain may, in hindsight, have been the wrong horse. It's the quote about it being difficult to believe something that your income requires to be false, but instead of income it can be your whole legacy or self-concept. Way worse.

    > My take is that research taste is going to rely heavily on the short-duration cognitive primitives that the ARC highlights but the METR metric does not capture.

    I don't have an opinion on this, but I'd like to hear more about this take.

    • By anonymid 2026-02-16 4:02 (3 replies)

      Thanks for reading, and I really appreciate your comments!

      > who feed their produced tokens back as inputs, and whose tuning effectively rewards it for doing this skillfully

      Ah, this is a great point, and not something that I considered. I agree that the token feedback does change the complexity, and it seems that there's even a paper by the same authors about this very thing! https://arxiv.org/abs/2310.07923

      I'll have to think on how that changes things. I think it does take the wind out of the architecture argument as it's currently stated, or at least makes it a lot more challenging. I'll consider myself a victim of media hype on this, as I was pretty sold on this line of argument after reading this article https://www.wired.com/story/ai-agents-math-doesnt-add-up/ and the paper https://arxiv.org/pdf/2507.07505, whose authors brush this off with:

      >Can the additional think tokens provide the necessary complexity to correctly solve a problem of higher complexity? We don't believe so, for two fundamental reasons: one that the base operation in these reasoning LLMs still carries the complexity discussed above, and the computation needed to correctly carry out that very step can be one of a higher complexity (ref our examples above), and secondly, the token budget for reasoning steps is far smaller than what would be necessary to carry out many complex tasks.

      In hindsight, this doesn't really address the challenge.

      My immediate next thought is - even if solutions up to P can be represented within the model / CoT, do we actually feel like we are moving towards generalized solutions, or that the solution space is navigable through reinforcement learning? I'm genuinely not sure where I stand on this.

      > I don't have an opinion on this, but I'd like to hear more about this take.

      I'll think about it and write some more on this.

      • By igor47 2026-02-16 6:28

        This whole conversation is pretty much over my head, but I just wanted to give you props for the way you're engaging with challenges to your ideas!

      • By joquarky 2026-02-16 17:55 (1 reply)

        You seem to have a lot of theoretical knowledge on this, but have you tried Claude or Codex in the past month or two?

        Hands on experience is better than reading articles.

        I've been coding for 40 years and after a few months getting familiar with these tools, this feels really big. Like how the internet felt in 1994.

        • By anonymid 2026-02-18 3:44 (1 reply)

          I've been developing an AI coding harness, https://github.com/dlants/magenta.nvim , for over a year now, and I use it (and Cursor and Claude Code) daily at work.

          Fun observation - almost every coding harness (Claude Code, Cursor, Codex) uses a find/replace tool as the primary way of interacting with code. This requires the agent to fully type out the code it's trying to edit, including several lines of context around the edit. This is really inefficient, token-wise! Why does it work this way? Because LLMs are really bad at counting lines, or at using other ways of describing a unique location in a file.
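          A minimal sketch of how such a find/replace edit tool might work (a hypothetical illustration, not how any particular harness actually implements it): the agent supplies the old text verbatim, and the tool applies the edit only if that text occurs exactly once in the file.

```python
def search_replace(text: str, old: str, new: str) -> str:
    """Apply an edit only when `old` identifies a unique location in `text`.

    The agent must quote `old` verbatim, context lines included - that's the
    token cost described above, but it makes the edit unambiguous.
    """
    count = text.count(old)
    if count == 0:
        raise ValueError("old text not found; the model must re-read the file")
    if count > 1:
        raise ValueError("old text is ambiguous; include more context lines")
    return text.replace(old, new, 1)
```

          The uniqueness requirement is what forces the agent to quote several surrounding lines, since a one-line `old` string is often ambiguous in real files.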

          I've experimented with providing a more robust dsl for text manipulation https://github.com/dlants/magenta.nvim/blob/main/node/tools/... , and I do think it's an improvement over just straight search/replace, but the agents do tend to struggle a lot - editing the wrong line, messing up the selection state, etc... which is probably why the major players haven't adopted something like this yet.

          So I feel pretty confident in my assessment of where these models are at!

          And also, I fully believe it's big. It's a huge deal! My work is unrecognizable from what it was even 2 years ago. But that's an impact / productivity argument, not an argument about intelligence. Modern programming languages, IDEs, spreadsheets, etc... also made a fundamental shift in what being a software engineer was like, but they were not generally intelligent.

          • By logicprog 2026-02-18 23:30

            > Fun observation - almost every coding harness (claude code, cursor, codex) uses a find/replace tool as the primary way of interacting with code. This requires the agent to fully type out the code it's trying to edit, including several lines of context around the edit. This is really inefficient, token wise! Why does it work this way? Because the LLMs are really bad at counting lines, or using other ways of describing a unique location in the file.

            Incidentally, I saw an interesting article on exactly this subject a while back, about using line numbers + hashes instead of typing out the full search/replace, writing patches, or using a DSL, and it seemed to have really good success:

            https://blog.can.ac/2026/02/12/the-harness-problem/
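            The idea could be sketched roughly like this (a hypothetical illustration under my own assumptions, not the blog's actual implementation): address each line by its number plus a short hash of its content, so the agent never retypes the code, and a stale or miscounted line number fails loudly instead of corrupting the file.

```python
import hashlib

def line_anchor(line: str) -> str:
    """Short content hash that pins an edit to a specific line's contents."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()[:8]

def apply_line_edit(text: str, lineno: int, anchor: str, replacement: str) -> str:
    """Replace 1-indexed line `lineno`, but only if its hash matches `anchor`."""
    lines = text.splitlines()
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"line {lineno} is out of range")
    if line_anchor(lines[lineno - 1]) != anchor:
        raise ValueError("anchor mismatch: file changed since the edit was proposed")
    lines[lineno - 1] = replacement
    return "\n".join(lines)

src = "a = 1\nb = 2\nc = 3"
edited = apply_line_edit(src, 2, line_anchor("b = 2"), "b = 20")
# reusing the old anchor after the line has changed now raises instead of mis-editing
```

            The edit payload here is just a line number, an 8-character hash, and the replacement text - far fewer tokens than quoting the old code plus context.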

      • By skybrian 2026-02-16 4:42

        It's general-purpose enough to do web development. How far can you get by writing programs and seeing if you get the answers you intended? If English words are "grounded" by programming, system administration, and browsing websites, is that good enough?

    • By vrighter 2026-02-16 13:52

      That doesn't mean it is not strictly feedforward.

      You run it again, with a bigger input. If it needs to do a loop to figure out what the next token should be (e.g., "The result is: X"), it will fail. Adding that token to the input and running it again is too late - it has already been emitted. The loop needs to occur while "thinking", not after you have already blurted out a result, whether or not you have sufficient information to do so.

    • By wavemode 2026-02-16 6:44 (1 reply)

      > expert opinions can be the slowest to update on the news that their specific domain may, in hindsight, have been the wrong horse. It's the quote about it being difficult to believe something that your income requires to be false, but instead of income it can be your whole legacy or self-concept

      Not sure I follow. Are you saying that AI researchers would be out of a job if scaling up transformers leads to AGI? How? Or am I misunderstanding your point.

      • By NiloCK 2026-02-17 3:24 (1 reply)

        People have entire careers promoting incorrect ideas: OxyContin, phrenology, the Windows operating system.

        Reconciling your self-concept with the negative (or fruitless) impacts of your life's work is difficult. It can be easier to deny or minimize those impacts.

        • By wavemode 2026-02-17 23:53

          Yeah that's the part I'm not following. You think AI researchers would have their life's work invalidated by the creation of AGI? How? Presumably (in that scenario) their life's work (AI research) will have been foundational to the creation of one of the most important inventions of all time.

          Or is your reasoning that they will be upset about not having invented it themselves (similar to those conspiracy theories about the cure for cancer existing but scientists withholding it so they can keep doing treatment research)?

  • By helterskelter 2026-02-16 5:12 (2 replies)

    I don't know about AGI but I got bored and ran my plans for a new garage by Opus 4.6 and it was giving me some really surprising responses that have changed my plans a little. At the same time, it was also making some nonsense suggestions that no person would realistically make. When I prompted it for something in another chat which required genuine creativity, it fell flat on its face.

    I dunno, mixed bag. Value is positive if you can sort the wheat from the chaff for the use cases I've run by it. I expect the main place it'll shine in the near and medium term is going over huge data sets or big projects and flagging things for review by humans.

    • By bamboozled 2026-02-16 5:30

      I've used it for similar things and had some good and some disastrous results. In a way, I feel like I'm basically where I was "before AI".

    • By BatteryMountain 2026-02-16 5:55 (2 replies)

      I've used it recently to flesh out a fully fledged business plan, pricing models, capacity planning & logistics for a 10-year period for a transport company (daily bus route). I already had most of it in my mind and on spreadsheets (it was an old plan that I wanted to revive), but seeing it figure out all the smaller details that would make or break it was amazing! I think MBAs should be worried, as it did some things more comprehensively than an MBA would have. It was like I had an MBA + actuarial scientist + statistician + domain expert + HR/accounting all in one. And the plan was put into a .md file that has enough structure to flesh out a backend and an app.

      • By helterskelter 2026-02-16 6:13

        Yeah, it's really impressed me on occasion, but often in the same prompt output it does something totally nonsensical. For my garage/shop, it generated an SVG of the proposed floor plan, taking care to place the sink away from moisture-sensitive material and to put certain work stations close to each other for workflow, etc. It even routed plumbing and electrical... But it also arranged the work stations cramped together at the two narrow ends of the structure (such that they'd be impractical to actually work at) and ignored all the free wall space along the long axis, so that literally most of the space was unused. It was also concerned about things that were non-issues, like contamination between certain stations, and when I explicitly told it something about station placement, it just couldn't seem to internalize it and kept putting the station in the wrong place.

        All this being said, what I was throwing at it was really not what it was optimized for, and it still delivered some really good ideas.

      • By bamboozled 2026-02-16 7:21 (4 replies)

        Isn't all of this only useful if you know the information presented is correct?

        • By entech 2026-02-17 11:54

          It seems that it's useful if it's better than what you would have done yourself.

          Although the poster already had a bus-company business plan, including actuarial analysis, in his head and some spreadsheets, so that bar appears to be sufficiently high.

        • By ravikapoor101 2026-02-19 12:08

          If his skill level were very low, the AI plan would still be impressive at his skill level.

          In other words - If he was low skilled, AI would impress him. Now that he is high skilled, AI still impressed him.

          In other words - AI improves on what a human can do.

        • By otabdeveloper4 2026-02-16 9:59

          Don't worry about it. Just vibe your business plan, if it sounds impressive it's probably correct.

        • By BatteryMountain 2026-02-17 11:50 (1 reply)

          It is indeed correct (or else I wouldn't have posted it here).

          • By bamboozled 2026-02-17 14:06

            I wasn't arguing whether or not it's correct, I was pointing out that it's useful only because you know it's given you correct information.

            Maybe we'll get to a point where we just trust everything we receive from these systems, but I'm yet to meet a person who would fund a business solely on an LLM-generated business plan without being able to have it cross-checked by someone trusted.

HackerNews