Stop generating, start thinking

2026-02-08 21:58 · localghost.dev

Instead of wanting to learn and improve as humans, and build better software, we’ve outsourced our mistakes to an unthinking algorithm.

Throughout my career, I feel like I’ve done a pretty decent job of staying on top of new developments in the industry: attending conferences, following (and later befriending!) some of the very smart people writing the specs, being the one sharing news on Slack about exciting new features of CSS or JS with my colleagues. The joys of working on an internal tool where you only need to worry about latest Chrome, and playing with anchor positioning in a production app while it’s still experimental!

It’s very unsettling, then, to find myself feeling like I’m in danger of being left behind - like I’m missing something. As much as I don’t like it, so many people have started going so hard on LLM-generated code in a way that I just can’t wrap my head around.

I’ve been using Copilot - and more recently Claude - as a sort of “spicy autocomplete” and occasional debugging assistant for some time, but any time I try to get it to do anything remotely clever, it completely shits the bed. Don’t get me wrong, I know that a large part of this is me holding it wrong, but I find it hard to justify the value of investing so much of my time perfecting the art of asking a machine to write what I could do perfectly well in less time than it takes to hone the prompt.

You’ve got to give it enough context - but not too much or it gets overloaded. You’re supposed to craft lengthy prompts that massage the AI assistant’s apparently fragile ego by telling it “you are an expert in distributed systems” as if it were an insecure, mediocre software developer.

Or I could just write the damn code in less time than all of this takes to get working.

As I see more and more people generating code instead of writing it, I find myself wondering why engineers are so ready and willing to do away with one of the good bits of our jobs (coding) and leave themselves with the boring bit (reviews).

Perhaps people enjoy writing roleplay instructions for computers, I don’t know. But I find it dangerous that people will willingly - and proudly - pump their products full of generated code.

I’ll share a couple of the arguments I’ve encountered when I’ve expressed concern about this.

“This is the Industrial Revolution of our time! It’s like mechanisation all over again.”

Yes, this is true in many ways.

Firstly, when you consider how much the Industrial Revolution contributed to climate change, and look at the energy consumption of the data centres powering AI software, it’s easy to see parallels there. Granted, not all of this electricity is fossil-fuel-powered, so that’s some improvement on the Industrial Revolution, but we’re still wasting enormous amounts of resources generating pictures of shrimp Jesus.

Mechanisation made goods cheaper and more widely available, but at the cost of quality: it’s been a race to the bottom since the late 19th century and now we have websites like SHEIN where you can buy a highly flammable pair of trousers for less than a cup of coffee. Mechanisation led to a decline in skilled labour, made worse by companies gradually offshoring their factories to less economically developed countries where they could take advantage of poorly-paid workers with fewer rights, and make even more money.

Generated code is rather a lot like fast fashion: it looks all right at first glance but it doesn’t hold up over time, and when you look closer it’s full of holes. Just like fast fashion, it’s often ripped off other people’s designs. And it’s a scourge on the environment.

But there’s a key difference. Mechanisation involved replacing human effort in the manufacturing process with machinery that could do the same job. It’s the equivalent of a codemod or a script that generates boilerplate code. The key thing is that it produces the same results each time. And if something went wrong, humans could peer inside the machine and figure out why.
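That determinism can be made concrete with a toy codemod (the function names here are hypothetical, purely for illustration):

```python
import re

# Toy codemod: mechanically rewrite calls to a hypothetical deprecated
# helper fetch_user() to its replacement get_user(). A pure text
# transformation: same input in, same output out, every single run.
DEPRECATED_CALL = re.compile(r"\bfetch_user\(")

def migrate(source: str) -> str:
    """Rewrite deprecated calls. Deterministic and idempotent."""
    return DEPRECATED_CALL.sub("get_user(", source)

before = "user = fetch_user(user_id)\nprint(fetch_user(42))\n"
after = migrate(before)

# Re-running changes nothing, and running on the original input
# always yields byte-identical output.
assert migrate(before) == after
assert migrate(after) == after
print(after)
```

If the migration misbehaves, you can read the regex and see exactly why: the machine’s insides are right there.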

LLM output is non-deterministic, and its inner workings are opaque. There’s no utility in a mechanised process that spits out something different every time, often peppered with hallucinations.

“LLMs are just another layer of abstraction, like higher level programming languages were to assembly.”

It’s true that writing Java or Go has meant I never had to bother learning assembly. The closest I get to anything resembling assembly is knitting patterns.

The way that we write software has evolved in terms of what we need to think about (depending on your language of choice): I don't have to think about garbage collection or memory allocation because the runtime does it for me. But I do still have to think about writing efficient code that makes sense architecturally in the wider context of our existing systems. I have to think about how the software I'm building will affect critical paths, and reason about maintainability versus speed of delivery. When building for the web, we have to think about browser support, accessibility, security, performance.

Where I've seen LLMs do the most damage is where engineers outsource the thinking that should go into software development. LLMs can't reason about system architecture because they cannot reason. They do not think. So if we're not thinking and they're not thinking, nobody is thinking. Nothing good can come from software nobody has thought about.

In the wake of the Horizon scandal, where innocent Post Office staff went to prison because of bugs in Post Office software that led management to think they’d been stealing money, we need to be thinking about our software more than ever: we need accountability in our software.

Thirteen people killed themselves as a direct result of those bugs in that Post Office software, by the way.

Our terrible code is the problem

But, you may argue, human developers today write inaccessible, unperformant, JavaScript-heavy code! What's the difference?

Yes, exactly (or should I say “You’re absolutely right”?). LLMs are trained (without our explicit consent) on all our shitty code, and we've taught them that that's what they should be outputting. They are doomed to repeat humans’ mistakes, then be trained on the shitty reconstituted mistakes made by other LLMs in what’s (brilliantly) been called human centipede epistemology. We don't write good enough code as humans to deserve something that writes the same stuff faster.

And if you think we’ve done all right so far, we haven't: just ask anyone who uses assistive technology, or lives in a country with terrible Internet connection (or tries to get online on mobile data in any UK city, to be honest). Ask anyone who's being racially discriminated against by facial recognition software or even a hand dryer. Ask the Post Office staff.

Instead of wanting to learn and improve as humans, and build better software, we’ve outsourced our mistakes to an unthinking algorithm.

Four eyes good, two eyes bad

Jessica Rose and Eda Eren gave a brilliant talk at FFConf last year about the danger of AI coding assistants making us lose our skills. There was one slide in particular that stood out to me:

Jess and Eda on stage at FFConf in front of a slide that says "Code you did not write is code you do not understand. You cannot maintain code you do not understand."

The difference between reviewing a PR written by a human and one by an LLM is that there's a certain amount of trust in a PR from a colleague, especially one that I know. The PR has been reasoned about: someone has thought about this code. There are exceptions to every rule, yes, but I'd expect manager intervention for somebody constantly raising bad PRs.

Open source maintainers will tell you about the deluge of poor quality generated PRs they're seeing nowadays. As a contributor to any repository, you are accountable for the code you commit, even if it was generated by an LLM. The reviewer also holds some accountability, but you’ve still got two pairs of eyes on the change.

I’ve seen social media posts from companies showing off that they’re using e.g. Claude to generate PRs for small changes by just chatting to the agent on Slack. Claude auto-generates the code, then creates the PR. At that point accountability sits solely with the reviewer. Unless you set up particularly strict rules, one person can ask Claude to do something and then approve that PR: we’ve lost one of those pairs of eyes, and there's less shared context in the team as a result.

Reviewing a PR isn't just about checking for bugs: it’s about sharing understanding of the code and the changes. Many companies don't do PRs at all and commit directly to the main branch, but the only way I've personally seen that work consistently at scale is if engineers are pairing constantly. That way you still have shared context about changes going in.

I'm not anti-progress, I'm anti-hype

I think it’s important to highlight at this stage that I am not, in fact, “anti-LLM”. I’m anti-the branding of it as “artificial intelligence”, because it’s not intelligent. It’s a form of machine learning. “Generative AI” is just a very good Markov chain that people expect far too much from.

I don’t even begrudge people using generative AI to generate prototypes. If you need to just quickly chuck together a wireframe or an interactive demo, it makes a lot of sense. My worry is more around people thinking they can “vibe code” their way to production-ready software, or hand off the actual thinking behind the coding.

Mikayla Maki had a particularly good take on working with agents: keep the human in the loop, and treat them like an external contributor you don’t trust. Only use agents for tasks you already know how to do, because it’s vital that you understand what they produce.

I will continue using my spicy autocomplete, but I’m not outsourcing my thinking any time soon. Stop generating, start understanding, and remember what we enjoyed about doing this in the first place.



Comments

  • By gkcnlr 2026-02-09 4:45 (2 replies)

    As long as AI (genAI, LLMs, whatever you call the current tech) is perceived not as a "bicycle of the mind" and a tool to take 'your' skills to the next level, but as a commodity to be exploited by giant corporations whose existence is based on maximizing profits regardless of virtue or dignity (a basic set of ethics would include, for example, not burning books after you scan them to feed your LLM, like Anthropic did), it is really hard to justify the current state of AI.

    Once you understand that the sole winner in this hype is the one who'll be brutally scraping every bit of data, whether real-time or static, and then refining it to give back to you without your involvement in the process (a.k.a. learning), you'll come to understand that the current AI is by nature hugely unfavorable to mental progression...

    • By locknitpicker 2026-02-09 6:53

      > As long as AI (...) is perceived not as a "bicycle of the mind" and a tool to utilize 'your' skills to a next phase but as a commodity (...), it is really hard to justify the current state of AI.

      I don't agree at all. The "commodity" argument is actually a discussion on economic viability. This is the central discussion, and what will determine if tomorrow we will still have higher-quality and up-to-date LLMs available to us.

      You need to understand that nowadays there is a clear race to the bottom in LLM-related services, at a time when the vast majority is not economically viable. The whole AI industry is unsustainable at this point. Thus it's rather obvious that making a business case and generating revenue is a central point of discussion.

    • By red75prime 2026-02-09 6:32 (1 reply)

      > not to burn books after you scan it

      Shouldn't we blame copyright laws for that?

      • By jdub 2026-02-09 7:20 (2 replies)

        How would copyright law possibly compel the burning of books?

        • By red75prime 2026-02-09 7:56

          IANAL, I can only cite court decision: "And, the digitization of the books purchased in print form by Anthropic was also a fair use but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient space-saving and searchable digital copies for its central library — without adding new copies, creating new works, or redistributing existing copies."

        • By direwolf20 2026-02-09 12:14

          You don't have to burn the book, but then you can't scan it, either.

          In some places there's an exception to copyright law for format shifting if you destroy the original. If you don't destroy the original, then you made a copy and that's not allowed.

  • By awesome_dude 2026-02-09 3:04 (3 replies)

    There's a couple of news stories doing the rounds at the moment which point to the fact that AI isn't "there yet"

    1. Microsoft's announcement of cutting their copilot products sales targets[0]

    2. Moltbook's security issues[1] after being "vibe coded" into life

    Leaving the undeniable conclusion: the vast majority (seriously) distrust AI much more than we're led to believe, and with good reason.

    Thinking (as a SWE) is still very much the most important skill in SWE, and relying on AI has limitations.

    For me, AI is a great tool for helping me to discover ideas I had not previously thought of, and it's helpful for boilerplate, but it still requires me to understand what's being suggested, and, even, push back with my ideas.

    [0] https://arstechnica.com/ai/2025/12/microsoft-slashes-ai-sale...

    [1] https://www.reuters.com/legal/litigation/moltbook-social-med...

    • By henry_bone 2026-02-09 3:43 (1 reply)

      "Thinking (as a SWE) is still very much the most important skill in SWE, and relying on AI has limitations."

      I'd go further and say the thinking is humanity's fur and claws and teeth. It's our strong muscles. It's the only thing that has kept us alive in a natural world that would have us extinct long, long ago.

      But now we're building machines with the very purpose of thinking, or at least of producing the results of thinking. And we use it. Boy, do we use it. We use it to think of birthday presents (it's the thought that counts) and greeting card messages. We use it for education coursework (against the rules, but still). We use it, as programmers, to come up with solutions and to find bugs.

      If AI (of any stripe, LLM or some later invention) represents an existential threat, it is not because it will rise up and destroy us. Its threat lies solely in the fact that it is in our nature to take the path of least resistance. AI is the ultimate such path, and it does weaken our minds.

      My challenge to anyone who thinks it's harmless: use it for a while. Figure out what it's good at and lean on it. Then, after some months, or years, drop it and try working on your own like in the before times. I would bet that one will discover that significant amounts of fluency will be lost.

    • By bee_rider 2026-02-09 6:51

      It seems pretty hard to say at this point—we have people who say they get good results and have high standards. They don’t owe us any proof of course. But we don’t really have any way to validate that. Everybody thinks their code is good, right?

      Microsoft might just be having trouble selling copilot because Claude or whatever is better, right?

      Moltbook is insecure, but the first couple iterations of any non-trivial web service ends up having some crazy security hole. Also Moltbook seems to be some sort of… intentional statement of recklessness.

      I think we’ll only know in retrospect, if there’s a great die-off of the companies that don’t adopt these tools.

    • By copilot_king 2026-02-09 3:24

      [dead]

  • By acjohnson55 2026-02-09 3:39 (5 replies)

    I read this and thought, "are we using the same software?" For me, I have turned the corner where I barely hand-edit anything. Most of the tasks I take on are nearly one-shot successful, simply pointing Claude Code at a ticket URL. I feel like I'm barely scratching the surface of what's possible.

    I'm not saying this is perfect or unproblematic. Far from it. But I do think that shops that invest in this way of working are going to vastly outproduce ones that don't.

    LLMs are the first technology where everyone literally has a different experience. There are so many degrees of freedom in how you prompt. I actually believe that people's expectations and biases tend to correlate with the outcomes they experience. People who approach it with optimism will be more likely to problem-solve the speed bumps that pop up. And the speed bumps are often things that can mostly be addressed systemically, with tooling and configuration.

    • By written-beyond 2026-02-09 6:28 (2 replies)

      This only works if you don't look at the code.

      If all you're doing is reviewing behaviour and tests, then yes: almost 100% of the time, if you're able to document the problem precisely enough, Codex 5.3 will get it right.

      I had codex 5.3 write flawless svelte 5 code only because I had already written valid svelte 5 code around my code.

      The minute I started a new project, asked it to use Svelte 5, and let it loose, it not only started writing a weird mixture of Svelte 3/4 and Svelte 5 code but also straight up ignored Tailwind and started writing its own CSS.

      I asked it multiple times to update the syntax to Svelte 5 but it couldn't figure it out. So I gave up and just accepted it; that's what I think is going to happen more frequently. If the code doesn't matter anymore and it's just the process of evaluating inputs and outputs, then whatever.

      However if I need to implement a specific design I will 100% end up spending more time generating than writing it myself.

      • By acjohnson55 2026-02-09 6:46 (1 reply)

        I'm working in a very mature codebase on product features that are not technically unprecedented, which probably is determining a lot of my experience so far. Very possible that I'm experiencing a sweet spot.

        I can totally imagine that in greenfield, the LLM is going to explore huge search spaces. I can see that when observing the reasoning of these same models in non-coding contexts.

        • By written-beyond 2026-02-09 9:35

          That's exactly what I meant: when I've used LLMs on mature code bases it does very well, because the code base was curated by engineers. When you have a greenfield project it's slop central: it's literally whatever the LLM has been trained on and can get to compile and run.

          Which is still okay, but only as long as I have access to good and cheap LLMs.

      • By cadamsdotcom 2026-02-15 20:09

        Why not have it research Svelte 5 syntax and write itself a syntax & function call checking tool, then put that in Husky / pre-commit so it gets run all the time?

        Then it will have the signal that is in your head, and you’ll never again need to tell it to use Svelte 5.

        As you see more “wrong” usage, have it add those to the checker, and the checker will keep getting better over time.
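        A minimal sketch of the kind of checker described above, as a script that a Husky or pre-commit hook could run against staged .svelte files (the pattern list and hint messages are illustrative assumptions, not a complete Svelte 5 linter):

```python
import re

# Legacy Svelte 3/4 idioms that Svelte 5 runes replace. The pattern list
# and the suggested fixes are illustrative, not exhaustive.
LEGACY_PATTERNS = [
    (re.compile(r"^\s*export\s+let\s"), "use $props() instead of `export let`"),
    (re.compile(r"^\s*\$:\s"), "use $derived/$effect instead of `$:` statements"),
    (re.compile(r"\bon:\w+\s*="), "use event attributes (onclick=...) instead of on: directives"),
]

def check_source(name: str, source: str) -> list[str]:
    """Return one lint message per line that still uses a legacy idiom."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, hint in LEGACY_PATTERNS:
            if pattern.search(line):
                problems.append(f"{name}:{lineno}: {hint}")
    return problems

# A component written in the old style trips all three checks; the hook
# script would read staged .svelte file paths from sys.argv and exit
# non-zero (blocking the commit) when any message is produced.
legacy = (
    "<script>\n"
    "  export let name;\n"
    "  $: double = n * 2;\n"
    "</script>\n"
    "<button on:click={go}>Go</button>\n"
)
for msg in check_source("App.svelte", legacy):
    print(msg)
```

        Wired up as a commit hook, the agent gets that signal automatically on every change instead of relying on repeated prompt reminders.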

    • By woeirua 2026-02-09 4:35 (5 replies)

      This person is not using Claude Code or Cursor. They refuse to use the tools and have convinced themselves that they are right. Sadly, they won't recognize how wrong they were until they are unemployable.

      • By Spivak 2026-02-09 5:47 (2 replies)

        If Claude Code or Cursor is actually that good then we're all unemployed anyway. Using the tools won't save any of our jobs.

        I say this as someone who does use the tools, they're fine. I have yet to ever have an "it's perfect, no notes" result. If the bar is code that technically works along the happy path then fine, but that's the floor of what I'm willing to put forth or accept in a PR.

        • By acjohnson55 2026-02-09 6:26 (1 reply)

          > If Claude Code or Cursor is actually that good then we're all unemployed anyway. Using the tools won't save any of our jobs.

          There is absolutely reason for concern, but it's not inevitable.

          For the foreseeable future, I don't think we can simply Ralph Wiggum-loop real business problems. A lot of human oversight and tuning is required.

          Also, I haven't seen anything to suggest that AI is good at strategic business decisionmaking.

          I do think it dramatically changes the job of a software developer, though. We will be more like developers of software assembly lines and strategists.

          Every company I have ever worked for has had a deep backlog of tasks and ideas we realistically were never going to get to. These tools put a lot of those tasks in play.

          > I have yet to ever have an "it's perfect, no notes" result.

          It frequently gets close for me, but usually some follow-up is needed. The ones that are closest to pure one-shot are bug fixes where replication can be captured in a regression test.

          • By jurgenburgen 2026-02-09 6:50 (2 replies)

            > Every company I have ever worked for has had a deep backlog of tasks and ideas we realistically were never going to get to. These tools put a lot of those tasks in play.

            Some of that backlog was never meant to be implemented. “Put it in the backlog” is a common way to deflect conflict over technical design and the backlog often becomes a graveyard of ideas. If I unleashed a brainless agent on our backlog the system would become a Frankenstein of incompatible design choices.

            An important part of management is to figure out what actually brings value instead of just letting teams build whatever they want.

            • By acjohnson55 2026-02-09 15:18

              That's different from my experience. I've worked many places where there are loads of valuable ideas in the backlog, or bugs that are real but don't have enough impact to prioritize. But the business has limited resources, and there are higher value things on the roadmap.

              I'm experiencing the early stages of a reality where much more of this stuff is possible to build. I say early stages, because there's still plenty of friction between what we have now and a true productivity multiplier. But most of that friction is solvable without speculative improvements, like the models themselves getting better.

              If I worked someplace where there was nothing of value on the backlog, then I would be worried about my job.

            • By reverius42 2026-02-09 9:11

              You need to groom your backlog.

        • By locknitpicker 2026-02-09 7:36

          > If Claude Code or Cursor is actually that good then we're all unemployed anyway.

          I don't know about that. This PR stunt is a greenfield project; no one really knows how much work went into it, and it targeted a problem (bootstrapping a C compiler) that is actually quite small and relatively trivial to accomplish.

          Go ahead and google for small C compilers. They are a dime a dozen, and some don't venture beyond a couple thousand lines of code.

          Check out this past discussion.

          https://news.ycombinator.com/item?id=21210087

      • By acjohnson55 2026-02-09 5:55

        I was a huge skeptic on this stuff less than a year ago, so I get it. For a couple years, the hype was really hype, when it came to the actual business utility of AI tools. It's just interesting to me the extent to which people have totally different lived experiences right now.

        I do agree that some folks are in for rude awakening, because markets (labor and otherwise) will reveal winning strategies. I'm far from a free market ideologist, but this is a place where the logic seems to apply.

      • By girvo 2026-02-09 7:20

        To be totally fair to them... it is quite literally in the last few months that the tools have actually begun to meet the promises that the breathless hypers have been screeching about for years at this point.

        But it's also true that it simply is better than the OP is giving it credit for.

        Depressingly. Because I like writing code.

      • By tonyedgecombe 2026-02-09 14:24 (1 reply)

        >Sadly, they won't recognize how wrong they were until they are unemployable.

        You never see job adverts requiring VIM or IntelliJ experience, I expect it will be the same for Claude Code or Cursor.

        • By wonger_ 2026-02-09 19:21

          https://news.ycombinator.com/item?id=46857488

          > We are AI-native and expect you to use tools like Cursor and Claude to ship significantly faster.

          > Stack: Python 3.12 (Typed), FastAPI, MongoDB/Beanie, React/TS, Gemini/Claude, Claude Code,

          > Who you are: Strong software engineering background with TypeScript in production. Hands-on with AI coding tools (Cursor, Claude Code, Aider, Copilot)

      • By globular-toast 2026-02-09 7:01

        Hilarious take. There's absolutely no advantage to learning to use LLMs now. Even LLM "skills", if you can call them that, that you may have learnt 6 months ago are already irrelevant and obsolete. Do you really think a smart person couldn't get to your level in about an hour? You are not building fundamental skills and experience by using LLM agents now; you're just coasting and possibly even atrophying.

    • By whaleidk 2026-02-09 5:28 (1 reply)

      I am one of the ones who reviews code and pushes projects to the finish line for people who use AI like you. I hate it. The code is slop. You don’t realize because you aren’t looking close enough, but we do and it’s annoying

      • By acjohnson55 2026-02-09 6:14 (1 reply)

        I disagree with the characterization as "slop", if the tools are used well. There's no reason the user has to submit something that looks fundamentally different from what they would handwrite.

        You can't simply throw the generated code over the wall to the reviewer. You have to put in the work to understand what's being proposed and why.

        Lastly, an extremely important part of this is the improvement cycle. The tools will absolutely do suboptimal things sometimes, usually pretty similar to a human who isn't an expert in the codebase. Many people just accept what comes out. It's very important to identify the gaps between the first draft, what was submitted for code review, and the mergeable final product and use that information to improve the prompt architecture and automation.

        What I see is a tool that takes a lot of investment to pay off, but where the problems for operationalizing it are very tractable, and the opportunity is immense.

        I'm worried about many other aspects, but not the basic utility.

        • By whaleidk 2026-02-09 6:26 (2 replies)

          Here’s the thing, they say all the same things you just said in this comment. Yet, the code I end up having to work in is still bad. It’s 5x longer than it needs to be and the naming is usually bad so it takes way longer to read than human code. To top it off, very often it doesn’t integrate completely with the other systems and I have to rewrite a portion which takes longer because the code was designed to solve for a different problem.

          If you are really truly reviewing every single line in a way that it is the same as if you hand wrote it… just hand write it. There’s no way you’re actually saving time if this is the case. I don’t buy that people are looking at it as deeply as they claim to be.

          • By acjohnson55 2026-02-09 6:38 (1 reply)

            > If you are really truly reviewing every single line in a way that it is the same as if you hand wrote it… just hand write it.

            I think this is only true for people who are already experts in the codebase. If you know it inside-out, sure, you can simply handwrite it. But if not, the code writing is a small portion of the work.

            I used to describe it like this: the task will take 2 days of code archaeology, but result in a +20/-20 change. Or much longer, if you are brand new to the codebase. This is where the AI systems excel, in my experience.

            If the output is +20/-20, then there's a pretty good chance it nailed the existing patterns. If it wrote a bunch more code, then it probably deserves deeper scrutiny.

            In my experience, the models are getting better and better at doing the right thing. But maybe this is also because I'm working in a codebase where there are many example patterns in the codebase to slot into and the entire team is investing heavily in the agent instructions and skills, and the tooling.

            • By whaleidk 2026-02-09 6:53 (1 reply)

              It may also have to do with the domain and language to some extent.

              Yes, the code archaeology is the time-consuming part. I could use an LLM to do that for me in my co-workers' generated code, but I don’t want to, because when I have worked with AI I have found it typically creates overly-complex and uncreative solutions. I think there may be some confirmation bias with LLM coders where they look at the code and think it’s pretty good, so they think it’s basically the same way they would have written it themselves. But you miss a lot of experiences compared to when you’re actually in the code trenches reading, designing, and tinkering with code on your own. Like moving around functions to different modules and it suddenly hits you that there’s actually a conceptual shift you can make that allows you to code it all much simpler, or recalling that stakeholder feedback from last week that, if worked with, could allow you a solution pathway that wasn’t viable with the current API design. I have also found that LLMs make assumptions about what parts of the code base can and can’t be changed, and they’re often inaccurate.

              • By acjohnson55 2026-02-09 15:42 (1 reply)

                > But you miss a lot of experiences compared to when you’re actually in the code trenches reading, designing, and tinkering with code on your own.

                Completely agree. Working with this tooling is a fundamentally different practice.

                I'm not trying to suggest that agentic coding is superior in every way. I simply believe that in my own experience, the current gains exceed the drawbacks by a large margin for many applications, and that significantly higher gains are within close reach (e.g. weeks).

                I spent years in management, and it's not dissimilar to that transition. In my first role as a manager, I found it very difficult to divest myself of the need to have fine-grained knowledge of and control over the team's code. That doesn't scale. I had to learn to set people up for success and manage from a place of uncertainty. I had to learn to think like a risk manager instead of an artisan.

                I'll also say that when it comes to solution design, I have found it very helpful to ask the agent to give me options when it comes to solutions that look suboptimal. Often times, I can still find great refactor opportunities, and I can have agent draw up plans for those improvements and delegate them to parallel sessions, where the focus can be safely executing a feature-neutral refactor.

                Separately from that, I would note that the business doesn't always need us to be making conceptual shifts. Great business value can be delivered with suboptimal architecture.

                It is difficult to swallow, but I think that those of us whose market value is based on our ability to develop systems by manipulating code and getting feedback from the running product will find that businesses believe that machines can do this work more than good enough and at vastly higher scale.

                For the foreseeable future, there will be places where hands-on coding is superior, but I see that becoming more the exception than the norm, especially in product engineering.

                • By whaleidk 2026-02-09 17:33 | 1 reply

                  Your perspective is quite thoughtful, thank you. I do agree that if you are just fixing a bug or updating function internals, +20/-20 is certainly good enough, and I wouldn't oppose AI being used there.

                  I am going to have to agree to disagree overall, though, because the second there is something the AI can't do, the maintenance time for a human to learn the context and solve the problem skyrockets (in practice, for me) in a way I find unacceptable. And I may be wrong, but I don't see LLMs being able to improve enough to close that gap soon, because that would require a fundamental shift away from what LLMs are under the hood.

                  • By acjohnson55 2026-02-09 18:39

                    This was a really interesting conversation, and I learned a lot from your thoughts and everyone else's on this thread.

                    As I said up top:

                    > LLMs are the first technology where everyone literally has a different experience.

                    I totally believe you when you say that you have not found these tools to be net useful. I suspect our different perceptions probably come from a whole bunch of things that are hard to transmit over a discussion like this. And maybe factors we're not even aware of -- I am benefiting from a lot of investment my company has made into all of the harness around this.

                    But I do pretty strongly believe that I'm not hallucinating how well it's all working in my specific context.

    • By __mharrison__ 2026-02-09 5:55 | 1 reply

      I'm reading all these articles and having the same thought. These folks aren't using the same tools I'm using.

      • By the-grump 2026-02-09 6:03 | 2 replies

        I feel so weird not being the grumpy one for once.

        Can't relate to GP's experience of one-shotting. I need to try a couple of times and really home in on the right plan and constraints.

        But I am getting so much done. My todo list used to grow every year. Now it shrinks every month.

        And this is not mindless "vibe coding". I insist on what I deploy being quality, and I use every tool I can that can help me achieve that (languages with strong types, TDD with tests that specify system behaviour, E2E tests where possible).
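        As a concrete, entirely hypothetical sketch of "tests that specify system behaviour" (the `Slugify` name and its rules are invented for illustration, not taken from any project in this thread; Go is used since it comes up later in the thread): the behaviour spec is written first, run to confirm it fails, and only then is the implementation written to satisfy it.

```go
package main

import "fmt"

// checkSlugify is the behaviour spec, written BEFORE the implementation.
// Running it against a stub first (and seeing it fail) is the "red" step;
// the implementation below is then written only to make it pass.
func checkSlugify() error {
	cases := map[string]string{
		"Hello World": "hello-world",
		"already-ok":  "already-ok",
	}
	for in, want := range cases {
		if got := Slugify(in); got != want {
			return fmt.Errorf("Slugify(%q) = %q, want %q", in, got, want)
		}
	}
	return nil
}

// Slugify lowercases ASCII letters and turns spaces into hyphens;
// written after, and driven by, the spec above.
func Slugify(s string) string {
	out := []rune{}
	for _, r := range s {
		switch {
		case r == ' ':
			out = append(out, '-')
		case 'A' <= r && r <= 'Z':
			out = append(out, r+('a'-'A'))
		default:
			out = append(out, r)
		}
	}
	return string(out)
}

func main() {
	if err := checkSlugify(); err != nil {
		panic(err)
	}
	fmt.Println("spec satisfied")
}
```

        The point of the pattern is that `checkSlugify` becomes the definition of done you can hand to the model: if the spec passes, the step is complete.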

        • By acjohnson55 2026-02-10 17:05

          I regret using the term "one-shot", because my reality isn't really that. It's more that the first shot gets the code 80-90% of the way there, usually, and it short-circuits a ton of the "code archaeology" I would normally have to do to get to that point.

          Some bugs really can be one-shotted, but that's with the benefit of a lot of scaffolding our company has built and the prompting process. It's not as simple as Claude Code being able to do this out of the box.

        • By all2 2026-02-09 6:14 | 2 replies

          I'm on my 5th draft of an essentially vibe-coded project. Maybe it's because I'm using not-frontier models to do the coding, but I have to take two or three tries to get the shape of a thing just right. Drafting like this is something I do when I code by hand as well. I have to implement a thing a few times before I begin to understand the domain I'm working in. Once I understand the domain, the separation of concerns follows naturally, and so do the component APIs (and how those APIs hook together).

          • By the-grump 2026-02-09 7:13 | 3 replies

            My suggestions:

            - Like the sister comment says, use the best model available. For me that has been Opus, but YMMV. Some of my colleagues prefer the OAI models.

            - Iterate on the plan until it looks solid. This is where you should invest your time.

            - Watch the model closely and make sure it writes tests first, checks that they fail, and only then proceeds to implementation.

            - The model should add pieces one by one, ensuring each step works before proceeding. Commit each step so you can easily retry if you need to. Each addition will involve a new plan that you go back and forth on until you're happy with it. The planning usually gets easier as the project moves along.

            - This is sometimes controversial, but use the best language you can target. That can be Rust, Haskell, or Erlang, depending on the context. Strong types will make a big difference: they catch silly mistakes models are liable to make.

            Cursor is great for trying out the different models. If Opus is what you like, I have found Claude Code to be better value, and personally I prefer the CLI to the VS Code UI Cursor builds on. It's not a panacea, though: the CLI has its own issues, like occasionally slowing to a crawl, but it still gets the work done.
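            To make the strong-types point concrete, here is a minimal hypothetical Go sketch (all names invented for illustration): when states are a dedicated type instead of raw strings, a misspelled state identifier is a compile error rather than a silent runtime miss.

```go
package main

import "fmt"

// State is a dedicated type for statechart states. A model that typos
// a state identifier (e.g. writing Idel for Idle) gets a compile error;
// with raw string states, "idel" would compile and just misbehave.
type State int

const (
	Idle State = iota
	Running
	Done
)

type Event string

// transitions maps (current state, event) to the next state.
var transitions = map[State]map[Event]State{
	Idle:    {"start": Running},
	Running: {"finish": Done},
}

// step returns the next state and whether the transition is legal.
// Indexing a missing state yields a nil inner map, which safely
// reports ok == false.
func step(s State, e Event) (State, bool) {
	next, ok := transitions[s][e]
	return next, ok
}

func main() {
	next, ok := step(Idle, "start")
	fmt.Println(next == Running, ok) // true true
}
```

            The same sketch with `type State = string` would happily accept a typo'd state at compile time and only fail when that transition is exercised at runtime, which is exactly the class of silly model mistake the bullet is about.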

            • By all2 2026-02-10 22:52

              My options are 1) pay about a dollar per query from a frontier model, or 2) pay a fraction of that for a not-so-great model that makes my token spend last days/weeks instead of hours.

              I spend a lot of time on plans, but unfortunately the gotchas are in the weeds, especially when it comes to complex systems. I don't trust these models with even marginally complex, non-standard architectures (my projects center around statecharts right now, and the semantics around those can get hairy).

              I git commit after each feature/bugfix, so we're on the same page here. If a feature is too big, or is made up of more than one "big" change, I chunk up the work and commit in small batches until the feature is complete.

              I'm running golang for my projects right now. I can try a more strongly typed language, but that means learning a whole new language and its gotchas and architectural constraints.

              Right now I use claude-code-router and Claude Code on top of OpenRouter, so swapping models is trivial. I use mostly Grok-4.1 Fast or Kimi 2.5. Both of these choke less than Anthropic's own Sonnet (which is still more expensive than the two alternatives).

            • By girvo 2026-02-09 7:23 | 1 reply

              > and personally I prefer the CLI to the vscode UI cursor builds on

              So do I, but I also quite like Cursor's harness/approach to things.

              Which is why their `agent` CLI is so handy! You can use Cursor in any IDE/system now, exactly like Claude Code or the Codex CLI.

              • By the-grump 2026-02-09 7:29

                I tried it when it first came out and it was lacking then. Perhaps it's better now; I'll give it a shot when I sign up for Cursor again.

                Thank you for sharing that!

            • By chrispyfried 2026-02-09 15:22 | 1 reply

              When you say “iterate on the plan”, are you suggesting doing that with the AI or on your own? For the former, do you have any tips/patterns to suggest?

              • By the-grump 2026-02-10 4:03

                With the AI. I read the whole thing and correct the model where it makes mistakes, fill the gaps where I find them.

                I also always check that it explicitly states my rules (some from the global rules, some from the session up until that moment) so they're followed at implementation time.

                In my experience Opus is great at understanding what you want and putting it in a plan, and it's also great at sticking to the plan. So read through the entire thing and make sure it's a plan you feel confident about.

                There will be some trial and error before you notice the kind of things the model gets wrong, and that will guide what you look for in the plan that it spits out.

          • By mistercow 2026-02-09 7:01

            > Maybe it's because I'm using not-frontier models to do the coding

            IMO it’s probably that. The difference between where this was a year ago and now is night and day, and not using frontier models is roughly like stepping back in time 6-12 months.

HackerNews