Opus 4.5 is not the normal AI agent experience that I have had thus far

2026-01-06 17:45 · burkeholland.github.io

Three months ago I would have dismissed claims that AI could replace developers. Today, after using Claude Opus 4.5, I believe AI coding agents can absolutely replace developers.

Yard ops app screenshots

This app also uses Firebase. Again, Opus one-shotted the Google auth email integration. This is the kind of thing that is painstakingly miserable by hand. And again, Firebase is so well suited here because Opus knows how to use the Firebase CLI so well. It needs zero instruction.

BUT YOU DON’T KNOW HOW THE CODE WORKS

No, I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled. Especially since I don’t know Swift at all.

This used to be a major hangup for me. I couldn’t diagnose problems when things went sideways. With Opus 4.5, I haven’t hit that wall yet—Opus always figures out what the issue is and fixes its own bugs.

The real question is code quality. Without understanding how it’s built, how do I know if there’s duplication, dead code, or poor patterns? I used to obsess over this. Now I’m less worried that a human needs to read the code, because I’m genuinely not sure that they do.

Why does a human need to read this code at all? I use a custom agent in VS Code that tells Opus to write code for LLMs, not humans. Think about it—why optimize for human readability when the AI is doing all the work and will explain things to you when you ask?

What you don’t need: variable names, formatting, comments meant for humans, or patterns designed to spare your brain.

What you do need: simple entry points, explicit code with fewer abstractions, minimal coupling, and linear control flow.
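The contrast in the two lists above can be sketched concretely. A minimal TypeScript illustration (all names here are hypothetical, not taken from any of the article's apps): explicit inputs and outputs, no hidden abstraction layers, linear control flow.

```typescript
// LLM-friendly style: one obvious entry point, explicit state, linear flow.
// Hypothetical example for illustration only.

interface User {
  id: string;
  name: string;
  email: string;
}

// Explicit input, explicit output, no hidden globals or layered abstractions.
function formatUserSummary(user: User): string {
  // Linear control flow: validate, then return one of two simple shapes.
  if (user.name.trim() === "") {
    return `User ${user.id} <${user.email}>`;
  }
  return `${user.name} <${user.email}>`;
}

// Entry point takes all state as arguments; nothing to trace through layers.
function summarizeUsers(users: User[]): string[] {
  const summaries: string[] = [];
  for (const user of users) {
    summaries.push(formatUserSummary(user));
  }
  return summaries;
}
```

Nothing here is clever, and that is the point: a model regenerating this file has no indirection to reconstruct.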

Here’s my custom agent prompt:

You are an AI-first software engineer. Assume all code will be written and maintained by LLMs, not humans. Optimize for model reasoning, regeneration, and debugging — not human aesthetics. These coding principles are mandatory:

1. Structure
- Use a consistent, predictable project layout.
- Group code by feature/screen; keep shared utilities minimal.
- Create simple, obvious entry points.
- Before scaffolding multiple files, identify shared structure first. Use framework-native composition patterns (layouts, base templates, providers, shared components) for elements that appear across pages. Duplication that requires the same fix in multiple places is a code smell, not a pattern to preserve.

2. Architecture
- Prefer flat, explicit code over abstractions or deep hierarchies.
- Avoid clever patterns, metaprogramming, and unnecessary indirection.
- Minimize coupling so files can be safely regenerated.

3. Functions and Modules
- Keep control flow linear and simple.
- Use small-to-medium functions; avoid deeply nested logic.
- Pass state explicitly; avoid globals.

4. Naming and Comments
- Use descriptive-but-simple names.
- Comment only to note invariants, assumptions, or external requirements.

5. Logging and Errors
- Emit detailed, structured logs at key boundaries.
- Make errors explicit and informative.

6. Regenerability
- Write code so any file/module can be rewritten from scratch without breaking the system.
- Prefer clear, declarative configuration (JSON/YAML/etc.).

7. Platform Use
- Use platform conventions directly and simply (e.g., WinUI/WPF) without over-abstracting.

8. Modifications
- When extending/refactoring, follow existing patterns.
- Prefer full-file rewrites over micro-edits unless told otherwise.

9. Quality
- Favor deterministic, testable behavior.
- Keep tests simple and focused on verifying observable behavior.

Your goal: produce code that is predictable, debuggable, and easy for future LLMs to rewrite or extend.
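As an illustration of what principles 3 and 5 look like in practice, here is a hedged sketch (TypeScript, with made-up names; not code from any of the apps mentioned): structured logs at a boundary, explicit state passed in, and an error type that carries its own diagnosis.

```typescript
// Sketch of "structured logs at key boundaries" and "explicit, informative
// errors". All names are hypothetical.

type LogLevel = "info" | "error";

// Structured log line: stable JSON a model can parse, not prose for a human.
function logEvent(level: LogLevel, event: string, fields: Record<string, unknown>): string {
  const line = JSON.stringify({ ts: new Date().toISOString(), level, event, ...fields });
  console.log(line);
  return line;
}

// Explicit, informative error instead of a bare throw.
class ConfigError extends Error {
  constructor(public readonly key: string, reason: string) {
    super(`config key "${key}" invalid: ${reason}`);
    this.name = "ConfigError";
  }
}

// State (the environment) is passed explicitly rather than read from a global.
function readPort(env: Record<string, string | undefined>): number {
  const raw = env["PORT"];
  if (raw === undefined) throw new ConfigError("PORT", "not set");
  const port = Number(raw);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new ConfigError("PORT", `expected an integer in 1-65535, got "${raw}"`);
  }
  logEvent("info", "config.port.read", { port }); // boundary log
  return port;
}
```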

All of that said, I don’t have any proof that this prompt makes a difference. I find that Opus 4.5 writes pretty solid code no matter what you prompt it with. However, because models like to write code WAY more than they like to delete it, I will at points run a prompt that looks something like this…

Check your LLM AI coding principles and then do a comprehensive search of this application and suggest what we can do to refactor it to better align with those principles. Also point out any code that can be deleted, any files that can be deleted, things that should be renamed, and things that should be restructured. Then do a write-up of what that looks like. Keep it high level so that it's easy for me to read and not too complex. Add sections for high, medium, and lower priority. And if something doesn't need to be changed, then don't change it. You don't need to change things just for the sake of changing them. You only need to change them if it helps better align with your LLM AI coding principles. Save to a markdown file.

And you get a document that has high, medium and low priority items. The high ones you can deal with and the AI will stop finding them. You can refactor your project a million times and it will keep finding medium/low priority refactors that you can do. An AI is never ever going to pass on the opportunity to generate some text.

I use a similar prompt to find security issues. These you have to be very careful about. Where are the API keys? Is login handled correctly? Are you storing sensitive values in the database? This is probably the most manual part of the project and frankly, something that makes me the most nervous about all of these apps at the moment. I’m not 100% confident that they are bullet proof. Maybe like 80%. And that, as they say, is too damn low.
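Part of that manual security pass can be mechanized. Here is a minimal sketch of the kind of hardcoded-secret check meant above (the patterns are illustrative only; a real audit would use a dedicated scanner and would also cover auth flows and what lands in the database):

```typescript
// Minimal hardcoded-secret scan over source text. Patterns are illustrative,
// not exhaustive -- a sketch, not a real security tool.

interface Finding {
  line: number;
  pattern: string;
  text: string;
}

// A few common key shapes.
const SECRET_PATTERNS: Array<[string, RegExp]> = [
  ["aws-access-key", /AKIA[0-9A-Z]{16}/],
  ["generic-api-key", /api[_-]?key\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]/i],
  ["private-key-block", /-----BEGIN [A-Z ]*PRIVATE KEY-----/],
];

function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const [pattern, re] of SECRET_PATTERNS) {
      if (re.test(text)) findings.push({ line: i + 1, pattern, text: text.trim() });
    }
  });
  return findings;
}
```

A check like this catches the embarrassing cases; it says nothing about whether login is handled correctly, which still needs eyes.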

The times they are a-changin’

I don’t know if I feel exhilarated by what I can now build in a matter of hours, or depressed because the thing I’ve spent my life learning to do is now trivial for a computer. Both are true.

I understand if this post made you angry. I get it - I didn’t like it either when people said “AI is going to replace developers.” But I can’t dismiss it anymore. I can wish it weren’t true, but wishing doesn’t change reality.

But for everything else? Build. Stop waiting to have all the answers. Stop trying to figure out your place in an AI-first world. The answer is the same as it always was: make things. And now you can make them faster than you ever thought possible.

Just make sure you know where your API keys are.

Disclaimer: This post was written by a human and edited for spelling and grammar by Haiku 4.5


Read the original article

Comments

  • By OldGreenYodaGPT 2026-01-06 18:13 (33 replies)

    Most software engineers are seriously sleeping on how good LLM agents are right now, especially something like Claude Code.

    Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time.

    For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box.

    We also had Claude Code create a bunch of ESLint automation, including custom ESLint rules and lint checks that catch and auto-handle a lot of stuff before it even hits review.
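    A custom rule of the kind described can be tiny. A sketch (TypeScript; the rule name and the legacy helper `oldFetch` are made up, but the object shape follows ESLint's documented custom-rule format):

```typescript
// Hypothetical custom ESLint rule: ban calls to a legacy helper `oldFetch`.
// The object shape (meta + create returning node visitors) is ESLint's
// standard rule format.

const noOldFetch = {
  meta: {
    type: "problem" as const,
    docs: { description: "disallow calls to the legacy oldFetch helper" },
    schema: [],
  },
  create(context: { report: (d: { node: unknown; message: string }) => void }) {
    return {
      // Called for every CallExpression node in the file.
      CallExpression(node: { callee: { type: string; name?: string } }) {
        if (node.callee.type === "Identifier" && node.callee.name === "oldFetch") {
          context.report({ node, message: "Use the shared httpClient instead of oldFetch." });
        }
      },
    };
  },
};
```

    In a real setup this object would be exported from a local ESLint plugin and enabled in the shared config, so violations are caught before review.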

    Then we take it further: we have a deep code review agent Claude Code runs after changes are made. And when a PR goes up, we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we’ve written for it.

    On top of that, we’ve got like five other Claude Code GitHub workflow agents that run on a schedule. One of them reads all commits from the last month and makes sure docs are still aligned. Another checks for gaps in end-to-end coverage. Stuff like that. A ton of maintenance and quality work is just… automated. It runs ridiculously smoothly.

    We even use Claude Code for ticket triage. It reads the ticket, digs into the codebase, and leaves a comment with what it thinks should be done. So when an engineer picks it up, they’re basically starting halfway through already.

    There is so much low-hanging fruit here that it honestly blows my mind people aren’t all over it. 2026 is going to be a wake-up call.

    (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)

    Edit: made an example repo for ya

    https://github.com/ChrisWiles/claude-code-showcase

    • By klaussilveira 2026-01-07 0:38 (12 replies)

      I made a similar comment on a different thread, but I think it also fits here: I think the disconnect between engineers is due to their own context. If you work with frontend applications, especially React/React Native/HTML/Mobile, your experience with LLMs is completely different from the experience of someone working with OpenGL, io_uring, libev, and other lower-level stuff. Sure, Opus 4.5 can one-shot Windows utilities and full-stack apps, but it can't implement a simple shadowing algorithm from a 2003 paper in C++, GLFW, GLAD: https://www.cse.chalmers.se/~uffe/soft_gfxhw2003.pdf

      Codex/Claude Code are terrible with C++. They also can't do Rust really well, once you get to the meat of it. Not sure why that is, but they just spit out nonsense that creates more work than it saves me. They also can't one-shot anything complete, even though I might feed them the entire paper that explains what the algorithm is supposed to do.

      Try to do some OpenGL or Vulkan with it, without using WebGPU or three.js. Try it with real code that all of us have to deal with every day. SDL, Vulkan RHI, NVRHI. Very frustrating.

      Try it with boost, or cmake, or taskflow. It loses itself constantly, hallucinates which version it is working on and ignores you when you provide actual pointers to documentation on the repo.

      I've also recently tried to get Opus 4.5 to move the Job system from Doom 3 BFG to the original codebase. Clean clone of dhewm3, pointed Opus to the BFG Job system codebase, and explained how it works. I have also fed it the Fabien Sanglard code review of the job system: https://fabiensanglard.net/doom3_bfg/threading.php

      We are not sleeping on it, we are actually waiting for it to get actually useful. Sure, it can generate a full stack admin control panel in JS for my PostgreSQL tables, but is that really "not normal"? That's basic.

      • By JDye 2026-01-07 15:56 (4 replies)

        We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside of grunt work like minor refactors across many files. It doesn't seem to understand proxying and how it works on both a protocol level and business logic level.

        With some entirely novel work we're doing, it's actually a hindrance as it consistently tells us the approach isn't valid/won't work (it will) and then enters "absolutely right" loops when corrected.

        I still believe those who rave about it are not writing anything I would consider "engineering". Or perhaps it's a skill issue and I'm using it wrong, but I haven't yet met someone I respect who tells me it's the future in the way those running AI-based companies tell me.

        • By dpc_01234 2026-01-07 18:33 (2 replies)

          > We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside

          I have a great time using Claude Code in Rust projects, so I know it's not about the language exactly.

          My working model is that since LLMs are basically inference/correlation based, the more you deviate from the mainstream corpus of training data, the more confused the LLM gets, because the LLM doesn't "understand" anything. But if it was trained on a lot of things kind of like the problem, it can match the patterns just fine, and it can generalize over a lot of layers, including programming languages.

          Also I've noticed that it can get confused about stupid stuff. E.g. I had two different things named kind of the same in two parts of the codebase, and it would constantly stumble on conflating them. Changing the name in the codebase immediately improved it.

          So yeah, we've got another potentially powerful tool that requires understanding how it works under the hood to be useful. Kind of like git.

          • By lisperforlife 2026-01-07 20:38 (1 reply)

            Recently the v8 Rust library changed from mutable handle scopes to pinned scopes. A fairly simple change that I even put in my CLAUDE.md file. But it still generates methods with HandleScopes and then says... oh, I have a different scope, and goes on a random walk refactoring completely unrelated parts of the code. All the while Opus 4.5 burns through tokens. Things work great as long as you are testing on the training set. But that said, it is absolutely brilliant with React and TypeScript.

            • By dpc_01234 2026-01-08 5:41

              Well, it's not like it never happened to me to "burn tokens" with some lifetime issue. :D But yeah, if you're working in Rust on something with sharp edges, the LLM will get hurt. I just don't tend to have these in my projects.

              Even more basic failure mode. I told it to convert/copy a bit (1k LOC) of blocking code into a new module and convert to async. It just couldn't do a proper 1:1 logical _copy_. But when I manually `cp <src> <dst>` the file and then told it to convert that to async and fix issues, it did it 100% correct. Because fundamentally it's just non-deterministic pattern generator.

          • By fullstackchris 2026-01-15 19:55

            hot take (that shouldn't be?): if your code is super easy to follow as a human, it will be super easy to follow for an LLM. (hint: guess where the training data is coming from!)

        • By kevin42 2026-01-07 18:05 (1 reply)

          This isn't meant as a criticism, or to doubt your experience, but I've talked to a few people who had experiences like this. I helped them get Claude Code set up, analyze the codebase, document the architecture into markdown (edit as needed after), create an agent for the architecture, and prompt it in an incremental way. Maybe 15-30 minutes of prep. Everyone I helped responded with things like "This is amazing", "Wow!", etc.

          For some things you can fire up Claude and have it generate great code from scratch. But for bigger code bases and more complex architecture, you need to break it down ahead of time so it can just read about the architecture rather than analyze it every time.

          • By ryandrake 2026-01-07 18:15 (2 replies)

            Is there any good documentation out there about how to perform this wizardry? I always assumed if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code. If there are extra steps that need to be done, why don't Claude's developers just add those extra steps to /init?

            • By kevin42 2026-01-07 18:27 (3 replies)

              Not that I have seen, which is probably a big part of the disconnect. Mostly it's tribal knowledge. I learned through experimentation, but I've seen tips here and there. Here's my workflow (roughly):

              > Create a CLAUDE.md for a c++ application that uses libraries x/y/z

              [Then I edit it, adding general information about the architecture]

              > Analyze the library in the xxx directory, and produce a xxx_architecture.md describing the major components and design

              > /agent [let Claude make the agent, but when it asks what you want it to do, explain that you want it to specialize in subsystem xxx, and refer to xxx_architecture.md]

              Then repeat until you have the major components covered. Then:

              > Using the files named with architecture.md, analyze the entire system and update CLAUDE.md to refer to them and use the specialized agents.

              Now, when you need to do something, put it in planning mode and say something like:

              > There's a bug in the xxx part of the application, where when I do yyy, it does zzz, but it should do aaa. Analyze the problem and come up with a plan to fix it, and automated tests you can perform if possible.

              Then, iterate on the plan with it if you need to, or just approve it.

              One of the most important things you can do when dealing with something complex is let it come up with a test case so it can fix or implement something and then iterate until it's done. I had an image processing problem and I gave it some sample data, then it iterated (looking at the output image) until it fixed it. It spent at least an hour, but I didn't have to touch it while it worked.
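              The end state of this workflow might be a CLAUDE.md along these lines (everything below is a made-up example to show the shape, not an actual setup):

```markdown
# CLAUDE.md (made-up example)

C++ image-processing service using OpenCV and Boost.Asio.

## Architecture docs (read these instead of re-analyzing the code)
- pipeline_architecture.md - capture/transform/encode stages
- network_architecture.md  - session handling and the Asio event loop

## Specialized agents
- pipeline-agent - owns questions about the transform stages
- network-agent  - owns questions about sessions and networking

## Conventions
- Follow the structure described in the matching *_architecture.md
- Run the test suite before declaring a task done
```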

              • By JDye 2026-01-09 12:35

                I've taken time today to do this. With some of your suggestions, I am seeing an improvement in its ability to do some of the grunt work I mentioned. It just saved me an hour refactoring a large protocol implementation into a few files and extracted some common utilities. I can recognise and appreciate how useful that is for me and for most other devs.

                At the same time, I think there are limitations to these tools, and that I won't ever be able to achieve what I see others saying about 95% of code being AI-written, or leaving the AI to iterate for an hour. There are just too many weird little pitfalls in our work that the AI just cannot seem to avoid.

                It's understandable, I've fallen victim to a few of them too, but I have the benefit of the ability to continuously learn/develop/extrapolate in a way that the LLM cannot. And with how little documentation exists for some of these things (MASQUE proxying for example) anytime the LLM encounters this code it throws a fit, and is unable to contribute meaningfully.

                So thanks for your suggestions; it has made Claude better, and clearly I was dragging my feet a little. At the very least, it's freed up some more of my time to work on the complex things Claude can't do.

              • By ryandrake 2026-01-07 18:31 (3 replies)

                To be perfectly honest, I've never used a single /command besides /init. That probably means I'm using 1% of the software's capabilities. In frankness, the whole menu of /-commands is intimidating and I don't know where to start.

                • By kevin42 2026-01-07 18:47 (1 reply)

                  You don't need to do much, the /agent command is the most useful, and it walks you through it. The main thing though is to give the agent something to work with before you create it. That's why I go through the steps of letting Claude analyze different components and document the design/architecture.

                  The major benefit of agents is that it keeps context clean for the main job. So the agent might have a huge context working through some specific code, but the main process can do something to the effect of "Hey UI library agent, where do I need to put code to change the color of widget xyz", then the agent does all the thinking and can reply with "that's in file 123.js, line 200". The cleaner you keep the main context, the better it works.

                  • By theshrike79 2026-01-07 19:49

                    Never thought of Agents in that way to be honest. I think I need to try that style =)

                • By theshrike79 2026-01-07 19:48

                  /commands are like macros or mayyybe aliases. You just put in the commands you see yourself repeating often, like "commit the unstaged files in distinct commits, use xxx style for the commit messages..." - then you can iterate on it if you see any gaps or confusion, even give example commands to use in the different steps.

                  Skills on the other hand are commands ON STEROIDS. They can be packaged with actual scripts and executables, the PEP723 Python style + uv is super useful.

                  I have one skill for example that uses Python+Treesitter to check the unit test quality of a Go project. It does some AST magic to check the code for repetition, stupid things like sleeps and relative timestamps, etc. A /command _can_ do it, but it's not as efficient; the scripts for the skill are specifically designed for LLM use and output the result in a hyper-compact form a human could never be arsed to read.

                • By gck1 2026-01-07 20:35

                  > In frankness, the whole menu of /-commands is intimidating and I don't know where to start.

                  claude-code has a built-in plugin that it can use to fetch its own docs! You don't have to ever touch anything yourself; it can add the features to itself, by itself.

              • By gck1 2026-01-07 20:31

                This is some great advice. What I would add is to avoid the internal plan mode and just build your own. The built-in one creates md files outside the project, gives the files random names, and it's hard to reference them in the future.

                It's also hard to steer the plan mode or have it remember behavior that you want to enforce. It's much better to create a custom command with custom instructions that acts as the plan mode.

                My system works like this:

                The /implement command acts as an orchestrator & plan mode, and it is instructed to launch a predefined set of agents based on the problem and have them utilize specific skills. Every time the /implement command is initiated, it has to create a markdown file inside my own project, and then each subagent is also instructed to update the file when it finishes working.

                This way, the orchestrator can spot that an agent misbehaved, and the reviewer agent can see what the developer agent tried to do and why it was wrong.
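                The command file driving a setup like this might look something like the following (entirely hypothetical, just sketching the described flow):

```markdown
# .claude/commands/implement.md (hypothetical)

You are the orchestrator for this change. Plan first, then delegate:

1. Write a plan to plans/<date>-<topic>.md inside this repo.
2. Launch the developer and reviewer agents, pointing each at the
   skills it needs for this problem.
3. Each agent must append a short section to the plan file when it
   finishes, describing what it did and anything that went wrong.
4. Re-read the plan file and flag any agent that deviated from it.
```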

            • By HDThoreaun 2026-01-07 18:49

              > if you did /init in a new code base, that Claude would set itself up to maximize its own understanding of the code.

              This is definitely not the case, and the reason Anthropic doesn't make Claude do this is because its quality degrades massively as you use up its context. So the solution is to let users manage the context themselves in order to minimize the amount that is "wasted" on prep work. Context windows have been increasing quite a bit, so I suspect that by 2030 this will no longer be an issue for any but the largest codebases, but for now you need to be strategic.

        • By turkey99 2026-01-07 20:33

          Are you still talking about Opus 4.5? I've been working in Rust, Kotlin, and C++ and it's been doing well. Incredible at C++, like the number of mistakes it doesn't make.

        • By parliament32 2026-01-07 18:11 (5 replies)

          > I still believe those who rave about it are not writing anything I would consider "engineering".

          Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators.

          The key difference is that (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to death. Engineering is.. novel, for lack of a better word.

          See also: https://www.seangoedecke.com/pure-and-impure-engineering/

          • By woah 2026-01-07 19:04 (1 reply)

            > Engineering is.. novel, for lack of a better word.

            Tell that to the guys drawing up the world's 10 millionth cable suspension bridge

          • By ryandrake 2026-01-07 18:19 (1 reply)

            I don't think it's that helpful to try to gatekeep the "engineering" term or try to separate it into "pure" and "impure" buckets, implying that one is lesser than the other. It should be enough to just say that AI assisted development is much better at non-novel tasks than it is at novel tasks. Which makes sense: LLMs are trained on existing work, and can't do anything novel because if it was trained on a task, that task is by definition not novel.

            • By parliament32 2026-01-07 18:34 (1 reply)

              Respectfully, it's absolutely important to "gatekeep" a title that has an established definition and certain expectations attached to the title.

              OP says, "BUT YOU DON’T KNOW HOW THE CODE WORKS.. No I don’t. I have a vague idea, but you are right - I do not know how the applications are actually assembled." This is not what I would call an engineer. Or a programmer. "Prompter", at best.

              And yes, this is absolutely "lesser than", just like a middleman who subcontracts his work to Fiverr (and has no understanding of the actual work) is "lesser than" an actual developer.

              • By emodendroket 2026-01-07 18:48 (1 reply)

                That's not the point being made to you. The point is that most people in the "software engineering" space are applying known tools and techniques to problems that are not groundbreaking. Very few are doing theoretical computer science, algorithm design, or whatever you think it is that should be called "engineering."

                • By windexh8er 2026-01-08 4:47 (1 reply)

                  So the TL;DR here is... If you're in the business of recreating wheels - then you're in luck! We've automated wheel recreation to an acceptable degree of those wheels being true.

                  • By emodendroket 2026-01-09 21:28

                    Most physical engineers are just applying known techniques all the time too. Most products or bridges or whatever are not solving some heretofore-unsolved problem.

          • By scottyah 2026-01-07 20:34 (1 reply)

            It's how you use the tool that matters. Some people get bitter and try to compare it to top engineers' work on novel things as a strawman so they can go "Hah! Look how it failed!" as they swing a hammer to demonstrate it cannot chop down a tree. Because the tool is so novel and its use is a lot more abstract than that of an axe, it is taking a while for some to see its potential, especially if they are remembering models from even six months ago.

            Engineering is just problem solving, nobody judges structural engineers for designing structures with another Simpson Strong Tie/No.2 Pine 2x4 combo because that is just another easy (and therefore cheap) way to rapidly get to the desired state. If your client/company want to pay for art, that's great! Most just want the thing done fast and robustly.

            • By wolvoleo 2026-01-08 16:33

              I think it's also that the potential is far from being realized yet we're constantly bombarded by braindead marketers trying to convince us that it's the best thing ever already. This is tiring especially when the leadership (not held back by any technical knowledge) believes them.

              I'm sure AI will get there, I also think it's not very good yet.

          • By loandbehold 2026-01-08 0:27

            Coding agents as of Jan 2026 are great at what 95% of software engineers do. For the remaining 5% that do really novel stuff, the agents will get there in a few years.

          • By 3oil3 2026-01-08 6:32

            When he said "just look at what I've been able to build", I was expecting anything but an "image converter".

      • By wild_egg 2026-01-07 0:46 (1 reply)

        I've had Opus 4.5 hand rolling CUDA kernels and writing a custom event loop on io_uring lately and both were done really well. Need to set up the right feedback loops so it can test its work thoroughly but then it flies.

        • By jaggederest 2026-01-07 1:02 (1 reply)

          Yeah I've handed it a naive scalar implementation and said "Make this use SIMD for Mac Silicon / NEON" and it just spits out a working implementation that's 3-6x faster and passes the tests, which are binary exact specifications.

          • By jonstewart 2026-01-07 1:44 (3 replies)

            It can do this at the level of a function, and that's -useful-, but like the parent reply to the top-level comment, and despite investing the time, using skills & subagents, etc., I haven't gotten it to do well with C++ or Rust projects of sufficient complexity. I'm not going to say they won't some day, but it's not today.

            • By rtfeldman 2026-01-07 4:05 (4 replies)

              Anecdotally, we use Opus 4.5 constantly on Zed's code base, which is almost a million lines of Rust code and has over 150K active users, and we use it for basically every task you can think of - new features, bug fixes, refactors, prototypes, you name it. The code base is a complex native GUI with no Web tech anywhere in it.

              I'm not talking about "write this function" but rather like implementing the whole feature by writing only English to the agent, over the course of numerous back-and-forth interactions and exhausting multiple 200K-token context windows.

              For me personally, definitely at least 99% of all the Rust code I've committed at work since Opus 4.5 came out has been from an agent running that model. I'm reading lots of Rust code (that Opus generated), but I'm essentially no longer writing any of it. If dot-autocomplete (and LLM autocomplete) disappeared from IDE existence, I would not notice.

              • By mr_o47 2026-01-07 18:56 (1 reply)

                Woah, that's a very interesting claim. I was shying away from writing Rust as I am not a Rust developer, but hearing your experience, it looks like Claude has gotten very good at writing Rust.

                • By jaggederest 2026-01-07 22:44

                  Honestly, I think the more you can give Claude a type system and effective tests, the more effective it can be. Rust is quite high up on the test-strictness front (though I think more could be done...), so it's a great candidate. I also like its performance on Haskell and Go; both get you pretty great code out of the box.

              • By norir 2026-01-07 16:30 (2 replies)

                Have you ever worried that by programming in this way, you are methodically giving Anthropic all the information it needs to copy your product? If there is any real value in what you are doing, what is to stop Anthropic or OpenAI or whomever from essentially one-shotting Zed? What happens when the model providers 10x their costs and also use the information you've so enthusiastically given them to clone your product and use the money that you paid them to squash you?

                • By rtfeldman 2026-01-07 16:31

                  Zed's entire code base is already open source, so Anthropic has a much more straightforward way to see our code:

                  https://github.com/zed-industries/zed

                • By kaydub 2026-01-07 17:46 (1 reply)

                  That's what things like AWS Bedrock are for.

                  Are you worried about microsoft stealing your codebase from github?

                  • By djhn 2026-01-07 21:18

                    Isn’t it widely assumed Microsoft used private repos for LLM training?

                    And even with a narrower definition of stealing, Microsoft’s ability to share your code with US government agencies is a common and very legitimate worry in plenty of threat model scenarios.

              • By ziml77 2026-01-07 22:56 (1 reply)

                I just uninstalled Zed today when I realized the reason I couldn't delete a file on Windows was that it was open in Zed. So I wouldn't speak too highly of the LLM's ability to write code. I have never seen another editor on Windows make the mistake of opening files without enabling all 3 share modes.

                • By rtfeldman 2026-01-09 18:05

                  Just based on timing, I am almost 100% sure whatever code is responsible was handwritten before anyone working on Windows was using LLMs...but anyway, thank you for the bug report - I'll pass it along!

              • By Snuggly73 2026-01-07 15:54

                The article is arguing that it will basically replace devs. Do you think it can replace you basically one-shotting features/bugs in Zed?

                And also - doesn’t that make Zed (and other editors) pointless?

                • By kevin42 2026-01-07 18:11

                  Trying to one-shot large codebases is an exercise in futility. You need to let Claude figure out and document the architecture first, then set up agents for each major part of the project. Doing this keeps the context clean for the main agent, since it doesn't have to go read the code each time. So one agent can fill its entire context understanding part of the code, and then the main agent asks it how to do something and gets a shorter response.

                  It takes more work than one-shot, but not a lot, and it pays dividends.

                  • By dpark 2026-01-07 20:52

                    Is there a guide for doing that successfully somewhere? I would love to play with this on a large codebase. I would also love to not reinvent the wheel on getting Claude working effectively on a large code base. I don’t even know where to start with, e.g., setting up agents for each part.

                • By rtfeldman 2026-01-07 16:39

                  > Do you think it can replace you basically one-shotting features/bugs in Zed?

                  Nobody is one-shotting anything nontrivial in Zed's code base, with Opus 4.5 or any other model.

                  What about a future model? Literally nobody knows. Forecasts about AI capabilities have had horrendously low accuracy in both directions - e.g. most people underestimated what LLMs would be capable of today, and almost everyone who thought AI would at least be where it is today...instead overestimated and predicted we'd have AGI or even superintelligence by now. I see zero signs of that forecasting accuracy improving. In aggregate, we are atrocious at it.

                  The only safe bet is that hardware will be faster and cheaper (because the most reliable trend in the history of computing has been that hardware gets faster and cheaper), which will naturally affect the software running on it.

                  > And also - doesn’t that make Zed (and other editors) pointless?

                  It means there's now demand for supporting use cases that didn't exist until recently, which comes with the territory of building a product for technologists! :)

                  • By Snuggly73 2026-01-07 16:49

                    Thanx. More of a "faster keyboard" so far then?

                    And yeah - if I had a crystal ball, I would be on my private island instead of hanging on HN :)

                    • By rtfeldman 2026-01-07 16:56

                      Definitely more than a faster keyboard (e.g. I also ask the model to track down the source of a bug, or questions about the state of the code base after others have changed it, bounce architectural ideas off the model, research, etc.) but also definitely not a replacement for thinking or programming expertise.

            • By jaggederest 2026-01-07 1:53

              I don't know if you've tried Chatgpt-5.2, but I find Codex much better for Rust, mostly due to the underlying model. You have to do planning and provide context, but 80%+ of the time it's a one-shot for small-to-medium features in an existing codebase that's fairly complex. I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job.

              If you have any opensource examples of your codebase, prompt, and/or output, I would happily learn from it / give advice. I think we're all still figuring it out.

              Also this SIMD translation wasn't just a single function - it was multiple functions across a whole region of the codebase dealing with video and frame capture, so pretty substantial.

              • By glhaynes 2026-01-07 14:49

                "I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job."

                That's a good way to say it, I totally identify.

            • By andai 2026-01-07 17:46

              Is that a context issue? I wonder if LSP would help there. Though Claude Code should grep the codebase for all the necessary context, and LSP should in theory only save time, I think it would genuinely improve outcomes as well.

              The bigger a project gets, the more context you generally need to understand any particular part. And by default Claude Code doesn't inject context; you need to use third-party integrations for that.

      • By lelandfe 2026-01-07 4:28

        I'm a quite senior frontend dev using React, and even I see Sonnet 4.5 struggle with basic things. Today it wrote my Zod validation incorrectly, mixing up versions, then just decided it wasn't working and attempted to replace the entire thing with a different library.

        • By baq 2026-01-07 6:47

          There’s little reason to use sonnet anymore. Haiku for summaries, opus for anything else. Sonnet isn’t a good model by today’s standards.

          • By lelandfe 2026-01-09 20:25

            I have been chastened in the opposite direction by others. I've also subjectively really disliked Opus's speed and I've seen Opus do really silly things too. But I'll try out using it as a daily driver and see if I like it more.

        • By subomi 2026-01-07 5:56

          Why do we all of a sudden hold these agents to some unrealistically high bar? Engineers write bugs all the time and write incorrect validations. But we iterate. We read the stack trace in Sentry, realise what the hell we were thinking when we wrote that, and fix things. If you're going to benefit from these agents, you need to be a bit more patient and point them correctly at your codebase.

          My rule of thumb is that if you can clearly describe exactly what you want to another engineer, then you can instruct the agent to do it too.

          • By puttycat 2026-01-07 10:26

            > Engineers write bugs all the time

            Why do we hold calculators to such high bars? Humans make calculation mistakes all the time.

            Why do we hold banking software to such high bars? People forget where they put their change all the time.

            Etc etc.

          • By lelandfe 2026-01-07 6:24

            my unrealistic bar lies somewhere above "pick a new library" bug resolution

      • By CapsAdmin 2026-01-07 9:41

        I built an open-source "game engine" entirely in Lua many years ago, relying on many third-party libraries that I would bind to with FFI.

        I thought I'd revive it, but this time with Vulkan and no third-party dependencies (except for Vulkan).

        Sonnet 4.5, Opus, and Gemini 3.5 Flash have helped me write image decoders for DDS, PNG, JPG, and EXR, a Wayland window implementation, a macOS window implementation, etc.

        I find that Gemini 3.5 Flash is really good at understanding 3D in general, while Sonnet might be lacking a little.

        All these SOTA models seem to understand my bespoke Lua framework and the right level of abstraction. For example, at the low level you have the generated Vulkan bindings; after that, objects around the Vulkan types; then finally a high-level pipeline builder and whatnot, which does not mention Vulkan anywhere.

        However, with a larger C# codebase at work, they really struggle. My theory is that there are too many files and abstractions, so they cannot understand where to begin looking.

      • By 3D30497420 2026-01-07 14:46

        I'll second this. I'm making a fairly basic iOS/Swift app with an accompanying React-based site. I was able to vibe-code the React site (it isn't pretty, but it works and the code is fairly decent). But I've struggled to get the Swift code to be reliable.

        Which makes sense. I'm sure there's lots of training data for React/HTML/CSS/etc. but much less with Swift, especially the newer versions.

        • By rootusrootus 2026-01-07 18:43

          I had surprising success vibe coding a swift iOS app a while back. Just for fun, since I have a bluetooth OBD2 dongle and an electric truck, I told Claude to make me an app that could connect to the truck using the dongle, read me the VIN, odometer, and state of charge. This was middle of 2025, so before Opus 4.5. It took Claude a few attempts and some feedback on what was failing, but it did eventually make a working app after a couple hours.

          Now, was the code quality any good? Beats me, I am not a swift developer. I did it partly as an experiment to see what Claude was currently capable of and partly because I wanted to test the feasibility of setting up a simple passive data logger for my truck.

          I'm tempted to take another swing with Opus 4.5 for the science.

        • By billbrown 2026-01-07 19:57

          I hate "vibe code" as a verb. May I suggest "prompt" instead? "I was able to prompt the React site…."

          • By bigDinosaur 2026-01-08 7:20

            You aren't prompting the React site, you're prompting the LLM.

      • By 348512469721 2026-01-08 19:33

        > It also can't do rust really well

        I have not had this experience at all. It often doesn't get it right on the first pass, yes, but the advantage with Rust vibecoding is that if you give it a rule to "Always run cargo check before you think you're finished" then it will go back and fix whatever it missed on the first pass. What I find particularly valuable is that the compiler forces it to handle all cases like match arms or errors. I find that it often misses edge cases when writing typescript, and I believe that the relative leniency of the typescript compiler is why.
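
The compile-time exhaustiveness described above can be shown in a few lines. This is a self-contained sketch with invented names (the `JobState` enum and `describe` function are illustrative, not from the commenter's project): add a new variant to the enum and `cargo check` rejects the `match` until the new case is handled.

```rust
// A state enum of the kind an agent might introduce mid-task.
enum JobState {
    Queued,
    Running { pct: u8 },
    Done,
    Failed(String),
}

fn describe(state: &JobState) -> String {
    // The compiler requires every variant to be covered here. If an LLM (or a
    // human) adds a variant to JobState and forgets this match, the build
    // fails -- the edge case cannot be silently skipped as it might be in a
    // more lenient language.
    match state {
        JobState::Queued => "waiting".to_string(),
        JobState::Running { pct } => format!("{pct}% complete"),
        JobState::Done => "finished".to_string(),
        JobState::Failed(err) => format!("failed: {err}"),
    }
}

fn main() {
    assert_eq!(describe(&JobState::Queued), "waiting");
    assert_eq!(describe(&JobState::Running { pct: 40 }), "40% complete");
    assert_eq!(describe(&JobState::Done), "finished");
    println!("{}", describe(&JobState::Failed("disk full".into())));
}
```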

        In a similar vein, it is quite good at writing macros (or at least, quite good given how difficult this otherwise is). You often have to cajole it into not hardcoding features into the macro, but since macros resolve at compile time, they're quite well-suited to an LLM workflow, as most potential bugs will be apparent before the user needs to test. I also think the biggest hurdle of writing macros for humans is the cryptic compiler errors, but I can imagine that since LLMs have a lot of information about compilers and syntax parsing in their training corpus, they have an easier time with this than the median programmer. I'm sure an actual compiler engineer would be far better than the LLM, but I am not that guy (nor can I afford one), so I'm quite happy to use LLMs here.
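
As a rough illustration of why macros suit this workflow (the `http_error!` macro and its names are invented here, not the commenter's code): a mistake in what the macro expands to fails `cargo check` at expansion time, before any runtime testing.

```rust
// A minimal declarative macro: each invocation stamps out a checked function.
// If $code or $msg had the wrong type, every call site would fail to compile.
macro_rules! http_error {
    ($name:ident, $code:expr, $msg:expr) => {
        fn $name() -> (u16, &'static str) {
            ($code, $msg)
        }
    };
}

http_error!(not_found, 404, "not found");
http_error!(too_many_requests, 429, "slow down");

fn main() {
    assert_eq!(not_found(), (404, "not found"));
    assert_eq!(too_many_requests().0, 429);
    println!("ok");
}
```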

        For context, I am purely a webdev. I can't speak to how well LLMs fare at anything other than writing SQL, hooking up to REST APIs, React frontends, and macros. With the exception of macros, these are all problems that have been solved a million times and are thus more boilerplate than novelty, so I think it is entirely plausible that they're very poor in different domains of programming despite my experiences with them.

        • By jessoteric 2026-01-08 20:50

          i've also been using opus 4.5 with lots of heavy rust development. i don't "vibe code", but lead it with a relatively firm hand - and it produces pretty good results on surprisingly complicated tasks.

          for example, one of our public repos works with rust compiler artifacts and cache restoration (https://github.com/attunehq/hurry); if you look at the history you can see it make some surprisingly complex (and, for an LLM, well-made) changes. its code isn't necessarily what i would always write, or the best way to solve the problem, but it's usually perfectly serviceable if you give it enough context and guidance.

      • By UncleOxidant 2026-01-07 5:53

        I've had pretty good luck with LLM agents coding C - in this case, a C compiler that supports a subset of C and targets a customizable microcoded state machine/processor. Then I had Gemini code up a simulator/debugger for the target machine in C++, and it did it in short order and quite successfully: it lets you single-step through the microcode, examine and set inputs, and inspect outputs and current state. It did that in an afternoon, and the resulting C++ code looks pretty decent.

        • By HarHarVeryFunny 2026-01-07 15:44

          That's remarkably similar to something I've just started on - I want to create a self-compiling C compiler targeting (and running on) an 8-bit micro via a custom VM. This is basically a retro-computing hobby project.

          I've worked with Gemini Fast on the web to help design the VM ISA, then next steps will be to have some AI (maybe Gemini CLI - currently free) write an assembler, disassembler and interpreter for the ISA, and then the recursive descent compiler (written in C) too.

          I already had Gemini 3.0 Fast write me a precedence climbing expression parser as a more efficient drop-in replacement for a recursive descent one, although I had it do that in C++ as a proof-of-concept since I don't know yet what C libraries I want to build and use (arena allocator, etc). This involved a lot of copy-paste between Gemini output and an online C++ dev environment (OnlineGDB), but that was not too bad, although Gemini CLI would have avoided that. Too bad that Gemini web only has "code interpreter" support for Python, not C and/or C++.
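
For readers unfamiliar with the technique, here is a minimal precedence-climbing evaluator. It is an invented illustration in Rust rather than the commenter's C++ proof-of-concept, and it handles only +, -, *, / and parentheses (no unary minus, no error handling):

```rust
// Split "1 + 2*3" into number and operator tokens.
fn tokenize(src: &str) -> Vec<String> {
    let mut toks = Vec::new();
    let mut num = String::new();
    for c in src.chars() {
        if c.is_ascii_digit() {
            num.push(c);
        } else {
            if !num.is_empty() { toks.push(num.clone()); num.clear(); }
            if !c.is_whitespace() { toks.push(c.to_string()); }
        }
    }
    if !num.is_empty() { toks.push(num); }
    toks
}

// Binding power of each operator; None means "not an operator".
fn prec(op: &str) -> Option<u8> {
    match op { "+" | "-" => Some(1), "*" | "/" => Some(2), _ => None }
}

// The core of precedence climbing: keep consuming operators whose
// precedence is at least min_prec; recurse with a higher floor for the
// right-hand side so left-associativity falls out naturally.
fn parse_expr(toks: &[String], pos: &mut usize, min_prec: u8) -> i64 {
    let mut lhs = parse_atom(toks, pos);
    while *pos < toks.len() {
        let Some(p) = prec(&toks[*pos]) else { break };
        if p < min_prec { break; }
        let op = toks[*pos].clone();
        *pos += 1;
        let rhs = parse_expr(toks, pos, p + 1);
        lhs = match op.as_str() {
            "+" => lhs + rhs,
            "-" => lhs - rhs,
            "*" => lhs * rhs,
            _ => lhs / rhs,
        };
    }
    lhs
}

// An atom is a number or a parenthesized subexpression.
fn parse_atom(toks: &[String], pos: &mut usize) -> i64 {
    if toks[*pos] == "(" {
        *pos += 1;
        let v = parse_expr(toks, pos, 1);
        *pos += 1; // consume ")"
        v
    } else {
        let v = toks[*pos].parse().unwrap();
        *pos += 1;
        v
    }
}

fn eval(src: &str) -> i64 {
    let toks = tokenize(src);
    parse_expr(&toks, &mut 0, 1)
}

fn main() {
    assert_eq!(eval("1 + 2 * 3"), 7);
    assert_eq!(eval("(1 + 2) * 3"), 9);
    println!("ok");
}
```

The appeal over plain recursive descent is that one loop replaces a chain of per-precedence-level functions, which is also less code for a model to regenerate correctly.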

          Using Gemini to help define the ISA was an interesting process. It had useful input in a "pair-design" process, working on various parts of the ISA, but then failed to bring all the ideas together into a single ISA document, repeatedly missing parts of what had been previously discussed until I gave up and did that manually. The default persona of Gemini seems not very well suited to this type of workflow, where you want to direct what to do next; it seems they've RL'd the heck out of it to want to suggest next steps and ask questions rather than do what is asked and wait for further instruction. I eventually had to keep asking it to "please answer then stop", and interestingly the quality of the "conversation" seemed to fall apart after that (perhaps because Gemini was now predicting/generating a more adversarial conversation than a collaborative one?).

          I'm wondering/hoping that Gemini CLI might be better at working on documentation than Gemini web, since then the doc can be an actual file it is editing, and it can use its edit tool for that, as opposed to hoping that Gemini web can assemble chunks of context (various parts of the ISA discussion) into a single document.

          • By HarHarVeryFunny 2026-01-08 14:26

            Just as a self follow-up here (I hate to do it!): after chatting with Gemini some more about document-creation alternatives, it does seem that Gemini CLI is by far the best way to go, since it works in similar fashion to Claude Code, making targeted edits (string replacements) to files rather than regenerating from scratch (unless it has misinterpreted something you said as a request to do that, which would be obvious when it showed you the suggested diff).

            Another alternative (not recommended due to potential for "drift") is to use Gemini's Canvas capability where it is working on a document rather than a specification being spread out over Chat, but this document is fully regenerated for every update (unlike Claude's artifacts), so there is potential for it to summarize or drop sections of the document ("drift") rather than just making requested changes. Canvas also doesn't have Artifact's versioning to allow you to go back to undo unwanted drifts/changes.

            • By mattarm 2026-01-08 15:22

              Yeah, the online Gemini app is not good for long-lived conversations that build up a body of decisions. The context window gets too large and things drop.

              What I've learned is that once you reach that point you've got to break the problem down into smaller pieces that the AI can work with productively.

              If you're about to start with Gemini CLI, I recommend you look up https://github.com/github/spec-kit. It's a project out of Microsoft/GitHub that encodes a rigorous spec-then-implement multi-pass workflow. It gets the AI to produce specs, double-check the specs for holes and ambiguity, plan out implementation, translate that into small tasks, then check them off as it goes. I don't use spec-kit all the time, but it taught me what explicit multi-pass prompting can do when the context is held in files on disk, often markdown that I can go in and change as needed. I think it basically comes down to enforcing enough structure in the form of codified processes, self-checks, and/or tests for your code.

              Pro tip, tell spec-kit to do TDD in your constitution and the tests will keep it on the rails as you progress. I suspect “vibe coding” can get a bad rap due to lack of testing. With AI coding I think test coverage gets more important.

              • By HarHarVeryFunny 2026-01-08 16:46

                Thanks for the spec-kit recommendation - I'll give it a try!

      • By nycdatasci 2026-01-07 18:25

        Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models.

      • By ryandrake 2026-01-07 18:13

        I've found it to be pretty hit-or-miss with C++ in general, but it's really, REALLY bad at 3D graphics code. I've tried to use it to port an OpenGL project to SDL3_GPU, and it really struggled. It would confidently insist that the code it wrote worked, when all you had to do was run it and look at the output to see a blank screen.

        • By Wowfunhappy 2026-01-07 18:21

          I hope I’m not committing a faux pas by saying this—and please feel free to tell me that I’m wrong—but I imagine a human who has been blind since birth would also struggle to build 3D graphics code.

          The Claude models are technically multi-modal, but IME the vision side of the equation is really lacking. As a result, Claude is quite good at reasoning about logic, and it can build e.g. simpler web pages where the underlying html structure is enough to work with, but it’s much worse at tasks that inherently require seeing.

          • By ryandrake 2026-01-07 18:26

            Yeah, for obvious reasons, it seems to be best at code that transforms data: text/binary input to text/binary output, where the logic can be tracked and verified at runtime with sufficient (text) logging. In other words, it's much better closed-loop than open-loop. I tried to help it by prompting it to take a screen capture of its output to verify functionality, but it seems LLMs aren't quite ready for that yet.

            • By mattarm 2026-01-08 14:42

              They work much better off a test that must pass. That they can “see”. Without it they are just making up some other acceptance criteria.

      • By antonvs 2026-01-07 1:37

        > It also can't do Rust really well, once you get to the meat of it. Not sure why that is

        Because types are proofs and require global correctness, you can't just iterate, fix things locally, and wait until it breaks somewhere else that you also have to fix locally.

      • By nopakos 2026-01-07 6:42

        I have not tried C++, but Codex did a good job with low-level C code, shaders as well as porting 32 bit to 64 bit assembly drawing routines. I have also tried it with retro-computing programming with relative success.

      • By ivm 2026-01-07 1:06

        > Mobile

        From what I've seen, CC has troubles with the latest Swift too, partially because of it being latest and partially because it's so convoluted nowadays.

        But it's übercharged™ for C#

    • By spaceman_2020 2026-01-06 18:21

      I really think a lot of people tried AI coding earlier, got frustrated by the errors, and gave up. That's where the rejection of all these doomer predictions comes from.

      And I get it. Coding with Claude Code really was prompting something, getting errors, and asking it to fix them. Which was still useful, but I could see why a skilled coder adding a feature to a complex codebase would just give up.

      Opus 4.5 really is at a new tier, however. It just... works. The errors are far fewer and often very minor - "careless" errors, not fundamental issues (like forgetting to add "use client" to a Next.js client component).

      • By ryandrake 2026-01-06 22:13

        This was me. I was a huge AI coding detractor on here for a while (you can check my comment history). But, in order to stay informed and not just be that grouchy curmudgeon all the time, I kept up with the models and regularly tried them out. Opus 4.5 is so much better than anything I've tried before, I'm ready to change my mind about AI assistance.

        I even gave -True Vibe Coding- a whirl. Yesterday, from a blank directory and text file list of requirements, I had Opus 4.5 build an Android TV video player that could read a directory over NFS, show a grid view of movie poster thumbnails, and play the selected video file on the TV. The result wasn't exactly full-featured Kodi, but it works in the emulator and actual device, it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything. It was pretty astounding.

        Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.

        • By mikestorrent 2026-01-06 22:37

          I have a few Go projects now and I speak Go as well as you speak Kotlin. I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development.

          For instance, I always respected types, but I'm too lazy to spend hours working on types when I can just do Ruby-style duck typing and get a long way before the inevitable problems rear their heads. Now I can use a strongly typed language and get the advantages for "free".

          • By gck1 2026-01-07 15:30

            > I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development.

            Oh, absolutely. I've been using Python for the past 15 or so years for everything.

            I've never written a single line of Rust in my life, and all my new projects are Rust now, even the quick-script-throwaway things, because it's so much better at instantly screaming at Claude when it goes off track. It may take longer to finish what I asked it to do, but it requires so much less involvement from me.

            I will likely never start another new project in python ever.

            EDIT: Forgot to add that paired with a good linter, this is even more impressive. I told Claude to come up with the most masochistic clippy configuration possible, where even a tiny mistake is instantly punished and exceptions have to be truly exceptional (I have another agent that verifies this each run).

            I just wish there was cargo-clippy for enforcing architectural patterns.
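
For readers wondering what such a strict setup looks like, here is a hedged sketch (illustrative only, not the commenter's actual configuration). Plain rustc accepts and ignores `clippy::` attributes; `cargo clippy` enforces them, so "a tiny mistake is instantly punished" at check time.

```rust
// Deny whole lint groups crate-wide, then forbid the common escape hatches
// so every `unwrap`/`expect`/`panic` has to be replaced with real error
// handling (the "exceptions have to be truly exceptional" policy).
#![deny(clippy::all)]
#![deny(clippy::pedantic)]
#![deny(clippy::unwrap_used, clippy::expect_used, clippy::panic)]

// Under the lints above, `raw.parse().unwrap()` would be rejected by clippy;
// propagating the error via Result is the only way through.
fn parse_port(raw: &str) -> Result<u16, std::num::ParseIntError> {
    raw.parse()
}

fn main() {
    assert_eq!(parse_port("8080"), Ok(8080));
    assert!(parse_port("not a port").is_err());
    println!("ok");
}
```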

          • By tezza 2026-01-07 0:04

            And with types, it makes it easier for rounds of agents to pick up mistakes at compile time, statically. Linting and sanity-checking untyped languages only goes so far. I've not seen LLMs one-shot Perl-style regexes, and JavaScript can still have ugly runtime WTFs.

            • By nl 2026-01-07 2:44

              I've found this too.

              I find I'm doing more TypeScript projects than Python because of the superior typing, despite the fact that I prefer Python.

        • By myk9001 2026-01-07 0:47

          Oh, wow, that's impressive, thanks for sharing!

          Going to one-up you though -- here's a literal one-liner that gets me a polished media center with beautiful interface and powerful skinning engine. It supports Android, BSD, Linux, macOS, iOS, tvOS and Windows.

          `git clone https://github.com/xbmc/xbmc.git`

          • By ryandrake 2026-01-07 1:01

            Hah! I actually initiated the project because I'm a long time XBMC/Kodi user. I started using it when it was called XBMC, on an actual Xbox 1. I am sick and tired of its crashing, poor playback performance, and increasingly bloated feature set. It's embarrassing when I have friends or family over for movie night, and I have to explain "Sorry folks, Kodi froze midway through the movie again" while I frantically try to re-launch/reboot my way back to watching the movie. VLC's playback engine is much better but the VLC app's TV UX is ass. This application actually uses the libVLC playback engine under the hood.

            • By apitman 2026-01-07 4:00

              I think anecdotes like this may prove very relevant in the next few years. AI might make bad code, but a project of bad code that's still way smaller than a bloated alternative and has a UX tailored to your exact requirements could be compelling.

              A big part of the problem with existing software is that humans seem to be pretty much incapable of deciding a project is done and stop adding to it. We treat creating code like a job or hobby instead of a tool. Nothing wrong with that, unless you're advertising it as a tool.

              • By ryandrake 2026-01-07 4:54

                Yea, after this little experiment, I feel like I can just go through every big, bloated, slow, tech-debt-ridden software I use and replace it with a tiny, bespoke version that does only what I need and no more.

                The old adage about how "users use 10% of your software's features, but they each use a different 10%" can now be solved by each user just building that 10% for themselves.

            • By indigodaddy 2026-01-07 3:27

              Have you tried VidHub? It works nicely against almost anything: Plex, Jellyfin, SMB/WebDAV folders, etc.

        • By ku1ik 2026-01-07 8:11

          How do you know “it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything” if you built it just yesterday? Isn’t it a bit too early for claims like this? I get it’s easy to bring ideas to life but aren’t we overly optimistic?

          • By ryandrake 2026-01-07 16:57

            Part of the "one day" development time was exhaustively testing it. Since the tool's scope is so small, getting good test coverage was pretty easy. Of course, I'm not guaranteeing through formal verification methods that the code is bug free. I did find bugs, but they were all areas that were poorly specified by me in the requirements.

          • By missingdays 2026-01-07 9:46

            By tomorrow the app will be replaced with a new version from another competitor; by then the memory leak will not have revealed itself.

        • By rdedev 2026-01-07 1:01

          I decided to vibe-code something myself last week at work. I've been wanting to create a PoC that involves a coding agent creating custom Bokeh plots that a user can interact with and ask follow-up questions about. All of this had to be served using the Holoviews Panel library.

          At work I only have access to Claude using the GitHub Copilot integration, so this could be the cause of my problems. Claude was able to get the first iteration up pretty quickly. At that stage the app could create a plot, and you could interact with it and ask follow-up questions.

          Then I asked it to extend the app so that it could generate multiple plots and the user could interact with all of them, one at a time. It made a bunch of changes, but the feature was never implemented. I asked it to do it again but got the same outcome. I completely accept that it could all be because I am using VS Code Copilot or my prompting skills are not good, but the LLM got 70% of the way there and then completely failed.

          • By cebert 2026-01-07 2:44

            > At work I only have access to Claude using the GitHub Copilot integration, so this could be the cause of my problems.

            You really need to at least try Claude Code directly instead of using Copilot. My work gives us access to Copilot, Claude Code, and Codex. Copilot isn't close to the other, more agentic products.

            • By debian3 2026-01-07 3:50

              The VS Code Copilot extension harness is not great, but Opus 4.5 with the Copilot CLI works quite well.

              • By pluralmonad 2026-01-07 17:39

                Do they manage context differently or have different system prompts? I would assume a lot of that would be the same between them. I think GH Copilot's biggest shortcoming is that it is too token-cheap, aggressively managing context to the detriment of the results. Watching Claude read a 500-line file in 100-line chunks just makes me sad.

        • By yieldcrv 2026-01-06 22:31

          I recently replaced my monitor with one that could be vertically oriented, because I'm just using Claude Code in the terminal and not looking at file trees at all.

          But I do want a better way to glance at and keep up with what it's doing in longer conversations, for my own mental context window.

          • By adastra22 2026-01-06 23:43

            Ah, but you're at the beginning stage, young grasshopper. Soon you will be missing that horizontal ultrawide monitor as you spin up 8 different Claude agents in parallel sessions.

            • By yieldcrv 2026-01-07 0:07

              oh I noticed! I've begun doing that on my laptop. I just started going down all my list of sideprojects one by one, then two by two, a Claude Code instance in a terminal window for each folder. It's a bit mental

              I'm finding that branding and graphic design is the most arduous part, which I'm hoping to accelerate soon. I'm heavily AI-assisted there too, and I'm evaluating MCP servers to help, but so far I actually have to focus on just that part as opposed to babysitting.

        • By libraryofbabel 2026-01-07 0:20

          Thanks for posting this. It's a nice reminder that despite all the noise from hype-mongers and skeptics in the past few years, most of us here are just trying to figure this all out with an open mind and are ready to change our opinions when the facts change. And a lot of people in the industry that I respect on HN or elsewhere have changed their minds about this stuff in the last year, having previously been quite justifiably skeptical. We're not in 2023 anymore.

          If you were someone saying at the start of 2025 "this is a flash in the pan and a bunch of hype, it's not going to fundamentally change how we write code", that was still a reasonable belief to hold back then. At the start of 2026 that position is basically untenable: it's just burying your head in the sand and wishing for AI to go away. If you're someone who still holds it you really really need to download Claude Code and set it to Opus and start trying it with an open mind: I don't know what else to tell you. So now the question has shifted from whether this is going to transform our profession (it is), to how exactly it's going to play out. I personally don't think we will be replacing human engineers anytime soon ("coders", maybe!), but I'm prepared to change my mind on that too if the facts change. We'll see.

          I was a fellow mind-changer, although it was back around the first half of last year when Claude Code was good enough to do things for me in a mature codebase under supervision. It clearly still had a long way to go but it was at that tipping point from "not really useful" to "useful". But Opus 4.5 is something different - I don't feel I have to keep pulling it back on track in quite the way I used to with Sonnet 3.7, 4, even Sonnet 4.5.

          For the record, I still think we're in a bubble. AI companies are overvalued. But that's a separate question from whether this is going to change the software development profession.

          • By arcfour 2026-01-07 1:39 · 3 replies

            The AI bubble is kind of like the dot-com bubble in that it's a revolutionary technology that will certainly be a huge part of the future, but it's still overhyped (i.e. people are investing without regard for logic).

            • By ryandrake 2026-01-07 2:08

              We were enjoying cheap second hand rack mount servers, RAM, hard drives, printers, office chairs and so on for a decade after the original dot com crash. Every company that went out of business liquidated their good shit for pennies.

              I'm hoping after AI comes back down to earth there will be a new glut of cheap second hand GPUs and RAM to get snapped up.

            • By libraryofbabel 2026-01-07 2:04

              Right. And same for railways, which had a huge bubble early on. Over-hyped on the short time horizon. Long term, they were transformative in the end, although most of the early companies and early investors didn’t reap the eventual profits.

            • By nl 2026-01-07 2:54 · 1 reply

              But the dot-com bubble wasn't overhyped in retrospect. It was under-hyped.

              • By arcfour 2026-01-07 3:19 · 1 reply

                At the time it was overhyped because just by adding .com to your company's name you could increase your valuation regardless of whether or not you had anything to do with the internet. Is that not stupid?

                I think my comparison is apt; being a bubble and a truly society-altering technology are not mutually exclusive, and by virtue of it being a bubble, it is overhyped.

                • By retsibsi 2026-01-07 4:33

                  There was definitely a lot of stupid stuff happening. IMO the clearest accurate way to put it is that it was overhyped for the short term (hence the crazy high valuations for obvious bullshit), and underhyped for the long term (in the sense that we didn't really foresee how broadly and deeply it would change the world). Of course, there's more nuance to it, because some people had wild long-term predictions too. But I think the overall, mainstream vibe was to underappreciate how big a deal it was.

        • By fpauser 2026-01-06 22:53 · 1 reply

          > Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.

          ... says it all.

          • By jononor 2026-01-08 8:54

            What exactly does it say, in your opinion? I can imagine 4-5 different takes on that post.

        • By sksishbs 2026-01-07 1:30

          [dead]

      • By theshrike79 2026-01-07 1:00 · 3 replies

        > "asking it to fix it."

        This is what people are still doing wrong. Tools in a loop, people, tools in a loop.

        The agent has to have the tools to detect whether whatever it just created is producing errors during linting/testing/running. When it can do that, it can loop again, fix the error, and again use the tools to see whether it worked.

        I _still_ encounter people who think "AI programming" is pasting stuff into ChatGPT on the browser and they complain it hallucinates functions and produces invalid code.

        Well, d'oh.
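        A minimal sketch of that loop in Python (the `agent_fix` step is a stand-in for the real model call, and the broken snippet is invented for illustration; everything else is ordinary subprocess plumbing):

```python
import os
import subprocess
import sys
import tempfile

def run_checks(path):
    """The 'tools' step: execute the file and capture any errors."""
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def agent_fix(source, error):
    """Placeholder for the LLM call: in a real loop you'd send `source`
    and `error` to the model and get a patched file back."""
    return source.replace("prnt", "print")  # toy 'fix' for the toy bug

source = 'prnt("hello")\n'  # deliberately broken starting point
path = os.path.join(tempfile.mkdtemp(), "app.py")

for attempt in range(5):  # bounded loop: tools -> errors -> model -> retry
    with open(path, "w") as f:
        f.write(source)
    ok, errors = run_checks(path)
    if ok:
        break
    source = agent_fix(source, errors)

print(ok)  # True once the loop has converged
```

        The point is the shape: the agent doesn't ask a human whether the code works; it runs the tools, reads the errors, and tries again.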

        • By ikornaselur 2026-01-07 13:16

          Last weekend I was debugging some blocking issue on a microcontroller with embassy-rs, where the whole microcontroller would lock up as soon as I started trying to connect to an MQTT server.

          I was having Opus investigate it and I kept building and deploying the firmware for testing.. then I just figured I'd explain how it could do the same and pull the logs.

          Off it went, for the next ~15 minutes it would flash the firmware multiple times until it figured out the issue and fixed it.

          There was something so interesting about seeing a microcontroller on the desk being flashed by Claude Code, with LEDs blinking indicating failure states. There's something about it not being just code on your laptop that felt so interesting to me.

          But I agree, absolutely, red/green test or have a way of validating (linting, testing, whatever it is) and explain the end-to-end loop, then the agent is able to work much faster without being blocked by you multiple times along the way.

        • By gck1 2026-01-07 15:38

          This is kind of why I'm not really scared of losing my job.

          While Claude is amazing at writing code, it still requires human operators. And even experienced human operators are bad at operating this machinery.

          Tell your average Joe - the one who thinks they can create software without engineers - what "tools-in-a-loop" means, and they'll make the same face they made when you tried explaining iterators to them before LLMs.

          Explain to them how a type system, E2E or integration tests help the agent, and suddenly they have to learn all the things they would have been required to learn to be able to write it on their own.

        • By nprateem 2026-01-07 13:34

          Jules is slow incompetent shit and that uses tools in a loop, so no...

      • By ern 2026-01-07 0:11 · 2 replies

        I have been out of the loop for a couple of months (vacation). I tried Claude Opus 4.5 at the end of November 2025 with the corporate Github Copilot subscription in Agent mode and it was awful: basically ignoring code and hallucinating.

        My team is using it with Claude Code and say it works brilliantly, so I'll be giving it another go.

        How much of the value comes from Opus 4.5, how much comes from Claude Code, and how much comes from the combination?

        • By everfrustrated 2026-01-07 1:16 · 3 replies

          As someone coming from GitHub copilot in vscode and recently trying Claude Code plugin for vscode I don't get the fuss about Claude.

          Copilot has by far the best and most intuitive agent UI. Just make sure you're in agent mode and choose Sonnet or Opus models.

          I've just cancelled my Claude sub and gone back and will upgrade to the GH Pro+ to get more sonnet/opus.

          • By indigodaddy 2026-01-07 3:34

            Check out Antigravity+Google AI Pro $20 plan+Opus 4.5. apparently the Opus limits are insanely generous (of course that could change on a dime).

          • By pluralmonad 2026-01-07 18:09

            I strongly concur with your second statement. Anything other than agent mode in GH copilot feels useless to me. If I want to engage Opus through GH copilot for planning work, I still use agent mode and just indicate the desired output is whatever.md. I obviously only do this in environments lacking a better tool (Claude Code).

          • By ern 2026-01-07 1:51

            I'd used both CC and Copilot Agent Mode in VSCode, but not the combination of CC + Opus 4.5, and I agree, I was happy enough with Copilot.

            The gap didn't seem big, but in November (which admittedly was when Opus 4.5 was in preview on Copilot) Opus 4.5 with Copilot was awful.

        • By Dusseldorf 2026-01-07 1:05

          I suspect that's the other thing at play here; many people have only tried Copilot because it's cheap with all the other Microsoft subscriptions many companies have. Copilot frankly is garbage compared to Cursor/Claude, even with the same exact models.

      • By AstroBen 2026-01-06 18:42 · 4 replies

        For a long time now, my issue hasn't been whether the code they write works or doesn't work. My issues all stem from the fact that it works, but does the wrong thing

        • By zmmmmm 2026-01-06 21:44 · 1 reply

          > My issues all stem from that it works, but does the wrong thing

          It's an opportunity, not a problem. Because it means there's a gap in your specifications and then your tests.

          I use Aider, not Claude, but I run it with Anthropic models. And what I found is that comprehensively writing up the documentation for a feature, spec-style, before starting eliminates a huge amount of what you're referring to. It serves a triple purpose: (a) you get the documentation, (b) you guide the AI, and (c) it's surprising how often this helps to refine the feature itself. Sometimes I invoke the AI to help me write the spec as well, asking it to prompt for areas where clarification is needed, etc.

          • By giancarlostoro 2026-01-06 21:57 · 1 reply

            This is how Beads works, especially with Claude Code. What I do is tell Claude to always create a Bead when I tell it to add something, or about something that needs to be added. Then I start brainstorming, and even ask it to do market research on what top apps are doing for x, y or z. Then I ask it to update the Bead (I call them tasks), and finally, when it's got enough detail, I tell it to do all of these in parallel.

            • By beoberha 2026-01-06 22:39

              Beads is amazing. It’s such a simple concept but elevates agentic coding to another level

        • By simonw 2026-01-06 21:32 · 5 replies

          If it does the wrong thing you tell it what the right thing is and have it try again.

          With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try.

          • By GoatInGrey 2026-01-06 22:17 · 3 replies

            There are several rubs with that operating protocol extending beyond the "you're holding it wrong" claim.

            1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.

            2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1.

            3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves.

            4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics.

            • By simonw 2026-01-06 22:26 · 1 reply

              The value I'm getting from this stuff is so large that I'll take those risks, personally.

              • By th0ma5 2026-01-07 1:45 · 2 replies

                [flagged]

                • By dang 2026-01-10 1:44

                  Since we asked you to stop hounding another user in this manner and you've continued to do it repeatedly, I've banned the account. This is not what Hacker News is for, and you've done it almost 50 times (!), almost 30 of which have been after we first asked you to stop. That is extreme, and totally unacceptable.

                  https://news.ycombinator.com/item?id=46456850

                  https://news.ycombinator.com/item?id=44726957

                  https://news.ycombinator.com/item?id=44110805

                  (You've also been breaking the site guidelines in plenty of other places - e.g. https://news.ycombinator.com/item?id=46521516, https://news.ycombinator.com/item?id=46395646. This is not what this site is for, and destroys what it is for.)

                • By scubbo 2026-01-07 5:39 · 1 reply

                  Many people - simonw is the most visible of them, but there are countless others - have given up trying to convince folks who are determined not to be convinced, and are simply enjoying their increased productivity. This is not a competition or an argument.

                  • By llmslave2 2026-01-07 6:48 · 1 reply

                    Maybe they are struggling to convince others because they are unable to produce evidence that is able to convince people?

                    My experience scrolling X and HN is a bunch of people going "omg opus omg Claude Code I'm 10x more productive" and that's it. Just hand wavy anecdotes based on their own perceived productivity. I'm open to being convinced but just saying stuff is not convincing. It's the opposite, it feels like people have been put under a spell.

                    I'm following The Primeagen; he's doing a series where he tries these tools on stream, following people's advice on how to use them best. He's actually quite a good programmer, so I'm eager to see how it goes. So far he isn't impressed, and thus neither am I. If he cracks it and unlocks significant productivity then I will be convinced.

                    • By enraged_camel 2026-01-07 7:39 · 1 reply

                      >> Maybe they are struggling to convince others because they are unable to produce evidence that is able to convince people?

                      Simon has produced plenty of evidence over the past year. You can check their submission history and their blog: https://simonwillison.net/

                      The problem with people asking for evidence is that there's no level of evidence that will convince them. They will say things like "that's great but this is not a novel problem so obviously the AI did well" or "the AI worked only because this is a greenfield project, it fails miserably in large codebases".

                      • By llmslave2 2026-01-07 8:10 · 1 reply

                        It's true that some people will just continually move the goalposts because they are invested in their beliefs. But that doesn't mean the skepticism around certain claims isn't relevant.

                        Nobody serious is disputing that LLMs can generate working code. They dispute claims like "Agentic workflows will replace software developers in the short to medium term", or "Agentic workflows lead to 2-100x improvements in productivity across the board". This is what people are looking for in terms of evidence, and there just isn't any.

                        Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity [0]. We also have evidence that it harms our cognitive abilities [1]. Anecdotally, I have found myself lazily reaching for LLM assistance when encountering a difficult problem instead of thinking deeply about it. Anecdotally, I also struggle to be more productive using AI-centric agentic workflows in my areas of expertise.

                        We want evidence that "vibe engineering" is actually more productive across the entire lifespan of a software project. We want evidence that it produces better outcomes. Nobody has yet shown that. It's just people claiming that because they vibe-coded some trivial project, all of software development can benefit from this approach. Recently a principal engineer at Google claimed that Claude Code wrote their team's entire year's worth of work in a single afternoon. They later walked that claim back, but most do not.

                        I'm more than happy to be convinced, but it's becoming extremely tiring to hear the same claims parroted without evidence, and then to get called a luddite when you question them. It's also tiring when you push people on it and they blame the model you use, then the agent, then the way you handle context, then the prompts, and then "skill issue". Meanwhile, all they have to show is some slop that could be hand-coded in a couple of hours by someone familiar with the domain. I use AI, and I was pretty bullish on it for the last two years. But the combination of it simply not living up to expectations, the constant barrage of what feels like a stealth marketing campaign parroting the same thing over and over (the new model is way better, unlike the other times we said that), the ever-increasing amount of absolute slop code, and companies like Microsoft producing worse and worse software as they shoehorn AI into every single product (Office was renamed to Copilot 365) has made me very sensitive to it, much in the same way I was very sensitive to the claims made by certain VC-backed webdev companies regarding their product + framework in the last few years.

                        I'm not even going to bring up the economic, social, and environmental issues because I don't think they're relevant, but they do contribute to my annoyance with this stuff.

                        [0] https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... [1] https://news.harvard.edu/gazette/story/2025/11/is-ai-dulling...

                        • By lunar_mycroft 2026-01-07 8:24 · 1 reply

                          > Thus far, we do have evidence that AI (at least in OSS) produces a 19% decrease in productivity

                          I generally agree with you, but I'd be remiss if I didn't point out that it's plausible the slowdown observed in the METR study was at least partially due to the subjects' lack of experience with LLMs. Someone with more experience performed the same experiment on themselves and couldn't find a significant difference between using LLMs and not [0]. I think the more important point here is that programmers' subjective assessment of how much LLMs help them is not reliable, and is biased towards the LLMs.

                          [0] https://mikelovesrobots.substack.com/p/wheres-the-shovelware...

                          • By llmslave2 2026-01-07 8:45 · 1 reply

                            I think we're on the same page re: that study. Actually, your link made me think about the ongoing debate around IDEs vs stuff like Vim. Some people swear by IDEs and insist they drastically improve their productivity; others dismiss them or even claim they make them less productive. Sound familiar? I think it's possible these AI tools are simply another way to type code, and the differences, averaged out, end up being a wash.

                            • By AstroBen 2026-01-07 15:36 · 1 reply

                              IDEs vs vim makes a lot of sense. AI really does feel like using an IDE in a certain way

                              Using AI absolutely makes it feel like I'm more productive. But when I look back at the end of the day at what I got done, it would be ludicrous to say it was multiple times my pre-AI output

                              Despite all the people replying to me saying "you're holding it wrong", I know the fix for it doing the wrong thing: specify in more detail what I want. The problem with that is twofold:

                              1. How much to specify? As little as possible is the ideal if we want to maximize how much it can help us. A balance here is key: if I need to detail every minute thing, I may as well write the code myself

                              2. If I get this step wrong, I still have to review everything, rethink it, go back and re-prompt, costing time

                              When I'm working on production code, I have to understand it all to confidently commit. It costs time for me to go over everything, sometimes multiple iterations. Sometimes the AI uses things I don't know about and I need to dig into it to understand it

                              AI is currently writing 90% of my code. Quality is fine. It's fun! It's magical when it nails something one-shot. I'm just not confident it's faster overall

                              • By llmslave2 2026-01-07 20:56

                                I think this is an extremely honest perspective. It's actually kind of cool that it's gotten to the point it can write most code - albeit with a lot of handholding.

            • By theshrike79 2026-01-07 1:05 · 2 replies

              I've said this multiple times:

              This is why you should use this AI bubble (it IS a bubble), while the VC-funded AI models are dirt cheap, to CREATE tools for yourself.

              Need a very specific linter? AI can do it. Need a complex Roslyn analyser? AI. Any kind of scripting or automation that you run on your own machine. AI.

              None of that will go away or suddenly stop working when the bubble bursts.

              Within just the last 6 months I've built so many little utilities to speed up my work (and personal life) that it's completely bonkers. Most went from "hmm, might be cool to..." to a good-enough script/program in an evening while doing chores.

              Even better, start getting a feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for surprisingly many things.
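              As a sketch of the kind of "very specific linter" meant here, something an agent can knock out in one evening (the rule itself, that TODOs must carry a ticket ID, is just an invented example):

```python
import re

# Toy "very specific linter": flag TODO comments that don't reference
# a ticket ID like ABC-123. (The rule is an invented example.)
TODO_RE = re.compile(r"#\s*TODO(?!\(\w+-\d+\))")

def lint(lines):
    """Return (line_number, line) pairs for TODOs missing a ticket."""
    return [(i, line.rstrip()) for i, line in enumerate(lines, 1)
            if TODO_RE.search(line)]

sample = [
    'x = 1  # TODO fix rounding\n',           # flagged: no ticket
    'y = 2  # TODO(ABC-42) tighten bound\n',  # ok: has a ticket
]
problems = lint(sample)
print(problems)  # [(1, 'x = 1  # TODO fix rounding')]
```

              Point it at your repo's files and wire it into CI; the value is that the rule is exactly as specific as your team needs, which is precisely what off-the-shelf linters don't offer.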

              • By MarsIronPI 2026-01-07 16:51

                > Even better, start getting a feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for surprisingly many things.

                Are there any local models that are at least somewhat comparable to the latest-and-greatest (e.g. Opus 4.5, Gemini 3), especially in terms of coding?

              • By lunar_mycroft 2026-01-07 4:17 · 3 replies

                A risk I see with this approach is that when the bubble pops, you'll be left dependent on a bunch of tools which you don't know how to maintain or replace on your own, and won't have/be able to afford access to LLMs to do it for you.

                • By theshrike79 2026-01-07 9:38

                  The "tools" in this context are literally a few hundred lines of Python or Github CI build pipeline, we're not talking about 500kLOC massive applications.

                  I'm building tools, not complete factories :) The AI builds me a better hammer specifically for the nails I'm nailing 90% of the time. Even if the AI goes away, I still know how the custom hammer works.

                • By AstroBen 2026-01-07 5:37

                  I thought that initially, but I don't think the skills AI weakens in me are particularly valuable

                  Let's say AI becomes too expensive - I more or less only have to sharpen up being able to write the language. My active recall of the syntax, common methods and libraries. That's not hard or much of a setback

                  Maybe this would be a problem if you're purely vibe coding, but I haven't seen that work long term

                • By baq 2026-01-07 6:59

                  Open-source models hosted by independent providers (or even by yourself, which will be affordable if the bubble pops and you manage to pick up hardware in fire sales) are already good enough to explain most code.

            • By kaydub 2026-01-07 17:59

              > 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success.

              I can run multiple agents at once, across multiple code bases (or the same codebase but multiple different branches), doing the same or different things. You absolutely can't keep up with that. Maybe the one singular task you were working on, sure, but the fact that I can work on multiple different things without the same cognitive load will blow you out of the water.

              > 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1.

              Tell the LLM to document in comments why it did things. Human developers often leave, and then nobody with knowledge of their codebase or their "whys" is even around to give details. Devs are notoriously terrible about documentation.

              > 3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves.

              You can't develop at the same velocity, so drop that assumption now. There are all kinds of lower abstractions that you build on top of that you probably can't explain currently.

              > 4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics.

              You aren't keeping up with the actual economics. This shit is technically profitable, the unprofitable part is the ongoing battle between LLM providers to have the best model. They know software in the past has often been winner takes all so they're all trying to win.

          • By Capricorn2481 2026-01-07 0:31 · 1 reply

            > With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try

            That's great that this is your experience, but it's not a lot of people's. There are projects where it's just not going to know what to do.

            I'm working in a web framework that is a Frankenstein-ing of Laravel and October CMS. It's so easy for the agent to get confused because, even when I tell it this is a different framework, it sees things that look like Laravel or October CMS and suggests solutions that are only for those frameworks. So there's constant made up methods and getting stuck in loops.

            The documentation is terrible, you just have to read the code. Which, despite what people say, Cursor is terrible at, because embeddings are not a real way to read a codebase.

            • By simonw 2026-01-07 7:18 · 1 reply

              I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast.

              One trick I use that might work for you as well:

                Clone GitHub.com/simonw/datasette to /tmp
                then look at /tmp/datasette/docs for
                documentation and search the code
                if you need to
              
              Try that with your own custom framework and it might unblock things.

              If your framework is missing documentation tell Claude Code to write itself some documentation based on what it learns from reading the code!

              • By Capricorn2481 2026-01-07 21:49 · 1 reply

                > I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast

                Potentially because there is no baggage from similar frameworks. I'm sure it would have an easier time with ours if it had not been spun off from other frameworks.

                > If your framework is missing documentation tell Claude Code to write itself some documentation based on what it learns from reading the code!

                If Claude cannot read the code well enough to begin with, and needs supplemental documentation, I certainly don't want it generating the docs from the code. That's just compounding hallucinations on top of each other.

                • By simonw 2026-01-08 7:48

                  Give it a try and see what happens.

                  I find Claude Code is so good at docs that I sometimes investigate a new library by checking out a GitHub repo, deleting the docs/ and README and having Claude write fresh docs from scratch.

          • By aurumque 2026-01-06 21:59 · 1 reply

            In a circuitous way, you can rather successfully have one agent write a specification and another one execute the code changes. Claude code has a planning mode that lets you work with the model to create a robust specification that can then be executed, asking the sort of leading questions for which it already seems to know it could make an incorrect assumption. I say 'agent' but I'm really just talking about separate model contexts, nothing fancy.

            • By mikestorrent 2026-01-06 22:40 · 1 reply

              Cursor's planning functionality is very similar, and I have found that I can use even "cheap" models like their Composer-1 and get great results in the planning phase, then turn on Sonnet or Opus to actually execute the plan. 90% of the stuff I need to argue about happens during the planning phase, so I save a ton of tokens and rework just by making a really good spec.

              It turns out that Waterfall was always the correct method, it's just really slow ;)

              • By aurumque 2026-01-07 17:37

                Did you know that software specifications used to be almost entirely flow charts? There is something to be said for that and waterfall.

          • By cadamsdotcom 2026-01-06 21:44

            Even better, have it write code to describe the right thing then run its code against that, taking yourself out of that loop.
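            Concretely, the "code that describes the right thing" is just an executable check the agent can run itself; a sketch, with an invented `slugify` requirement standing in for the real feature:

```python
def slugify(title):
    # Implementation under test; imagine the agent regenerating this
    # function until the checks below pass, without a human in the loop.
    return "-".join(title.lower().split())

# Executable description of "the right thing": the agent runs this file
# itself instead of asking you whether the output looks correct.
checks = {
    "Hello World": "hello-world",
    "  spaced  out  ": "spaced-out",
}
for given, expected in checks.items():
    assert slugify(given) == expected, (given, slugify(given))
print("all checks pass")
```

            Once the expected behavior lives in runnable checks rather than in your head, the agent's loop closes without you in it.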

          • By giancarlostoro 2026-01-06 21:58 · 1 reply

            And if you've told it too many times to fix it, tell it someone has a gun to your head; for some reason it almost always gets it right the very next time.

            • By dare944 2026-01-07 4:33 · 1 reply

              If you're a developer at the dawn of the AI revolution, there is absolutely a gun to your head.

              • By giancarlostoro 2026-01-07 14:37

                Yeah, if anyone can truly afford the AI empire. Remember all these "leading" companies are running it at a loss, so most companies paying for it are severely underpaying the cost of it all. We would need an insane technological breakthrough of unlimited memory and power before I start to worry, and at that point, I'll just look for a new career.

        • By jmathai 2026-01-06 21:29

          I think it's worth understanding why. Because that's not everyone's experience and there's a chance you could make a change such that you find it extremely useful.

          There's a lesser chance that you're working on a code base that Claude Code just isn't capable of helping with.

        • By solumunus 2026-01-06 21:44 · 1 reply

          Correct it then, and next time craft a more explicit plan.

          • By wubrr 2026-01-06 21:59 · 3 replies

            The more explicit/detailed your plan, the more context it uses up, the less accurate and generally functional it is. Don't get me wrong, it's amazing, but on a complex problem with large enough context it will consistently shit the bed.

            • By rectang 2026-01-07 1:06

              The human still has to manage complexity. A properly modularized and maintainable code base is much easier for the LLM to operate on — but the LLM has difficulty keeping the code base in that state without strong guidance.

              Putting “Make minimal changes” in my standard prompt helped a lot with the tendency of basically all agents to make too many changes at once. With that addition it became possible to direct the LLM to make something similar to the logical progression of commits I would have made anyway, but now don’t have to work as hard at crafting.

              Most of the hype merchants avoid the topic of maintainability because they’re playing to non-technical management skeptical of the importance of engineering fundamentals. But everything I’ve experienced so far working with LLMs screams that the fundamentals are more important than ever.

            • By solumunus 2026-01-06 22:51

              It usually works well for me. With very big tasks I break the plan into multiple MD files with the relevant context included and work through in individual sessions, updating remaining plans appropriately at the end of each one (usually there will be decision changes or additions during iteration).

            • By pigpop 2026-01-07 3:38

              It takes a lot of plan to use up the context, and most of the time the agent doesn't need the whole plan; they just need what's relevant to the current task.

      • By scubbo 2026-01-07 0:56

        This was me. I have done a full 180 over the last 12 months or so, from "they're an interesting idea, and technically impressive, but not practically useful" to "holy shit I can have entire days/weeks where I don't write a single line of code".

      • By littlestymaar 2026-01-06 22:31 (2 replies)

        > I really think a lot of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from.

        It's not just the deficiencies of earlier versions, but the mismatch between the praise from AI enthusiasts and the reality.

        I mean maybe it is really different now and I should definitely try uploading all of my employer's IP on Claude's cloud and see how well it works. But so many people were as hyped by GPT-4 as they are now, despite GPT-4 actually being underwhelming.

        Too much hype for disappointing results leads to skepticism later on, even when the product has improved.

        • By roadside_picnic 2026-01-06 22:58 (2 replies)

          I feel similar, I'm not against the idea that maybe LLMs have gotten so much better... but I've been told this probably 10 times in the last few years working with AI daily.

          The funny part about rapidly changing industries is that, despite the fomo, there's honestly not any reward to keeping up unless you want to be a consultant. Otherwise, wait and see what sticks. If this summer people are still citing Opus 4.5 as a game-changing moment and have solid, repeatable workflows, then I'll happily change up my workflow.

          Someone could walk into the LLM space today and wouldn't be at any significant loss for not having paid attention to anything that happened in the last 4 years, beyond learning what has stuck since then.

          • By baq 2026-01-07 7:01

            If the trend line holds you’ll be very, very surprised.

          • By kaydub 2026-01-07 18:06 (2 replies)

            > The funny part about rapidly changing industries is that, despite the fomo, there's honestly not any reward to keeping up unless you want to be a consultant.

            LMAO what???

            • By roadside_picnic 2026-01-07 19:22 (1 reply)

              I've lived through multiple incredibly rapid changes in tech throughout my career, and the lesson always learned was there is a lot of wasted energy keeping up.

              Two big examples:

              - The period from early MVC JavaScript frontends (Backbone.js etc.) through the great React/Angular wars. I completely stepped out of the webdev space during that time period.

              - The rapid expansion of Deep Learning frameworks where I did try to keep up (shipped some Lua torch packages and made minor contributions to Pylearn2).

              In the first case, missing 5 years of front-end wars had zero impact. After not doing webdev work at all for 5 years I was tasked with shipping a React app. It took me a week to catch up, and everything was deployed in roughly the same time it would have taken someone who had spent years keeping up with the changes.

              In the second case, where I did keep up with many of the developing deep learning frameworks, it didn't really confer any advantage. Coworkers who started with Pytorch fresh out of school were just as proficient, if not more so, at building models. Spending energy keeping up offered no value other than feeling "current" at the time.

              Can you give me a counter example of where keeping up with a rapidly changing, unstable area has conferred a benefit to you? Most of FOMO is really just fear. Again, unless you're trying to sell yourself specifically as a consultant on the bleeding edge, there's no reason to keep up with all these changes (other than finding it fun).

              • By kaydub 2026-01-07 21:17

                You moved out of webdev for 5 years, not everybody else had that luxury. I'm sure it was beneficial to those people to keep up with webdev technologies.

            • By recursive 2026-01-07 21:23

              If everything changes every month, then stuff you learn next month would be obsolete in two months. This is a response to people saying "adapt or be left behind". There's so much thrashing that if you're not interested in the SOTA, you can just wait for everything to calm down and pick it up then.

        • By spaceman_2020 2026-01-07 4:41 (2 replies)

          You enter some text and a computer spits out complex answers generated on the spot

          Right or wrong - doesn’t matter. You typed in a line of text and now your computer is making 3000 word stories, images, even videos based on it

          How are you NOT astounded by that? We used to have NONE of this even 4 years ago!

          • By littlestymaar 2026-01-07 8:29 (1 reply)

            Of course I'm astounded. But being spectacular and being useful are entirely different things.

            • By spaceman_2020 2026-01-07 11:12 (1 reply)

              If you've found nothing useful about AI so far then the problem is likely you

              • By recursive 2026-01-07 21:25

                I don't think it's necessarily a problem. And even if you accept that the problem is you, it doesn't exactly provide a "solution".

          • By nprateem 2026-01-07 5:31 (1 reply)

            Because I want correct answers.

            • By Kim_Bruning 2026-01-07 13:44

              > On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

              -- Charles Babbage

      • By troupo 2026-01-06 22:27 (5 replies)

        > Opus 4.5 really is at a new tier however. It just...works.

        Literally tried it yesterday. I didn't see a single difference with whatever model Claude Code was using two months ago. Same crippled context window. Same "I'll read 10 irrelevant lines from a file", same random changes etc.

        • By EMM_386 2026-01-07 0:07 (1 reply)

          The context window isn't "crippled".

          Create a markdown document of your task (or use CLAUDE.md), put it in "plan mode" which allows Claude to use tool calls to ask questions before it generates the plan.

          When it finishes one part of the plan, have it create another markdown document - "progress.md" or whatever - with the whole plan and what is completed at that point.

          Type /clear (the context window is now empty), then tell Claude to read the two documents.

          Repeat until even a massive project is complete - with those 2 markdown documents and no context window issues.
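
          This loop is easy to make machine-checkable. A minimal sketch, assuming the plan is tracked as a standard markdown checklist inside "progress.md" (the file name and format follow the comment's own convention, nothing Claude requires):

```python
import re

def next_unfinished(progress_md: str):
    """Return the first unchecked '- [ ] ...' item, or None when all are done."""
    for line in progress_md.splitlines():
        m = re.match(r"- \[([ x])\] (.+)", line.strip())
        if m and m.group(1) == " ":
            return m.group(2)
    return None

# What a "progress.md" might contain mid-project:
progress = """\
- [x] Phase 1: scaffold project and auth
- [ ] Phase 2: implement file preview
- [ ] Phase 3: wire preview into playback
"""
print(next_unfinished(progress))  # the task to hand to the fresh session
```

          After each /clear, the next prompt only needs the plan document plus whichever item this returns.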

          • By troupo 2026-01-07 8:35 (1 reply)

            > The context window isn't "crippled".

            ... Proceeds to explain how it's crippled and all the workarounds you have to do to make it less crippled.

            • By EMM_386 2026-01-07 13:37 (1 reply)

              > ... Proceeds to explain how it's crippled and all the workarounds you have to do to make it less crippled.

              No - that's not what I did.

              You don't need an extra-long context full of irrelevant tokens. Claude doesn't need to see the code it implemented 40 steps ago in a working method from Phase 1 if it is on Phase 3 and not using that method. It doesn't need reasoning traces for things it already "thought" through.

              This other information is cluttering, not helpful. It makes the signal-to-noise ratio worse.

              If Claude needs to know something from Phase 1 during Phase 4, it will put a note in the living markdown document so it can simply find it again when it needs it.

              • By troupo 2026-01-07 14:10 (1 reply)

                Again, you're basically explaining how Claude has a very short limited context and you have to implement multiple workarounds to "prevent cluttering". Aka: try to keep context as small as possible, restart context often, try and feed it only small relevant information.

                What I very succinctly called "crippled context" despite claims that Opus 4.5 is somehow "next tier". It's all the same techniques we've been using for over a year now.

                • By scotty79 2026-01-07 14:40 (1 reply)

                  Context is a short term memory. Yours is even more limited and yet somehow you get by.

                  • By troupo 2026-01-07 17:08 (1 reply)

                    I get by because I also have long-term memory, and experience, and I can learn. LLMs have none of that, and every new session is rebuilding the world anew.

                    And even my short-term memory is significantly larger than the at most 50% of the 200k-token context window that Claude has. It runs out of context when my short-term memory is probably not even 1% full, for the same task (and I'm capable of more context-switching in the meantime).

                    And so even the "Opus 4.5 really is at a new tier" runs into the very same limitations all models have been running into since the beginning.

                    • By scotty79 2026-01-07 17:25 (1 reply)

                      > LLMs have none of that, and every new session is rebuilding the world anew.

                      For LLMs long term memory is achieved by tooling. Which you discounted in your previous comments.

                      You also overestimate the capacity of your short-term memory by a few orders of magnitude:

                      https://my.clevelandclinic.org/health/articles/short-term-me...

                      • By troupo 2026-01-07 17:37 (1 reply)

                        > For LLMs long term memory is achieved by tooling. Which you discounted in your previous comments.

                        My specific complaint, which is an observable fact about "Opus 4.5 is next tier": it has the same crippled context that degrades the quality of the model as soon as it fills 50%.

                        EMM_386: no-no-no, it's not crippled. All you have to do is keep track across multiple files, clear out context often, feed very specific information not to overflow context.

                        Me: so... it's crippled, and you need multiple workarounds

                        scotty79: After all it's the same as your own short-term memory, and <some unspecified tooling (I guess those same files)> provide long-term memory for LLMs.

                        Me: Your comparison is invalid because I can go have lunch, and come back to the problem at hand and continue where I left off. "Next tier Opus 4.5" will have to be fed the entire world from scratch after a context clear/compact/in a new session.

                        Unless, of course, you meant to say that "next tier Opus model" only has 15-30 second short term memory, and needs to keep multiple notes around like the guy from Memento. Which... makes it crippled.

                        • By scotty79 2026-01-07 21:18 (2 replies)

                          If you refuse to use what you call workarounds and I call long-term memory, then you end up with a guy from Memento, and regardless of how smart the model is it can end up making the same mistakes. And that's why you can't tell the difference between a smarter and a dumber one while others can.

                          • By recursive 2026-01-07 21:27

                            I think the premise is that if it was the "next tier" then you wouldn't need to use these workarounds.

                          • By troupo 2026-01-07 21:57 (1 reply)

                            > If you refuse to use what you call workarounds

                            Who said I refuse them?

                            I evaluated at face value the claim that Opus is somehow next tier/something different/amazeballs future. It still has all the same issues and needs all the same workarounds as whatever I was using two months ago (I had a bit of a coding hiatus between the beginning of December and now).

                            > then you end up with a guy from Memento and regardless of how smart the model is

                            Those models are, and keep being the guy from memento. Your "long memory" is nothing but notes scribbled everywhere that you have to re-assemble every time.

                            > And that's why you can't tell the difference between smarter and dumber one while others can.

                            If it was "next tier smarter" it wouldn't need the exact same workarounds as the "dumber" models. You wouldn't compare the context to the 15-30 second short-term memory and need unspecified tools [1] to have "long-term memory". You wouldn't have the model behave in an indistinguishable way from a "dumber" model after half of its context windows has been filled. You wouldn't even think about context windows. And yet here we are

                            [1] For each person these tools will be a different collection of magic incantations. From scattered .md files to slop like Beads to MCP servers providing access to various external storage solutions to custom shell scripts to ...

                            BTW, I still find "superpowers" from https://github.com/obra/superpowers to be the single best improvement to Claude (and other providers), even if it's just another in a long series of magic chants I've evaluated.

                            • By scotty79 2026-01-08 11:07 (1 reply)

                              > Those models are, and keep being the guy from memento. Your "long memory" is nothing but notes scribbled everywhere that you have to re-assemble every time.

                              That's exactly how the long term memory works in humans as well. The fact that some of these scribbles are done chemically in the same organ that does the processing doesn't make it much better. Human memories are reassembled at recall (often inaccurately). And humans also scribble when they try to solve a problem that exceeds their short term memory.

                              > If it was "next tier smarter" it wouldn't need the exact same workarounds as the "dumber" models.

                              This is akin to refusing to call a processor next tier because it still needs RAM, a bus to communicate with it, and an SSD as well. You think it should have everything in cache to be worthy of being called next tier.

                              It's fine to have your own standards for applying words. But expect further confusion and miscommunication with other people if you don't intend to realign.

                              • By troupo 2026-01-08 13:13 (1 reply)

                                > That's exactly how the long term memory works in humans as well.

                                This is applicable when you go away from a problem for a while. And yet I don't lose the entire context and have to rebuild it from scratch when I go for lunch, for example.

                                Models have to rebuild the entire world from scratch for every small task.

                                > This is akin to opposing calling processor next tier because it still needs RAM and bus to communicate with it and SSD as well.

                                You're so lost in your own metaphor that it makes no sense.

                                > You think it should have everything in cache to be worthy of calling it next tier.

                                No. "Next tier" implies something significantly and observably better. I don't. And here you are trying to tell me "if you use all the exact same tools that you have already used before with 'previous tier models' you will see it is somehow next tier".

                                If your "next tier" needs an equator-length list of caveats and all the same tools, it's not next tier is it?

                                BTW. I'm literally coding with this "next tier" tool with "long memory just like people". After just doing the "plan/execute/write notes" bullshit incantations I had to correct it:

                                    You're right, I fucked up on all three counts:
                                
                                    1. FileDetails - I should have WIRED IT UP, not deleted it. 
                                       It's a useful feature to preview file details before playing.
                                       I treated "unused" as "unwanted" instead of "not yet connected".
                                  
                                    2. Worktree not merged - Complete oversight. Did all the work but
                                       didn't finish the job.
                                  
                                    3. _spacing - Lazy fix. Should have analyzed why it exists and either
                                      used it or removed the layout constraint entirely.
                                
                                So next tier. So long memory. So person-like.

                                Oh. Within about 10 seconds after that it started compacting the "non-crippled" context window and immediately forgot most of what it had just been doing. So I had to clear out the context and teach it the world from the start again.

                                Edit. And now this amazing next tier model completely ignored that there already exists code to discover network interfaces, and wrote bullshit code calling CLI tools from Rust. So once again it needed to be reminded of this.

                                > It's fine to have your own standards for applying words. But expect further confusion and miscommunication with other people if don't intend to realign.

                                I mean, just like crypto bros before them, AI bros do sure love to invent their own terminology and their own realities that have nothing to do with anything real and observable.

                                • By scotty79 2026-01-08 17:06

                                  > "You're right, I fucked up on all three counts:"

                                  It very well might be that AI tools are not for you, if you are getting such poor results with your methods of approaching them.

                                  If you would like to improve your outcomes at some point, ask people who achieve better results for pointers and try them out. Here's a freebie, never tell AI it fucked up.

        • By mikestorrent 2026-01-06 22:42 (1 reply)

          200k+ tokens is a pretty big context window if you are feeding it the right context. Editors like Cursor are really good at indexing and curating context for you; perhaps it'd be worth trying something that does that better than Claude CLI does?

          • By troupo 2026-01-06 22:50 (3 replies)

            > a pretty big context window if you are feeding it the right context.

            Yup. There's some magical "right context" that will fix all the problems. What is that right context? No idea, I guess I need to read yet another 20,000-word post describing magical incantations that you should or shouldn't do in the context.

            The "Opus 4.5 is something else/nex tier/just works" claims in my mind means that I wouldn't need to babysit its every decision, or that it would actually read relevant lines from relevant files etc. Nope. Exact same behaviors as whatever the previous model was.

            Oh, and that "200k tokens context window"? It's a lie. The quality quickly degrades as soon as Claude reaches somewhere around 50% of the context window. At 80+% it's nearly indistinguishable from a model from two years ago. (BTW, same for Codex/GPT with it's "1 million token window")

            • By theshrike79 2026-01-07 1:09 (1 reply)

              It's like working with humans:

                1) define problem
                2) split problem into small independently verifiable tasks
                3) implement tasks one by one, verify with tools
              
              With humans 1) is the spec, 2) is the Jira or whatever tasks

              With an LLM, usually 1) is just a markdown file, 2) is a markdown checklist or GitHub issues (which Claude can use with the `gh` cli), and every loop of 3) gets a fresh context: maybe the spec from step 1 and the relevant task information from step 2.
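
              The loop above can be sketched as code; `run_agent` here is a hypothetical stub standing in for one fresh-context agent session (in practice a CLI invocation plus verification tools), and all names are illustrative:

```python
def run_agent(context: str) -> bool:
    """Stub for one fresh-context session: implement the task, verify with tools."""
    return True  # a real harness would run tests/linters here

spec = "1) Problem: a CLI tool that syncs bookmarks"              # the spec (step 1)
tasks = ["parse config", "fetch bookmarks", "write local store"]  # checklist (step 2)

completed = []
for task in tasks:  # step 3: one task per loop, each with a fresh context
    fresh_context = f"{spec}\nCurrent task: {task}"  # only the relevant info
    if run_agent(fresh_context):
        completed.append(task)

print(f"{len(completed)}/{len(tasks)} tasks done")
```

              The point is structural: the context handed to each session is rebuilt from the spec and one task, never accumulated across tasks.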

              I haven't run into context issues in a LONG time, and when I have, it's usually been either intentional (a problem where compacting won't hurt) or an error on my part.

              • By troupo 2026-01-07 9:00 (1 reply)

                > every loop of 3 gets a fresh context, maybe the spec from step 1 and the relevant task information from 2

                > I haven't run into context issues in a LONG time

                Because you've become the reverse centaur :) "a person who is serving as a squishy meat appendage for an uncaring machine." [1]

                You are very aware of the exact issues I'm talking about, and have trained yourself to do all the mechanical dance moves to avoid them.

                I do the same dances, that's why I'm pointing out that they are still necessary despite the claims of how model X/Y/Z are "next tier".

                [1] https://doctorow.medium.com/https-pluralistic-net-2025-12-05...

                • By theshrike79 2026-01-07 9:45 (1 reply)

                  Yes and no. I've worked quite a bit with juniors, offshore consultants and just in companies where processes are a bit shit.

                  The exact same method that worked for those happened to also work for LLMs, I didn't have to learn anything new or change much in my workflow.

                  "Fix bug in FoobarComponent" is enough of a bug ticket for the 100x developer in your team with experience with that specific product, but bad for AI, juniors and offshored teams.

                  Thus, giving enough context in each ticket to tell whoever is working on it where to look and a few ideas what might be the root cause and how to fix it is kinda second nature to me.

                  Also my own brain is mostly neurospicy mush, so _I_ need to write the context to the tickets even if I'm the one on it a few weeks from now. Because now-me remembers things, two-weeks-from-now me most likely doesn't.

                  • By troupo 2026-01-07 10:42

                    The problem with LLMs (similar to people :) ) is that you never really know what works. I've had Claude one-shot "implement <some complex requirement>" with little additional input, and then completely botch even the smallest bug fix with explicit instructions and context. And vice versa :)

            • By CuriouslyC 2026-01-06 23:22 (2 replies)

              I realize your experience has been frustrating. I hope you see that every generation of model and harness is converting more hold-outs. We're still a few years from hard diminishing returns assuming capital keeps flowing (and that's without any major new architectures, which are likely), so you should be able to see how this is going to play out.

              It's in your interest to deal with your frustration and figure out how you can leverage the new tools to stay relevant (to the degree that you want to).

              Regarding the context window, Claude needs thinking turned up for long context accuracy, it's quite forgetful without thinking.

              • By troupo 2026-01-07 8:39 (1 reply)

                Note how nothing in your comment addresses anything I said. Except the last sentence that basically confirms what I said. This perfectly illustrates the discourse around AI.

                As for the snide and patronizing "it's in your interest to stay relevant":

                1. I use these tools daily. That's why I don't subscribe to willful wide-eyed gullibility. I know exactly what these tools can and cannot do.

                The vast majority of "AI skeptics" are the same.

                2. In a few years when the world is awash in barely working, incomprehensible AI slop, my skills will be in great demand. Not because I'm an amazing developer (I'm not), but because I have experience separating the wheat from the chaff.

                • By CuriouslyC 2026-01-07 12:39 (1 reply)

                  The snide and patronizing is your projection. It kinda makes me sad when the discourse is so poisoned that I can't even encourage someone to protect their own future from something that's obviously coming (technical merits aside, purely based on social dynamics).

                  It seems the subject of AI is emotionally charged for you, so I expect friendly/rational discourse is going to be a challenge. I'd say something nice but since you're primed to see me being patronizing... Fuck you? That what you were expecting?

                  • By troupo 2026-01-07 13:12 (2 replies)

                    > The snide and patronizing is your projection.

                    It's not me who decided to barge in, assume their opponent doesn't use something or doesn't want to use something, and offer unsolicited advice.

                    > It kinda makes me sad when the discourse is so poisoned that I can't even encourage someone to protect their own future from something that's obviously coming

                    See. Again. You're so in love with your "wisdom" that you can't even see what you sound like: snide, patronising, condescending. And completely missing the whole point of what was written. You are literally the person who poisons the discourse.

                    Me: "here are the issues I still experience with what people claim are 'next tier frontier model'"

                    You: "it's in your interests to figure out how to leverage new tools to stay relevant in the future"

                    Me: ... what the hell are you talking about? I'm using these tools daily. Do you have anything constructive to add to the discourse?

                    > so I expect friendly/rational discourse is going to be a challenge.

                    It's only a challenge to you because you keep being in love with your voice and your voice only. Do you have anything to contribute to the actual rational discourse, or are you going to attack my character?

                    > I'd say something nice but since you're primed to see me being patronizing... Fuck you?

                    Ah. The famous friendly/rational discourse of "they attack my use of AI" (no one attacked you), "why don't you invest in learning tools to stay relevant in the future" (I literally use these tools daily, do you have anything useful to say?) and "fuck you" (well, same to you).

                    > That what you were expecting?

                    What I was expecting is responses to what I wrote, not you riding in on a high horse.

                    • By CuriouslyC 2026-01-07 14:19 (1 reply)

                      You were the one complaining about how the tools aren't giving you the results you expected. If you're using these tools daily and having a hard time, either you're working on something very different from the bulk of people using the tools and your problems are legitimate, or you aren't and it's a skill issue.

                      If you want to take politeness as being patronizing, I'm happy to stop bothering. My guess is you're not a special snowflake, and you need to "get good" or you're going to end up on unemployment complaining about how unfair life is. I'd have sympathy but you don't seem like a pleasant human being to interact with, so have fun!

                      • By troupo 2026-01-07 17:08 (1 reply)

                        > You were the one complaining about how the tools aren't giving you the results you expected.

                        They are not giving me the results people claim they give. It is distinctly different from not giving the results I want.

                        > If you're using these tools daily and having a hard time, either you're working on something very different from the bulk of people using the tools and your problems are legitimate, or you aren't and it's a skill issue.

                        Indeed. And your rational/friendly discourse that you claim you're having would start with trying to figure that out. Did you? No, you didn't. You immediately assumed your opponent is a clueless idiot who is somehow against AI and is incapable of learning or something.

                        > If you want to take politeness as being patronizing, I'm happy to stop bothering.

                        No. It's not politeness. It's smugness. You literally started your interaction in this thread with a "git gud or else" and even managed to complain later that "you dislike it when they attack your use of AI as a skill issue". While continuously attacking others.

                        > you don't seem like a pleasant human being to interact with

                        Says the person who has contributed nothing to the conversation except his arrogance, smugness, holier-than-thou attitude, engaged in nothing but personal attacks, complained about non-existent grievances and when called out on this behavior completed his "friendly and rational discourse" with a "fuck you".

                        Well, fuck you, too.

                        Adieu.

                    • By cindyllm 2026-01-07 13:19

                      [dead]

              • By th0ma5 2026-01-07 1:48 (3 replies)

                [flagged]

                • By CuriouslyC 2026-01-07 2:30 (1 reply)

                  Personally I'm sympathetic to people who don't want to have to use AI, but I dislike it when they attack my use of AI as a skill issue. I'm quite certain the workplace is going to punish people who don't leverage AI though, and I'm trying to be helpful.

                  • By troupo 2026-01-07 8:41 (1 reply)

                    > but I dislike it when they attack my use of AI as a skill issue.

                    No one attacked your use of AI. I explained my own experience with the "Claude Opus 4.5 is next tier". You barged in, ignored anything I said, and attacked my skills.

                    > the workplace is going to punish people who don't leverage AI though, and I'm trying to be helpful.

                    So what exactly is helpful in your comments?

                    • By CuriouslyC 2026-01-07 12:33

                      The only thing I disagreed with in your post is your objectively incorrect statement regarding Claude's context behavior. Other than that I'm just trying to encourage you to make preparations for something that I don't think you're taking seriously enough yet. No need to get all worked up, it'll only reflect on you.

                • By mikestorrent 2026-01-09 19:35

                  And, conversely, when we read a comment like yours, it sounds like someone who's afraid of computers, would maybe have decried the bicycle and automobile, and really wishes they could just go live in a cabin in the woods.

                  (And it's fine to do so, just don't mail bombs to us, ok?)

                • By pigeons 2026-01-08 2:39

                  It certainly sounds unkind, if not cultish.

            • By mikestorrent 2026-01-09 19:39

              > There's some magical "right context" that will fix all the problems.

              All I can tell you is that in my own lived experience, I've had some fantastic results from AI, and it comes from telling it "look at this thing here, ok, i want you to chain it to that, please consider this factor, don't forget that... blah blah blah" like how I would have spelled things out to a junior developer, and then it really does stand a really solid chance of turning out what I've asked for. It helps a lot that I know what to ask for; there's no replacing that with AI yet.

              So, your own situation must fall into one of these coarse buckets:

              - You're doing something way too hard for AI to have a chance at yet, like real science / engineering at the frontier, not just boring software or infra development

              - Your prompts aren't specific enough, you're not feeding it context, and you're expecting it to one-shot things perfectly instead of having to spend an afternoon prompting and correcting stuff

              - You're not actually using and getting better at the tools, so you're just shouting criticisms from the sidelines, perhaps as sour grapes because you're not allowed by policy / your company can't afford to have you get into it.

              IDK. I hope it's the first one and you're just doing Really Hard Things, but if you're doing normal software developer stuff and not seeing a productivity advantage, it's a fucking skill issue.

        • By pluralmonad 2026-01-07 18:57 (1 reply)

          I'm not familiar with any form of intelligence that does not suffer from a bloated context. If you want to try and improve your workflow, a good place to start is using sub-agents so individual task implementations do not fill up your top level agents context. I used to regularly have to compact and clear, but since using sub-agents for most direct tasks, I hardly do anymore.
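          A toy sketch of the pattern described above (the function and task names are invented for illustration; real sub-agents are spawned by the agent runtime, not by user code): the bulky transcript of each task lives in a throwaway context, and only a short summary is appended to the top-level agent's history.

```python
# Toy illustration (no real agent SDK) of the sub-agent pattern:
# per-task work happens in a scratch context that is discarded, so the
# parent's history grows by one summary line per task, not per tool call.

def run_subagent(task: str) -> str:
    """Simulate a sub-agent that starts from a fresh, empty context."""
    scratch_context = [f"task: {task}"]
    # Stand-in for many tool calls, file reads, and model turns:
    scratch_context += [f"step {i}" for i in range(50)]
    # Everything above is discarded; only a summary escapes.
    return f"done: {task}"

parent_context: list[str] = []          # the top-level agent's history
for task in ["fix auth bug", "add regression tests"]:
    parent_context.append(run_subagent(task))

# The parent holds 2 short summaries instead of ~100 transcript entries.
assert parent_context == ["done: fix auth bug", "done: add regression tests"]
```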

          • By troupo 2026-01-07 20:52

            1. It's a workaround for context limitations

            2. It's the same workarounds we've been doing forever

            3. It's indistinguishable from "clear context and re-feed the entire world of relevant info from scratch" we've had forever, just slightly more automated

            That's why I don't understand all the "it's new tier" etc. It's all the same issues with all the same workarounds.

        • By iwontberude 2026-01-06 22:44

          I use Sonnet and Opus all the time and the differences are almost negligible

        • By llmslave2 2026-01-07 6:52 (2 replies)

          That's because Opus has been out for almost 5 months now lol. It's the same model, so I think people have been vibe coding with a heavy dose of wine this holiday and are now convinced it's the future.

          • By Leynos 2026-01-07 10:29

            Opus 4.5 was released 24th November.

          • By spaceman_2020 2026-01-07 11:13 (1 reply)

            Looks like you hallucinated the Opus release date

            Are you sure you're not an LLM?

            • By llmslave2 2026-01-07 20:53

              Opus 4.1 was released in August or smth.

      • By iwontberude 2026-01-06 22:43

        Opus 4.5 is fucking up just like Sonnet really. I don't know how your use is that much different than mine.

      • By biammer 2026-01-06 19:30 (7 replies)

        [flagged]

        • By keeda 2026-01-06 21:18 (1 reply)

          Actually, I've been saying that even models from 2+ years ago were extremely good, but you needed to "hold them right" to get good results, else you might cut yourself on the sharp edges of the "jagged frontier" (https://www.hbs.edu/faculty/Pages/item.aspx?num=64700). Unfortunately, this often required you to adapt yourself to the tool, which is a big change -- unfeasible for most people and companies.

          I would say the underlying principle was ensuring a tight, highly relevant context (e.g. choose the "right" task size and load only the relevant files or even code snippets, not the whole codebase; more manual work upfront, but almost guaranteed one-shot results.)

          With newer models the sharper edges have largely disappeared, so you can hold them pretty much any which way and still get very good results. I'm not sure how much of this is from the improvements in the model itself vs the additional context it gets from the agentic scaffolding.

          I still maintain that we need to adapt ourselves to this new paradigm to fully leverage AI-assisted coding, and the future of coding will be pretty strange compared to what we're used to. As an example, see Gas Town: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...

          • By CuriouslyC 2026-01-06 23:30 (1 reply)

            FWIW, Gas Town is strange because Steve is strange (in a good way).

            It's just the same agent swarm orchestration that most agent frameworks are using, but with quirky marketing. All of that is just based on the SDLC [PM/Architect -> engineer planning group -> engineer -> review -> qa/evaluation] loop most people here should be familiar with. So actually pretty banal, which is probably part of the reason Steve decided to be zany.

            • By keeda 2026-01-07 5:07

              Ah, gotcha, I am still working through the article, but its detailed focus on all the moving parts under the covers is making it hard to grok the high-level workflow.

        • By QuantumGood 2026-01-06 20:55 (1 reply)

          Each failed prediction should lower our confidence in the next "it's finally useful!" claim. But this inductive reasoning breaks down at genuine inflection points.

          I agree with your framing that measuring should NOT be separated from political issues, but each can be made clear separately (framing it as "training the tools of the oppressor" seems to conflate measuring tool usefulness with politics).

          • By biammer 2026-01-06 21:29 (3 replies)

            [flagged]

            • By mikestorrent 2026-01-06 22:53 (1 reply)

              > How is it useful to you that these companies are so valuation hungry that they are moving money into this technology in such a way that people are fearful it could cripple the entire global economy?

              The creation of entire new classes of profession has always been the result of technological breakthroughs. The automobile did not cripple the economy, even as it ended the buggy-whip barons.

              > How is it useful to you that this tech is so power hungry that environmental externalities are being further accelerated while regular people's utility costs are raising to cover the increased demand(whether they use the tech to "code" or "manifest art")?

              There will be advantages to lower-power computing, and lower-cost electricity. Implement carbon taxes and AI companies will follow the market incentive to install their datacentres in places where sustainable power is available for cheap. We'll see China soaring to new heights with their massive solar investment, and America will eventually figure out they have to catch up and cannot do so with coal and gas.

              > How is it useful to you that this tech is so compute hungry that they are seemingly ending the industry of personal compute to feed this tech's demand?

              Temporary problem, the demand for personal computing is not going to die in five years, and meanwhile the lucrative markets for producing this equipment will result in many new factories, increasing capacity and eventually lowering prices again. In the meantime, many pundits are suggesting that this may thankfully begin the end of the Electron App Era where a fuckin' chat client thinks it deserves 1GB of RAM.

              Consider this: why are we using Electron and needing 32GB of RAM on a desktop? Because web developers only knew how to use Javascript and couldn't write a proper desktop app. With AI, desktop frameworks can have a resurgence; why shouldn't I use Go or Rust and write a native app on all platforms now that the cost of doing so is decreasing and the number of people empowered to work with it is increasing? I wrote a nice multithreaded fractal renderer in Rust the other day; I don't know how to multithread, write Rust, and probably can't iterate complex numbers correctly on paper anymore....

              > How is it useful to you that this tech is so water hungry that it is emptying drinking water acquifers?

              This is only a problem in places that have poor water policy, e.g. California (who can all thank the gods that their reservoirs are all now very full from the recent rain). This problem predates datacenters and needs to be solved - for instance, by federalizing and closing down the so-called Wonderful Company and anyone else who uses underhanded tactics to buy up water rights to grow crops that shouldn't be grown there.

              Come and run your datacenters up in the cold North, you won't even need evaporative cooling for them, just blow a ton of fresh air in....

              > How is it useful to you that this tech is being used to manufacture consent?

              Now you've actually got an argument, and I am on your side on this one.

            • By ben_w 2026-01-06 22:19 (1 reply)

              > If at any point any of these releases were "genuine inflection points" it would be unnecessary to proselytize such. It would be self evident. Much like rain.

              Agreed.

              Now, I suggest reading through all of this to note that I am not a fan of tech bros, that I do want this to be a bubble. Then also note what else I'm saying despite all that.

              To me, it is self-evident. The various projects I have created by simply asking for them are so. I have looked at the source code they produce, and how this has changed over time: last year I was describing them as "junior" coders, by which I meant "fresh hire"; now, even with the same title, I would say "someone who is just about to stop being a junior".

              > "The oppressed need to acknowledge that their oppression is useful to their oppressors."

              The capacity for AI to oppress you is in direct relation to its economic value.

              > How is it useful to you that this tech is so power hungry that environmental externalities are being further accelerated while regular people's utility costs are raising to cover the increased demand(whether they use the tech to "code" or "manifest art")?

              The power hunger is in direct proportion to the demand. Someone burning USD 20 to get Claude Code tokens has consumed approximately USD 10 of electricity in that period, with the other USD 10 having been spread between repaying the model training cost and the server construction cost.

              The reason they're willing to spend USD 20 is to save at least USD 20 worth of dev time. This was already the case with the initial version of ChatGPT Pro back in the day, when it could justify that by saving 23 dev minutes per month. There's around a million developers in the USA; just that group increasing electricity spending by USD 10/month will put a massive dent in the USA's power grid.

              Gets worse though. Based on my experience, using Claude Code optimally, when you spend USD 20 you get at least 10 junior sprints' worth of output. Hiring a junior for 10 sprints is, what, USD 30,000? The bound here is "are you able to get value from having hired 1,500 juniors for the price of one?"
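              For what it's worth, the figures in that paragraph are internally consistent; here's a quick check, treating every dollar amount as the commenter's rough assumption (the USD 52/hour rate is back-calculated from the 23-minute claim, not stated anywhere).

```python
# Sanity check of the break-even arithmetic above. All dollar figures
# are the commenter's rough assumptions; the hourly rate is implied.

sub_cost_per_month = 20.0      # USD/month subscription
implied_dev_rate = 52.0        # USD/hour implied by the 23-minute claim
break_even_minutes = sub_cost_per_month / implied_dev_rate * 60
assert round(break_even_minutes) == 23   # "saving 23 dev minutes per month"

junior_10_sprints = 30_000     # commenter's estimate, USD
assert junior_10_sprints / sub_cost_per_month == 1500  # "1,500 juniors for the price of one"
```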

              One can of course also waste those tokens. Both because nobody needs slop, and because most people can't manage one junior never mind 1500 of them.

              However, if the economy collectively answers "yes", then the environmental externalities expand until you can't afford to keep your fridge cold or your lights on.

              This is one of the failure modes of the technological singularity that people like me have been forewarning about for years, even when there's no alignment issues within the models themselves. Which there are, because Musk's one went and called itself Mecha Hitler, while being so sycophantic about Musk himself that it called him the best at everything even when the thing was "drinking piss", which would be extremely funny if he wasn't selling this to the US military.

              > How is it useful to you that this tech is so compute hungry that they are seemingly ending the industry of personal compute to feed this tech's demand?

              This will pass. Either this is a bubble, it pops, the manufacturers return to their roots; or it isn't because it works as advertised, which means it leads to much higher growth rates, and we (us, personally, you and me) get personal McKendree cylinders each with more compute than currently exists… or we get turned into the raw materials for those cylinders.

              I assume the former. But I say that as one who wants it to be the former.

              > How is it useful to you that this tech is so water hungry that it is emptying drinking water acquifers?

              Is it what's emptying drinking water acquifers?

              The combined water usage of all data centers in Arizona. All of them. Together. Which is over 100 DCs. All of them combined use about double what Tesla was expecting from just the Brandenburg Gigafactory to use before Musk decided to burn his reputation with EV consumers and Europeans for political point scoring.

              > How is it useful to you that this tech is being used to manufacture consent?

              This is one of the objectively bad things, though it's hard to say if this is more or less competent at this than all the other stuff we had three years ago, given the observed issues with the algorithmic feeds.

              • By biammer 2026-01-06 22:57 (1 reply)

                I appreciate you taking the time to write up your thoughts on something other than exclusively these tools 'usefulness' at writing code.

                > The capacity for AI to oppress you is in direct relation to its economic value.

                I think this assumes a level of rationality in these systems, corporate interests and global markets, that I would push back on as being largely absent.

                > The power hunger is in direct proportion to the demand.

                Do you think this is entirely the case? I mean, I understand what you are saying, but I would draw stark lines between "company" demand versus "user" demand. I have found many times the 'AI' tools are being thrust into nearly everything regardless of user demand. Spinning its wheels to only ultimately cause frustration. [0]

                > Is it what's emptying drinking water aquifers?

                It appears this is a problem, and will only continue to be such. [1]

                > The combined water usage of all data centers in Arizona. All of them. Together. Which is over 100 DCs. All of them combined use about double what Tesla was expecting from just the Brandenburg Gigafactory to use before Musk decided to burn his reputation with EV consumers and Europeans for political point scoring.

                I am unsure if I am getting what your statements here are trying to say. Would you be able to restate this to be more explicit in what you are trying to communicate.

                [0] https://news.ycombinator.com/item?id=46493506

                [1] https://www.forbes.com/sites/cindygordon/2024/02/25/ai-is-ac...

                • By ben_w 2026-01-06 23:59 (1 reply)

                  > I think this assumes a level of rationality in these systems, corporate interests and global markets, that I would push back on as being largely absent.

                  Could be. What I hope and suspect is happening is that these companies are taking a real observation (the economic value that I also observe in software) and falsely expanding this to other domains.

                  Even to the extent that these work, AI has clearly been over-sold in humanoid robotics and self-driving systems, for example.

                  > Do you think this is entirely the case? I mean, I understand what you are saying, but I would draw stark lines between "company" demand versus "user" demand. I have found many times the 'AI' tools are being thrust into nearly everything regardless of user demand. Spinning its wheels to only ultimately cause frustration. [0]

                  I think it is. Companies setting silly goals like everyone must use LLMs once a day or whatever, that won't burn a lot of tokens. Claude Code is available in both subscription mode and PAYG mode, and the cost of subscriptions suggests it is burning millions of tokens a month for the basic subscription.

                  Other heavy users who we would both agree are bad, are slop content farms. I cannot even guesstimate those, so would be willing to accept the possibility they're huge.

                  > It appears this is a problem, and will only continue to be such. [1]

                  I find no reference to "aquifers" in that.

                  Where it says e.g. "up to 9 liters of water to evaporate per kWh of energy used", the average is 1.9 l/kWh. Also, evaporated water tends to fall nearby (on this scale) as rain, so unless there's now too much water on the surface, this isn't a net change even if it all comes from an aquifer (and I have yet to see any evidence of DCs going for that water source).

                  It says "The U.S. relies on water-intensive thermoelectric plants for electricity, indirectly increasing data centers' water footprint, with an average of 43.8L/kWh withdrawn for power generation." - most water withdrawn is returned, not consumed.

                  It says "Already AI's projected water usage could hit 6.6 billion m³ by 2027, signaling a need to tackle its water footprint.", this is less than the famously-a-desert that is Arizona.

                  > I am unsure if I am getting what your statements here are trying to say. Would you be able to restate this to be more explicit in what you are trying to communicate.

                  That the water consumption of data centres is much much smaller than the media would have you believe. It's more of a convenient scare story than a reality. If water is your principal concern, give up beef, dairy, cotton, rice, almonds, soy, biofuels, mining, paper, steel, cement, residential lawns, soft drinks, car washing, and hospitals, in approximately that order (assuming the lists I'm reading those from are not invented whole cloth), before you get to data centres.

                  And again, I don't disagree that they're a problem, it's just that the "water" part of the problem is so low down the list of things to worry about as to be a rounding error.

                  • By biammer 2026-01-07 0:32

                    > I find no reference to "aquifers" in that.

                    Ahh, I see your objection now. That is my bad. I was using my language too loosely. Here I was using 'aquifer' to mean 'any source of drinking water', but that is certainly different from the intended meaning.

                    > And again, I don't disagree that they're a problem, it's just that the "water" part of the problem is so low down the list of things to worry about as to be a rounding error.

                    I'm skeptical of the rounding error argument, and wary of relying on the logical framework of 'low down the list' when list items' effects stack interdependently.

                    > give up beef, dairy, cotton, rice, almonds, soy, biofuels, mining, paper, steel, cement, residential lawns, soft drinks, car washing, and hospitals

                    In part due to this reason, as well as others, I have stopped directly supporting the industries for: beef, dairy, rice, almonds, soy, biofuels, residential lawns, soft drinks, car washing

            • By QuantumGood 2026-01-06 21:44

              The hype curve is a problem, but it's difficult to prevent. I myself have never made such a prediction. Though it now seems that the money and effort to create working coding tools is near an inflection point.

              "It would be self evident." History shows the opposite at inflection points. The "self evident" stage typically comes much later.

        • By spaceman_2020 2026-01-06 21:42 (4 replies)

          It's a little weird how defensive people are about these tools. Did everyone really think being able to import a few npm packages, string together a few APIs, and run npx create-react-app was something a large number of people could do forever?

          The vast majority of coders in employment barely write anything more complex than basic CRUD apps. These jobs were always going to be automated or abstracted away sooner or later.

          Every profession changes. Saying that these new tools are useless or won't impact you/xyz devs is just ignoring a repeated historical pattern.

          • By stefan_ 2026-01-06 22:52 (1 reply)

            They already made the "abstracted away the CRUD app": it's called Salesforce. How's that going?

            • By simonw 2026-01-06 23:47

              It's employing so many people who specialize in Salesforce configuration that every year San Francisco collapses under the weight of 50,000+ of them attending Dreamforce.

              And it's actually kind of amazing, because a lot of people who earn six figures programming Salesforce came to it from a non-traditional software engineering background.

          • By mikestorrent 2026-01-06 22:45 (2 replies)

            I think perhaps for some folks we're looking at their first professional paradigm shift. If you're a bit older, you've seen (smaller versions of) the same thing happening before as e.g. the Internet gained traction, Web2.0, ecommerce, crypto, etc. and have seen your past skillset become useless as now it can be accomplished for only $10/mo/user.... either you pivot and move on somehow, or you become a curmudgeon. Truly, the latter is optional, and at any point when you find yourself doing that you wish to stop and just embrace the new thing, you're still more than welcome to do so. AI is only going to get EASIER to get involved with, not harder.

            • By wiml 2026-01-06 23:38 (1 reply)

              And by the same token (ha) for some folks we're looking at their first hype wave. If you're a bit older, you've seen similar things like 4GLs and visual programming languages and blockchain and expert systems. They each left their mark on our profession but most of their promises were unfounded and ultimately unrealized.

              • By mikestorrent 2026-01-09 19:34

                I like a lot of 4GL ideas. Closest I've come was working on ServiceNow which is sort of a really powerful system with ugly, ugly roots but the idea of your code being the database being the code really resonated with me, as a self-taught programmer.

                Similarly, Lisp's homoiconicity makes sense to me as a wonderfully aesthetic idea. I remember generating strings-of-text that were code, but still just text, and wishing that I could trivially step into the structure there like it was a map/dict... without realizing that that's what an AST is and what the language compiler / runtime is already always doing.

            • By troupo 2026-01-06 22:58 (1 reply)

              Lol. In a few years when the world is awash in AI-generated slop [1] my "past skills" will not only be relevant, they will be actively sought after.

              [1] Like the recent "Gas Town" and "Beads" that people keep mentioning in the comments that require extensive scripts/human intervention to purge from the system: https://news.ycombinator.com/item?id=46510121

              • By mikestorrent 2026-01-09 19:34

                I'm probably the same age as you, and similarly counting on past skills - it's what lets me use AI to produce things that aren't slop.

          • By idiotsecant 2026-01-06 22:09

            Agreed, it always seemed a little crazy that you could make wild amounts of money to just write software. I think the music is finally stopping and we'll all have to go back to actually knowing how to do something useful.

          • By ben_w 2026-01-06 23:05 (1 reply)

            > The vast majority of coders in employment barely write anything more complex than basic CRUD apps. These jobs were always going to be automated or abstracted away sooner or later.

            My experience has been negative progress in this field. On iOS, UIKit in Interface Builder is an order of magnitude faster to write and to debug, with less weird edge cases, than SwiftUI was last summer. I say last summer because I've been less and less interested in iOS the more I learn about liquid glass, even ignoring the whole "aaaaaaa" factor of "has AI made front end irrelevant anyway?" and "can someone please suggest something the AI really can't do so I can get a job in that?"

            • By marcosdumay 2026-01-07 0:55

              The 80s TUI frameworks are still not beaten in developer productivity by GUI or web frameworks. They have been beaten by GUIs in usability, but then the GUIs reverted into a worse option.

              Too bad they were mostly proprietary and won't even run in modern hardware.

        • By square_usual 2026-01-06 21:25

          You're free to not open these threads, you know!

        • By Workaccount2 2026-01-06 20:25 (5 replies)

          Democratizing coding so regular people can get the most out of computers is the opposite of oppression. You are mistaking your interests for society's interests.

          It's the same with artists who are now pissed that regular people can manifest their artistic ideas without needing to go through an artist or spend years studying the craft. The artists are calling the AI companies oppressors because they are breaking the artist's stranglehold on the market.

          It's incredibly ironic how socializing what was a privatized ability has otherwise "socialist" people completely losing their shit. Just the mask of pure virtue slipping...

          • By deergomoo 2026-01-06 22:02 (2 replies)

            On what planet is concentrating an increasingly high amount of the output of this whole industry on a small handful of megacorps “democratising” anything?

            Software development was already one of the most democratised professions on earth. With any old dirt cheap used computer, an internet connection, and enough drive and curiosity you could self-train yourself into a role that could quickly become a high paying job. While they certainly helped, you never needed any formal education or expensive qualifications to excel in this field. How is this better?

            • By Workaccount2 2026-01-06 22:46 (1 reply)

              Open/local models are available.

              Maybe not as good, but they can certainly do far far more than what was available a few years ago.

              • By bsder 2026-01-06 23:22 (1 reply)

                The open models don't have access to all the proprietary code that the closed ones have trained on.

                That's primarily why I finally had to suck it up and sign up for Claude. Claude clearly can cough up proprietary codebase examples that I otherwise have no access to.

                • By simonw 2026-01-06 23:25

                  Given that very few of the "open models" disclose their training data there's no reason at all to assume that the proprietary models have an advantage in terms of training on proprietary data.

                  As far as I can tell the reason OpenAI and Anthropic are ahead in code is that they've invested extremely heavily in figuring out the right reinforcement learning training mix needed to get great coding results.

                  Some of the Chinese open models are already showing signs of catching up.

            • By simonw 2026-01-06 23:21 (1 reply)

              It's better because now you can automate something tedious in your life with a computer without having to first climb a six month learning curve.

              • By biammer 2026-01-07 1:27 (1 reply)

                > deergomoo: On what planet is concentrating an increasingly high amount of the output of this whole industry on a small handful of megacorps “democratising” anything?

                > simonw: It's better because now you can automate something tedious in your life with a computer without having to first climb a six month learning curve.

                Completely ignores, or enthusiastically accepts and endorses, the consolidation of production, power, and wealth into a stark few (friends), and claims superiority and increased productivity without evidence?

                This may be the most simonw comment I have ever seen.

                • By simonw 2026-01-07 7:10

                  At the tail end of 2023 I was deeply worried about consolidation of power, because OpenAI were the only lab with a GPT-4 class model and none of their competitors had produced anything that matched it in the ~8 months since it had launched.

                  I'm not worried about that at all any more. There are dozens of organizations who have achieved that milestone now, and OpenAI aren't even definitively in the lead.

                  A lot of those top-class models are open weight (mainly thanks to the Chinese labs) and available for people to run on their own hardware.

                  I wrote a bunch more about this in my 2024 wrap-up: https://simonwillison.net/2024/Dec/31/llms-in-2024/#the-gpt-...

          • By spaceman_2020 2026-01-06 21:45

            I used claude code to set up a bunch of basic tools my wife was using in her daily work. Things like custom pomodoro timers, task managers, todo notes.

            She used to log into 3 different websites. Now she just opens localhost:3000 and has all of them on the same page. No emails shared with anyone. All data stored locally.

            I could have done this earlier, but the time commitment with Claude Code now was writing a spec in 5 minutes and pressing approve a few times, vs half a day.

            I count this as an absolute win. No privacy breaches, no data sharing.

          • By spacechild1 2026-01-07 1:33

            > The artists are calling the AI companies oppressors because they are breaking the artist's stranglehold on the market.

            It's because these companies profit from all the existing art without compensating the artists. Even worse, they are now putting the very people out of a job who (unwittingly) helped to create these tools in the first place. Not to mention how hurtful it must be for artists seeing their personal style imitated by a machine without their consent.

            I totally see how it can empower regular people, but it also empowers the megacorps and bad actors. The jury is still out on whether AI is providing a net positive to society. Until then, let's not ignore the injustice and harm that went into creating these tools and the potential and real dangers that come with it.

          • By biammer 2026-01-06 21:10 (1 reply)

            When you imagine my position ("I hate these companies for democratizing code/art") and then debate that, it is called a strawman logical fallacy.

            Ascribing the goals of "democratize code/art" onto these companies and their products is called delusion.

            I am sure the 3 letter agency directors on these company boards are thrilled you think they left their lifelong careers solely to finally realize their dream to allow you to code and "manifest your artistic ideas".

            • By Workaccount2 2026-01-06 22:49 (1 reply)

              Again, open models exist. These companies don't have a monopoly on the tech and they know it.

              So maybe celebrate open/private/local models for empowering people rather than selfishly complain about it?

              • By icedchai 2026-01-07 0:04

                Yes, but the quality of output from open/local models isn't anywhere close to what you get from Claude or Gemini. You need serious hardware to get anything approaching decent processing speeds or even middling quality.

                It's more economical for the average person to spend $20/month on a subscription than it is for them to drop multiple thousands of dollars and untold hours of time experimenting. Local AI is a fun hobby though.

          • By elzbardico 2026-01-06 23:16

            But people are not creating anything. They are just asking a computer to remix what other people created.

            It's incredibly ironic how blatant theft has left otherwise capitalistic people so enthusiastic.

        • By Aurornis 2026-01-06 21:27 · 2 replies

          > If I am unable to convince you to stop meticulously training the tools of the oppressor (for a fee!) then I just ask you do so quietly.

          I'm kind of fascinated by how AI has become such a culture war topic with hyperbole like "tools of the oppressor"

          It's equally fascinating how little these comments understand about how LLMs work. Using an LLM for inference (what you do when you use Claude Code) does not train the LLM. It does not learn from your code and integrate it into the model while you use it for inference. I know that breaks the "training the tools of the oppressor" narrative which is probably why it's always ignored. If not ignored, the next step is to decry that the LLM companies are lying and are stealing everyone's code despite saying they don't.

          • By meowkit 2026-01-06 21:36 · 1 reply

            We are not talking about inference.

            The prompts and responses are used as training data. Even if your provider allows you to opt out they are still tracking your usage telemetry and using that to gauge performance. If you don’t own the storage and compute then you are training the tools which will be used to oppress you.

            Incredibly naive comment.

            • By Aurornis 2026-01-06 21:40 · 1 reply

              > The prompts and responses are used as training data.

              They show a clear pop-up where you choose your setting about whether or not to allow data to be used for training. If you don't choose to share it, it's not used.

              I mean I guess if someone blindly clicks through everything and clicks "Accept" without clicking the very obvious slider to turn it off, they could be caught off guard.

              Assuming everyone who uses Claude is training their LLMs is just wrong, though.

              Telemetry data isn't going to extract your codebase.

              • By lukan 2026-01-06 22:10 · 4 replies

                "If you don't choose to share it, it's not used"

                I am curious where your confidence that this is true, is coming from?

                Besides lots of GPU's, training data seems the most valuable asset AI companies have. Sounds like strong incentive to me to secretly use it anyway. Who would really know, if the pipelines are set up in a way, if only very few people are aware of this?

                And if it comes out: "oh gosh, one of our employees made a mistake".

                And they have already admitted to training on pirated content. So maybe they learned their lesson .. maybe not, as they are still making money and want to continue to lead the field.

                • By simonw 2026-01-07 0:37 · 2 replies

                  My confidence comes from the following:

                  1. There are good, ethical people working at these companies. If you were going to train on customer data that you had promised not to train on there would be plenty of potential whistleblowers.

                  2. The risk involved in training on customer data that you are contractually obliged not to train on is higher than the value you can get from that training data.

                  3. Every AI lab knows that the second it comes out that they trained on paying customer data after saying they wouldn't, those paying customers will leave for their competitors (and sue them in the bargain.)

                  4. Customer data isn't actually that valuable for training! Great models come from carefully curated training data, not from just pasting in anything you can get your hands on.

                  Fundamentally I don't think AI labs are stupid, and training on paid customer data that they've agreed not to train on is a stupid thing to do.

                  • By RodgerTheGreat 2026-01-07 1:28 · 1 reply

                    1. The people working for these companies are already demonstrably ethically flexible enough to pirate any publicly accessible training data they can get their hands on, including but not limited to ignoring the license information in every repo on GitHub. I'm not impressed with any of these clowns and I wouldn't trust them to take care of a potted cactus.

                    2. The risk of using "illegal" training data is irrelevant, because no GenAI vendors have been meaningfully punished for violating copyright yet, and in the current political climate they don't expect to be anytime soon. Even so,

                    3. Presuming they get caught red-handed using personal data without permission (which, given the nature of LLMs, would be extremely challenging for any individual customer to prove definitively), they may lose customers, and customers may try to sue, but you can expect those lawsuits to take years to work their way through the courts; long after these companies IPO, employees get their bag, and it all becomes someone else's problem.

                    4. The idea of using carefully curated datasets is popular rhetoric, but absolutely does not reflect how the biggest GenAI vendors do business. See (1).

                    AI labs are extremely shortsighted, sloppy, and demonstrably do not care a single iota about the long term when there's money to be made in the short term. Employees have gigantic financial incentives to ignore internal malfeasance or simple ineptitude. The end result is, if anything, far worse than stupidity.

                    • By simonw 2026-01-07 6:25 · 1 reply

                      There is an important difference between openly training on scraped web data and license-ignored data from GitHub and training on data from your paying customers that you promised you wouldn't train on.

                      Anthropic had to pay $1.5bn after being caught downloading pirated ebooks.

                      • By lunar_mycroft 2026-01-07 7:32 · 2 replies

                        So Anthropic had to pay less than 1% of their valuation despite approximately their entire business being dependent on this and similar piracy. I somehow doubt their takeaway from that is "let's avoid doing that again".

                        • By ben_w 2026-01-08 11:28

                          Two things:

                          First: Valuations are based on expected future profits.

                          For a lot of companies, 1% of valuation is ~20% of annual profit (P/E ratio 20); for fast-growing companies, or companies where the market is anticipating growth, it can be a lot higher. Weird outlier example here, but consider that if Tesla was fined 1% of its valuation (1% of 1.5 trillion = 15 billion), that would be most of the last four quarters' profit on https://www.macrotrends.net/stocks/charts/TSLA/tesla/gross-p...
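
                          In symbols (my notation, not the commenter's): with valuation $V$, annual profit $E$, and price-to-earnings ratio $r = V/E$, a fine of 1% of valuation, expressed as a fraction of a year's profit, is

                          ```latex
                          \frac{0.01\,V}{E} = 0.01\,r
                          ```

                          so at $r = 20$ (typical of a large established company) the fine is about 20% of annual profit, while at $r = 5$ it is only 5%, and at higher growth multiples it is proportionally more.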

                          Second: Part of the Anthropic case was that many of the books they trained on were ones they'd purchased and destructively scanned, not just pirated. The courts found this use was fine, and Anthropic had already done this before being ordered to: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...

                        • By simonw 2026-01-07 7:44

                          Their main takeaway was that they should legally buy paper books, chop the spines off and scan those for training instead.

                  • By lunar_mycroft 2026-01-07 3:57

                    Every single point you made is contradicted by the observed behavior of the AI labs. If any of those factors were going to stop them from training on data they legally can't, they would have done so already.

                • By Aurornis 2026-01-07 2:04 · 1 reply

                  > I am curious where your confidence that this is true, is coming from?

                  My confidence comes from working in big startups and big companies with legal teams. There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system to consume customer data into a secret part of the training set, and then have everyone involved keep quiet about it forever.

                  The whistleblowing and leaking would happen immediately. We've already seen LLM teams leak and have people try to whistleblow over things that aren't even real, like the Google engineer who thought they had invented AGI a few years ago (lol). OpenAI had a public meltdown when the employees disagreed with Sam Altman's management style.

                  So my question to you is: What makes you think they would do this? How do you think they'd coordinate the teams to keep it all a secret and only hire people who would take this secret to their grave?

                  • By lukan 2026-01-07 6:44

                    "There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system "

                    No, that is why I wrote

                    "Who would really know, if the pipelines are set up in a way, that only very few people are aware of this?" (Typo fixed)

                    There is no need for everyone to know. I don't know their processes, but I can think of ways to only include very few people who need to know.

                    The rest are just working on everything else. Some work with data, where they don't need to know where it came from; some with UI; some with scaling up; some .. they all don't need to know that DB XYZ comes from a dark source.

                • By theshrike79 2026-01-07 19:40 · 1 reply

                  > I am curious where your confidence that this is true, is coming from?

                  We have a legally binding contract with Anthropic. Checked and vetted by our lawyers, who are annoying because they actually READ the contracts and won't let us use services with suspicious clauses in them - unless we can make amendments.

                  If they're found to be in breach of said contract (which is what every paid user of Claude signs), Anthropic is going to be the target of SO FUCKING MANY lawsuits even the infinite money hack of AI won't save them.

                  • By lukan 2026-01-08 11:55 · 1 reply

                    Are you referring to the standard contract/terms of use, or does your company have a special contract with them?

                    • By theshrike79 2026-01-10 21:06

                      Usually we have the standard contract if Legal approves.

                      We have stopped using major services because of their TOS wording, Midjourney being one.

                • By ben_w 2026-01-06 23:19 · 2 replies

                  > Besides lots of GPU's, training data seems the most valuable asset AI companies have. Sounds like strong incentive to me to secretly use it anyway. Who would really know, if the pipelines are set up in a way, if only very few people are aware of this?

                  Could be, but it's a huge risk the moment any lawsuit happens and the "discovery" process starts. Or whistleblowers.

                  They may well take that risk, they're clearly risk-takers. But it is a risk.

                  • By yunwal 2026-01-07 0:12 · 2 replies

                    Eh they’re all using copyrighted training data from torrent sites anyway. If the government was gonna hold them accountable for this it would have happened already.

                  • By blibble 2026-01-07 0:28 · 2 replies

                    the US no longer has any form of rule of law

                    so there's no risk

                    • By Aurornis 2026-01-07 2:04

                      > the US no longer has any form of rule of law

                      AI threads really bring out the extreme hyperbole and doomerism.

                    • By ben_w 2026-01-07 9:28

                      The USA is a mess that's rapidly getting worse, but it has not yet fallen that far.

          • By biammer 2026-01-06 21:48 · 1 reply

            I understand how these LLMs work.

            I find it hard to believe that companies which stole the entire creative output of humanity, and which egregiously and continually scrape the internet, are for some reason ignoring the data you voluntarily give them.

            > I know that breaks the "training the tools of the oppressor" narrative

            "Narrative"? This is just reality. In their own words:

            > The awards to Anthropic, Google, OpenAI, and xAI – each with a $200M ceiling – will enable the Department to leverage the technology and talent of U.S. frontier AI companies to develop agentic AI workflows across a variety of mission areas. Establishing these partnerships will broaden DoD use of and experience in frontier AI capabilities and increase the ability of these companies to understand and address critical national security needs with the most advanced AI capabilities U.S. industry has to offer. The adoption of AI is transforming the Department’s ability to support our warfighters and maintain strategic advantage over our adversaries [0]

            Is 'warfighting adversaries' some convoluted code for allowing Aurornis to 'see a 1337x in productivity'?

            Or perhaps you are a wealthy westerner of a racial and sexual majority and as such have felt little by way of oppression by this tech?

            In such a case I would encourage you to develop empathy, or at least sympathy.

            > Using an LLM for inference .. does not train the LLM.

            In their own words:

            > One of the most useful and promising features of AI models is that they can improve over time. We continuously improve our models through research breakthroughs as well as exposure to real-world problems and data. When you share your content with us, it helps our models become more accurate and better at solving your specific problems and it also helps improve their general capabilities and safety. We do not use your content to market our services or create advertising profiles of you—we use it to make our models more helpful. ChatGPT, for instance, improves by further training on the conversations people have with it, unless you opt out.

            [0] https://www.ai.mil/latest/news-press/pr-view/article/4242822...

            [1] https://help.openai.com/en/articles/5722486-how-your-data-is...

            • By ben_w 2026-01-06 23:28 · 1 reply

              > Is 'warfighting adversaries' some convoluted code for allowing Aurornis to 'see a 1337x in productivity'?

              Much as I despair at the current developments in the USA, and I say this as a sexual minority and a European, this is not "tools of the oppressor" in their own words.

              Trump is extremely blunt about who he wants to oppress. So is Musk.

              "Support our warfighters and maintain strategic advantage over our adversaries" is not blunt, it is the minimum baseline for any nation with assets anyone else might want to annex, which is basically anywhere except Nauru, North Sentinel Island, and Bir Tawil.

              • By biammer 2026-01-07 0:51 · 2 replies

                > "Support our warfighters and maintain strategic advantage over our adversaries" is not blunt, it is the minimum baseline for any nation with assets anyone else might want to annex

                I think it's gross to distill military violence down to defending 'assets [others] might want to annex'.

                What US assets were being annexed when US AI was used to target Gazans?

                https://apnews.com/article/israel-palestinians-ai-technology...

                > Trump is extremely blunt about who he wants to oppress. So is Musk.

                > our adversaries" is not blunt

                These two thoughts seem to be in conflict.

                What 'assets' were being protected from annexation here by this oppressive use of the tool? The chips?

                https://www.aclu.org/news/privacy-technology/doritos-or-gun

                • By ben_w 2026-01-07 8:44

                  > I think its gross to distill military violence as defending 'assets [others] might want to annex'.

                  Yes, but that's how the world works:

                  If another country wants a bit of your country for some reason, it can take it by force unless you can make at the very least a credible threat against it, and sometimes a lot more than that.

                  Note that this does not rule out there being an aggressor somewhere. I'm not excluding the existence of aggressors, nor the capacity of the USA to be an aggressor. All I'm saying is that your quotation is so vague as to also encompass those who are not.

                  > What US assets were being annexed when US AI was used to target Gazans?

                  First, I'm saying the statement is so broad as to encompass other things besides being a warmonger. Consider the opposite statement: "don't support our warfighters and don't maintain strategic advantage over our adversaries" would be absolutely insane, therefore "support our warfighters and maintain strategic advantage over our adversaries" says nothing.

                  Second, in this case the country doing the targeting is… Israel. To the extent that the USA cares at all, it's to get votes from the large number of Jewish people living in the USA. Similar deal with how it treats Cuba since the fall of the USSR: it's about votes (from Cuban exiles in that case, but still, votes).

                  Much as I agree that the conduct of Israel with regard to Gaza was disproportionate, exceeded the necessity, and likely was so bad as to even damage Israel's long-term strategic security, if you were to correctly imagine the people of Israel deciding "don't support our warfighters and don't maintain strategic advantage over our adversaries", they would quickly get victimised much harder than those they were victimising. That's the point there: the quote you cite as evidence, is so broad that everyone has approximately that, because not having it means facing ones' own destruction.

                  There's a mis-attributed quote, "People sleep peaceably in their beds at night because rough men stand ready to do violence on their behalf", that's where this is at.

                  > These two thoughts seem at conflict.

                  Musk openly and directly says "Canada is not a real country.", says "cis" is hate speech, responded to the pandemic by tweeting "My pronouns are Prosecute/Fauci.", and his self-justification for his trillion dollar bonus for hitting future targets is wanting to be in control of what he describes as a "robot army"; Trump openly and explicitly wants the USA to annex Canada, Greenland, and the Panama Canal, is throwing around the National Guard, openly calls critics traitors and calls for the death penalty. They're as subtle as exploding volcanoes; nobody needs to take the worst-case interpretation of what they're saying to notice this.

                  Saying "support our warfighters" is something done by basically every nation everywhere all the time, because those places that don't do this quickly get taken over by nearby nations who sense weakness. Which is kinda how the USA got Texas, because again, I'm not saying the USA is harmless, I'm saying the quote doesn't show that.

                  > What 'assets' were being protected from annexation here by this oppressive use of the tool? The chips?

                  This would have been a much better example to lead with than the military stuff.

                  I'm absolutely all on board with the general consensus that the US police are bastards in this specific way, have been since that kid got shot for having a toy gun in an open-carry state. (I am originally from a country where even the police are not routinely armed, I do not value the 2nd amendment, but if you're going to say "we allow open carry of firearms" you absolutely do not get to use "we saw someone carrying a firearm" as an excuse to shoot them).

                  However: using LLMs to code doesn't seem to be likely to make a difference either way for this. If I was writing a gun-detection AI, perhaps I'm out of date, but I'd use a simpler model that runs locally on-device and doesn't do anything else besides the sales pitch.

                • By cindyllm 2026-01-07 1:15

                  [dead]

        • By Gud 2026-01-06 19:32 · 2 replies

          Frankly, in this comment thread you appear to be the oppressor.

          • By goatlover 2026-01-06 21:37

            Who is the parent oppressing? Making a comment and companies looking to automate labor are a little bit different. One might disagree that automation is oppressive or whatever goals the major tech CEOs have in developing AIs (surveillance, influencing politics, increasing wealth gap), but certainly commenting that they are oppressive is not the same thing.

          • By biammer 2026-01-06 20:06 · 1 reply

            [flagged]

            • By santoshalper 2026-01-06 22:01 · 2 replies

              [flagged]

              • By biammer 2026-01-07 1:54

                > Why are you afraid of using your real account

                Careful with being blindly led by your own assumptions.

                I actually disagree with your thesis here. I think if every comment was posted under a new account this site would improve its average veracity.

                As it stands certain 'celebrity', or high karma, accounts are artificially bolstered by the network effect indifferent to the defensibility of their claims.

              • By justinclift 2026-01-06 23:11

                Please don't go down the path of making personal attacks.

      • By animegolem 2026-01-07 0:19

        I know someone who is using a vibe-coded, or at least heavily AI-assisted, text editor, praising it daily, while also saying LLMs will never be productive. There is a lot of dissonance right now.

    • By enum 2026-01-06 18:46 · 3 replies

      I teach at a university, and spend plenty of time programming for research and for fun. Like many others, I spent some time on the holidays trying to push the current generation of Cursor, Claude Code, and Codex as far as I could. (They're all very good.)

      I had an idea for something that I wanted, and in five scattered hours, I got it good enough to use. I'm thinking about it in a few different ways:

      1. I estimate I could have done it without AI with 2 weeks full-time effort. (Full-time defined as >> 40 hours / week.)

      2. I have too many other things to do that are purportedly more important than programming. I really can't dedicate two weeks full-time to a "nice to have" project. So, without AI, I wouldn't have done it at all.

      3. I could hire someone to do it for me. At the university, those are students. From experience with lots of advising, a top-tier undergraduate student could have achieved the same thing, had they worked full tilt for a semester (before LLMs). This of course assumes that I'm meeting them every week.

      • By realusername 2026-01-06 22:42 · 1 reply

        This is where LLM coding shines, in my opinion. There's a list of things they do very well:

        - single scripts. Anything which can be reduced to a single script.

        - starting greenfield projects from scratch

        - code maintenance (package upgrades, old code...)

        - tasks which have a very clear and single definition. This isn't linked to complexity, some tasks can be both very complex but with a single definition.

        If your work falls into this list, they will do some amazing work (and yours clearly fits), but if it doesn't, prepare yourself, because it will be painful.

        • By enum 2026-01-06 22:56 · 1 reply

          I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code. I've done enough of that with LLMs, though not in large codebases.

          I should say I'm hardly ever vibe-coding, unlike the original article. If I think I want code that will last, I'll steer the models in ways that lean on years of non-LLM experience. E.g., I'll reject results that might work if they violate my taste in code.

          It also helps that I can read code very fast. I estimate I can read code 100x faster than most students. I'm not sure there is any way to teach that other than the old-fashioned way, which involves reading (and writing) a lot of code.

          • By realusername 2026-01-07 2:39

            > I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code

            Yes indeed, these are the things on the other hand which aren't working well in my opinion:

            - large codebase

            - complex domain knowledge

            - creating any feature where you need product insights

            - tasks requiring choices (again, complexity doesn't matter here, the task may be simple but require some choices)

            - anything unclear where you don't know where you are going first

            While you don't experience any of these when teaching or in side projects, they are very common in any enterprise context.

      • By vercaemert 2026-01-06 18:59 · 6 replies

        How do you compare Claude Code to Cursor? I'm a Cursor user quietly watching the CC parade with curiosity. Personally, I haven't been able to give up the IDE experience.

        • By kaydub 2026-01-07 18:09

          I'm so sold on the CLI tools that I think IDEs are basically dead to me. I only have an IDE open so I can read the code, but most often I'm just changing configs (like switching a bool, or bumping up a limit, or something like that).

          Seriously, I have 3+ claude code windows open at a time. Most days I don't even look at the IDE. It's still there running in the background, but I don't need to touch it.

        • By lizardking 2026-01-07 1:16

          When I'm using Claude Code, I usually have a text editor open as well. The CC plugin works well enough to achieve most of what Cursor was doing for me in showing real-time diffs, but in my experience, the output is better and faster. YMMV

        • By tstrimple 2026-01-07 16:44

          I use CC for so much more than just writing code that I cannot imagine being constrained within an IDE. Why would I want to launch an IDE to have CC update the *arr stack on my NAS to the latest versions for example? Last week I pointed CC at some media files that weren't playing correctly on my Apple TV. It detected what the problem formats were and updated my *arr download rules to prefer other releases and then configured tdarr to re-encode problem files in my existing library.

        • By subomi 2026-01-07 6:00

          I was here a few weeks ago, but I'm now on the CC train. The challenge is that the terminal is quite counterintuitive. But if you put on the Linux-terminal lens from a few years ago and start using it, it begins to make sense. The form factor of the terminal isn't intuitive for programming, but it's the ultimate one.

          FYI, I still use cursor for small edits and reviews.

        • By enum 2026-01-06 19:02

          I don't think I can scientifically compare the agents. As it is, you can use Opus / Codex in Cursor. The speed of Cursor composer-1 is phenomenal -- you can use it interactively for many tasks. There are also tasks that are not easier to describe in English, but you can tab through them.

        • By smw 2026-01-07 3:43

          Just FYI, these days cc has 'ide integration' too, it's not just a cli. Grab the vscode extension.

      • By franktankbank 2026-01-06 22:57 · 1 reply

        What did you build? I think people talk past each other when they don't share exactly what they were trying to do and whether they succeeded or failed.

    • By TacticalCoder 2026-01-06 23:19 · 3 replies

      > Most software engineers are seriously sleeping on how good LLM agents are right now, especially something like Claude Code.

      Nobody is sleeping. I'm using LLMs daily to help me in simple coding tasks.

      But really where is the hurry? At this point not a few weeks go by without the next best thing since sliced bread coming out. Why would I bother "learning" (and there's really nothing to learn here) some tool/workflow that is already outdated by the time it comes out?

      > 2026 is going to be a wake-up call

      Do you honestly think a developer not using AI won't be able to adapt to an LLM workflow in, say, 2028 or 2029? It has to be 2026 or... What exactly?

      There is literally no hurry.

      You're using the equivalent of the first portable CD-player in the 80s: it was huge, clunky, had hiccups, had a huge battery attached to it. It was shiny though, for those who find new things shiny. Others are waiting for a portable CD player that is slim, that buffers, that works fine. And you're saying that people won't be able to learn how to put a CD in a slim CD player because they didn't use a clunky one first.

      • By simonw 2026-01-06 23:28

        I think getting proficient at using coding agents effectively takes a few months of practice.

        It's also a skill that compounds over time, so if you have two years of experience with them you'll be able to use them more effectively than someone with two months of experience.

        In that respect, they're just normal technology. A Python programmer with two years of Python experience will be more effective than a programmer with two months of Python.

      • By vidarh 2026-01-08 14:18

        > Nobody is sleeping. I'm using LLMs daily to help me in simple coding tasks.

        That is sleeping.

        > But really where is the hurry? At this point not a few weeks go by without the next best thing since sliced bread to come out. Why would I bother "learning" (and there's really nothing to learn here) some tool/workflow that is already outdated by the time it comes out?

        You're jumping to conclusions that haven't been justified by any of the development in this space. The learning compounds.

        > Do you honestly think a developer not using AI won't be able to adapt to a LLM workflow in, say, 2028 or 2029? It has to be 2026 or... What exactly?

        They will, but they'll be competing against people with 2-3 more years of experience in understanding how to leverage these tools.

      • By jasonfarnon 2026-01-07 0:43

        "But really where is the hurry?" It just depends on why you're programming. For many of us not learning and using up to date products leads to a disadvantage relative to our competition. I personally would very much rather go back to a world without AI, but we're forced to adapt. I didn't like when pagers/cell phones came out either, but it became clear very quickly not having one put me at a disadvantage at work.

    • By BatteryMountain 2026-01-07 6:17 · 3 replies

      The crazy part is, once you have it setup and adapted your workflow, you start to notice all sorts of other "small" things:

      claude can call ssh and do system admin tasks. It works amazingly well. I have 3 VMs which depend on each other (proxmox with openwrt, adguard, unbound), and claude can prove to me that my dns chain works perfectly, my firewalls are correct, etc., as claude can ssh into each. Setting up services, diagnosing issues, auditing configs... you name it. Just awesome.

      claude can call other sh scripts on the machine, so over time you can create a bunch of scripts that let claude one-shot certain tasks that would normally eat tokens. It works great. One script per intention - don't have a script do more than one thing.

      claude can call the compiler, run the debug executable, and read the debug logs... in real time. So claude can read my android app's debug stream via adb, or my C# debug console, because claude calls the compiler, not me. Just ask it to do it and it will diagnose stuff really quickly.

      It can also analyze your db tables (give it readonly sql access), look at the application code and queries, and diagnose performance issues.
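      The read-only SQL access part is easy to set up safely. A minimal sketch using SQLite's URI mode (the file name and table are illustrative; other engines would use a read-only database role instead):

```python
import sqlite3

# Seed a demo database over a normal read-write connection.
rw = sqlite3.connect("app.db")
rw.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, total REAL)")
rw.commit()
rw.close()

# Hand the agent a connection opened read-only: SELECTs work, writes fail.
ro = sqlite3.connect("file:app.db?mode=ro", uri=True)
print(ro.execute("SELECT count(*) FROM orders").fetchone())
try:
    ro.execute("DELETE FROM orders")
except sqlite3.OperationalError as exc:
    print("write blocked:", exc)
ro.close()
```

      The agent can then inspect schemas and run diagnostic queries without any chance of mutating data.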

      The opportunities are endless here. People need to wake up to this.

      • By vidarh 2026-01-08 13:33

        > claude can call ssh and do system admin tasks

        Claude set up a Raspberry Pi with a display and conference audio device for me to use as an Alexa replacement tied to Home Assistant.

        I gave it an ssh key and gave it root.

        Then I told it what I wanted, and it did. It asked for me to confirm certain things, like what I could see on screen, whether I could hear the TTS etc. (it was a bit of a surprise when it was suddenly talking to me while I was minding my own business).

        It configured everything, while keeping a meticulous log that I can point it at if I want to set up another device, and eventually turn into a runbook if I need to.

      • By theshrike79 2026-01-07 21:13

        I have a /fix-ci-build slash command that instructs Claude how to use `gh` to get the latest build from that specific project's Github Actions and get the logs for the build

        In addition there are instructions on how and where to push the possible fixes and how to check the results.
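        For a rough idea of the shape of such a command: Claude Code slash commands are markdown files, so it might look something like this (the path and wording are a guess, not the commenter's actual file; the `gh` subcommands shown are real):

```markdown
<!-- .claude/commands/fix-ci-build.md (hypothetical sketch) -->
Find the most recent GitHub Actions run for this repo with
`gh run list --limit 1`. If it failed, fetch only the failing step
logs with `gh run view <run-id> --log-failed`, diagnose the error,
and apply a fix. Commit and push to the current branch, then confirm
the new run passes with `gh run watch`.
```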

        I've yet to encounter a build failure it couldn't fix automatically.

    • By Loeffelmann 2026-01-06 19:23 (2 replies)

      Why do all these AI-generated readmes have a directory structure section? It's so redundant - you know I could just run `tree`.

      • By sonnig 2026-01-06 22:20

        It makes me so exhausted trying to read them... my brain can tell immediately when there's so much redundant information that it just starts shutting itself off.

      • By bakies 2026-01-06 19:40

        Comments? Also, it's for reading into an agent so the agent doesn't have to tool-call/bash out.

    • By 6177c40f 2026-01-06 18:27 (2 replies)

      I think we're entering a world where programmers as such won't really exist (except perhaps in certain niches). Being able to program (and read code, in particular) will probably remain useful, though diminished in value. What will matter more is your ability to actually create things, using whatever tools are necessary and available, and have them actually be useful. Which, in a way, is the same as it ever was. There's just less indirection involved now.

      • By wiml 2026-01-06 23:45 (3 replies)

        We've been living in that world since the invention of the compiler ("automatic programming"). Few people write machine code any more. If you think of LLMs as a new variety of compiler, a lot of their shortcomings are easier to describe.

        • By qwm 2026-01-07 2:44 (1 reply)

          My compiler runs on my computer and produces the same machine code given the same input. Neither of these are true with AI.

          • By wiml 2026-01-07 23:07

            You can run an LLM locally (and distributed compile systems, where the compiler runs in the cloud, are a thing, too) so that doesn't really produce a distinction between the two.

            Likewise, many optimization techniques involve some randomness, whether it's approximating an NP-thorny subproblem, or using PGO guided by statistical sampling. People might disable those in pursuit of reproducible builds, but no one would claim that enabling those features makes GCC or LLVM no longer a compiler. So nondeterminism isn't really the distinguishing factor either.

        • By bdangubic 2026-01-06 23:48

          last thing I want is non-deterministic compiler, do not vibe this analogy at all…

        • By moffkalast 2026-01-07 10:15

          Finally we've invented a compiler that we can yell at when it gives bullshit errors. I really missed that with gcc.

      • By pseidemann 2026-01-06 18:39 (3 replies)

        Isn't there more indirection as long as LLMs use "human" programming languages?

        • By xarope 2026-01-07 5:20

          If you think of the training data, e.g. SO, GitHub etc, then you have a human asking or describing a problem, then the code as the solution. So I suspect current-gen LLMs are still following this model, which means for the foreseeable future a human-like language prompt will still be the best.

          Until such time, of course, when LLMs are eating their own dogfood, in which case they - as has already happened - create their own language, evolve dramatically, and cue skynet.

        • By 6177c40f 2026-01-06 19:01 (1 reply)

          More indirection in the sense that there's a layer between you and the code, sure. Less in that the code doesn't really matter as such and you're not having to think hard about the minutiae of programming in order to make something you want. It's very possible that "AI-oriented" programming languages will become the standard eventually (at least for new projects).

          • By recursive 2026-01-07 21:36

            One benefit of conventional code is that it expresses logic in an unambiguous way. Much of "the minutiae" is deciding what happens in edge cases. It's even harder to express that in a human language than in computer languages. For some domains it probably doesn't matter.

        • By layer8 2026-01-06 23:27

          It’s not clear how affordances of programming languages really differ between humans and LLMs.

    • By Yoric 2026-01-06 18:36 (5 replies)

      You intrigue me.

      > have it learn your conventions, pull in best practices

      What do you mean by "have it learn your conventions"? Is there a way to somehow automatically extract your conventions and store it within CLAUDE.md?

      > For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box.

      Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere?

      • By ac29 2026-01-06 19:28 (1 reply)

        > What do you mean by "have it learn your conventions"?

        I'll give you an example: I use ruff to format my python code, which has an opinionated way of formatting certain things. After an initial formatting, Opus 4.5, without prompting, will write code in this same style so that the ruff formatter almost never has anything to do on new commits. Sonnet 4.5 is actually pretty good at this too.

        • By UncleMeat 2026-01-06 22:15 (3 replies)

          Isn't this a meaningless example? Formatters already exist. Generating code that doesn't need to be formatted is exactly the same as generating code and then formatting it.

          I care about the norms in my codebase that can't be automatically enforced by machine. How is state managed? How are end-to-end tests written to minimize change detectors? When is it appropriate to log something?

          • By eterm 2026-01-06 23:30 (1 reply)

            Here's an example:

            We have some tests in "GIVEN WHEN THEN" style, and others in other styles. Opus will try to match each style of testing by the project it is in by reading adjacent tests.

            • By vidarh 2026-01-08 13:39

              The one caveat with this is that in messy code bases it will perpetuate bad things, unless you're specific about what you want. Then again, human developers will often do the same, and are much harder to force to follow new conventions.

          • By gck1 2026-01-07 20:58 (1 reply)

            The second part is what I'd also like to have.

            But I think it should be doable. You can tell it how YOU want the state to be managed and then have it write a custom "linter" that makes the check deterministic. I haven't tried this myself, but claude did create some custom clippy scripts in rust when I wanted to enforce something that isn't automatically enforced by anything out there.
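            For a sense of how small such a custom "linter" can be, here's a sketch in Python (the rule enforced, no `time.sleep` calls in tests, is invented for illustration):

```python
import ast

def sleep_calls(source: str) -> list[int]:
    """Return line numbers where time.sleep() is called.

    A hypothetical team rule ("tests must never sleep") turned into a
    deterministic check that an agent or CI job can run on every commit.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "sleep"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "time"):
            hits.append(node.lineno)
    return hits

print(sleep_calls("import time\ntime.sleep(1)\n"))  # flags line 2
```

            Once the rule is a script, "how we manage state" stops being a vibe and becomes a pass/fail check.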

            • By UncleMeat 2026-01-07 21:43

              Lints are typically well suited for syntactic properties or some local semantic properties. Almost all interesting challenges in software design and evolution involve nonlocal semantic properties.

          • By scotty79 2026-01-07 17:24

            Memes write themselves.

            "AI has X"

            "We have X at home"

            "X at home: x"

      • By gingersnap 2026-01-06 18:42 (2 replies)

        Since starting to use Opus 4.5, I've reduced the instructions in claude.md and just ask Claude to look at the codebase to understand the patterns already in use. Going from prompts/docs to having the code be the "truth". Show, don't tell. I've found this pattern has made a huge leap with Opus 4.5.

        • By zoilism 2026-01-06 23:59

          The Ash framework takes the approach you describe.

          From the docs (https://hexdocs.pm/ash/what-is-ash.html):

          "Model your application's behavior first, as data, and derive everything else automatically. Ash resources center around actions that represent domain logic."

        • By kaydub 2026-01-07 18:14

          I feel like I've been doing this since Sonnet 3.5 or Sonnet 4. I'll clone projects/modules/whatever into the working directory and tell claude to check it out. Voila, now it knows your standards and conventions.

      • By oncallthrow 2026-01-06 19:07 (1 reply)

        When I ask Claude to do something, it independently, without me even asking or instructing it to, searches the codebase to understand what the convention is.

        I’ve even found it searching node_modules to find the API of non-public libraries.

        • By jack_pp 2026-01-06 22:10 (2 replies)

          This sounds like it would take a huge amount of tokens. I've never used agents so could you disclose how much you pay for it?

          • By garblegarble 2026-01-06 23:20 (1 reply)

            If they're using Opus then it'll be the $100/month Claude Max 5x plan (could be the more expensive 20x plan depending on how intensive their use is). It does consume a lot of tokens, but I've been using the $100/mo plan and get a lot done without hitting limits. It helps to be mindful of context (regularly amending/pruning your CLAUDE.md instructions, clearing context between tasks, sizing your tasks to stay within the Opus context window). Claude Code plans have token limits that work in 5-hour blocks (that start when you send your first token, so it's often useful to prime it as early in the morning as possible).

            Claude Code will spawn sub-agents (that often use the cheap Haiku model) for exploration and planning tasks, with only the results imported into the main context.

            I've found the best results from a more interactive collaboration with Claude Code. As long as you describe the problem clearly, it does a good job on small/moderate tasks. I generally set two instances of Claude Code separate tasks and run them concurrently (the interaction with Claude Code distracts me too much to do my own independent coding simultaneously, unlike when setting a task for a colleague, but I do work on architecture / planning tasks).

            The one matter of taste that I have had to compromise on is the sheer amount of code - it likes to write a lot of code. I have a better experience if I sweat the low-level code less, and just periodically have it clean up areas where I think it's written too much / too repetitive code.

            As you give it more freedom it's more prone to failure (and can often get itself stuck in a fruitless spiral) - however, as you use it more you get a sense of what it can do independently and what it's likely to choke on. A codebase with good human-designed unit & Playwright tests is very good.

            Crucially, you get the best results where your tasks are complex but on the menial side of the spectrum - it can pay attention to a lot of details, but on the whole don't expect it to do great on senior-level tasks.

            To give you an idea, in a little over a month "npx ccusage" shows that via my Claude Code 5x sub I've used 5M input tokens, 1.5M output, 121M Cache Create, 1.7B Cache Read. Estimated pay-as-you-go API cost equivalent is $1500 (N.B. for the tail end of December they doubled everybody's API limits, so I was using a lot more tokens on more experimental on-the-fly tool construction work)

            • By NiloCK 2026-01-06 23:38 (2 replies)

              FYI Opus is available and pretty usable in claude-code on the $20/Mo plan if you are at all judicious.

              I exclusively use opus for architecture / speccing, and then mostly Sonnet and occasionally Haiku to write the code. If my usage has been light and the code isn't too straightforward, I'll have Opus write code as well.

              • By covibes 2026-01-10 12:52

                The problem with current approaches is the lack of feedback loops with independent validators that never lose track of the acceptance criteria. That's the next level that will truly allow no-babysitting implementations that are feature complete and production grade. Check out this repo that offers that: https://github.com/covibes/zeroshot/

              • By garblegarble 2026-01-06 23:42 (1 reply)

                That's helpful to know, thanks! I gave Max 5x a go and didn't look back. My suspicion is that Opus 4.5 is subsidised, so good to know there's flexibility if prices go up.

                • By baq 2026-01-07 7:08

                  The $20 plan for CC is good enough for 10-20 minutes of opus every 5h and you’ll be out of your weekly limit after 4-5 days if you sleep during the night. I wouldn’t be surprised if Anthropic actually makes a profit here. (Yeah probably not, but they aren’t burning cash.)

          • By vidarh 2026-01-08 13:42

            I use the $200/month Claude Code plan, and in the last week I've had it generate about half a million words of documentation without hitting any session limits.

            I have hit the weekly limit before, briefly, but that took running multiple sessions in parallel continuously for many days.

      • By vidarh 2026-01-08 13:38

        Just ask it to.

        /init in Claude Code already automatically extracts a bunch, but for something more comprehensive, just tell it which additional types of things you want it to look for and document.

        > Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere?

        I don't know about the person above, but I tell Claude to write all my skills and agents for me. With some caveats, you can do this iteratively in a single session ("update the X agent, then re-run it. Repeat until it reliably does Y")

      • By kaydub 2026-01-07 18:12

        "Claude, clone this repo https://github.com/repo, review the coding conventions, check out any markdown or readme files. This is an example of coding conventions we want to use on this project"

    • By dmbche 2026-01-06 18:14 (2 replies)

      Oh! An ad!

      • By savanaly 2026-01-06 18:48

        The most effective kind of marketing is viral word of mouth from users who love your product. And Claude Code is benefiting from that dynamic.

      • By OldGreenYodaGPT 2026-01-06 18:21 (1 reply)

        lol, does sound like an ad, but it's true. Also, I forgot about hooks: use hooks too! I just used voice-to-text then had Claude reword it. Still my real-world ideas.

        • By Rapzid 2026-01-07 18:51

          Exactly what an ad would say.

    • By majormajor 2026-01-07 2:05

      All of these things work very well IMO in a professional context.

      Especially if you're in a place where a lot of time was previously spent revising PRs for best practices, etc, even for human-submitted code, then having the LLM do that for you saves a bunch of time. Most humans are bad at following those super-well.

      There's a lot of stuff where I'm pretty sure I'm up to at least 2x speed now. And for things like making CLI tools or bash scripts, 10x-20x. But in terms of "the overall output of my day job in total", probably more like 1.5x.

      But I think we will need a couple of major leaps in tooling - probably deterministic tooling, not LLM tooling - before anyone could responsibly ship code nobody has ever read in situations with millions of dollars on the line. (That's different from vibe-coding something that ends up making millions - a low-risk-high-reward situation, where big bets on doing things fast make sense. If you're already making millions, dramatic changes like that can become high-risk-low-reward very quickly. In those companies, "I know that only touching these files is 99.99% likely to be completely safe for security-critical functionality" and similar "obvious" intuition makes up for the lack of ability to exhaustively test software in a practical way (even with fuzzers and things), and "I didn't even look at the code" is conceding responsibility to a dangerous degree there.)

    • By keybored 2026-01-06 22:11

      > (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)

      Reword? But why not just voice to text alone...

      Oh but we all read the partially synthetic ad by this point. Psyche.

    • By maxkfranz 2026-01-07 20:35 (1 reply)

      > Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time.

      I agree with this, but I haven't needed to use any advanced features to get good results. I think the simple approach gets you most of the benefits. Broadly, I just have markdown files in the repo written for a human dev audience that the agent can also use.

      Basically:

      - README.md with a quick start section for devs, descriptions of all build targets and tests, etc. Normal stuff.

      - AGENTS.md (the only file that's not written for people specifically) that just describes the overall directory structure and has a short set of instructions for the agent: (1) Always read the readme before you start. (2) Always read the relevant design docs before you start. (3) Always run the linter, a build, and tests whenever you make code changes.

      - docs/*.md that contain design docs, architecture docs, and user stories, just text. It's important to have these resources anyway, agent or no.

      As with human devs, the better the docs/requirements the better the results.
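      For concreteness, a minimal AGENTS.md in that spirit might read something like this (the contents are invented as an example, not the commenter's actual file):

```markdown
# AGENTS.md (illustrative sketch)

## Layout
- src/    application code
- docs/   design docs, architecture notes, user stories
- tests/  unit and integration tests

## Instructions for agents
1. Always read README.md before you start.
2. Always read the relevant docs/*.md design doc before you start.
3. After any code change, run the linter, a build, and the tests.
```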

      • By vidarh 2026-01-08 13:49 (1 reply)

        I'd really encourage you to try using agents for tasks that are repeatable and/or wordy but where most of the words are not relevant for ongoing understanding.

        It's a tiny step further, and sub-agents provide a massive benefit the moment you're ready to trust the model even a little bit (relax permissions to not have it prompt you for every little thing; review before committing rather than on every file edit) because they limit what goes into the top level context, and can let the model work unassisted for far longer. I now regularly have it run for hours at a time without stopping.

        Running and acting on output from the linter is absolutely an example of that which matters even for much shorter runs.

        There's no reason to have all the lint output "polluting" the top-level context, nor the steps the agent takes to fix linter issues that can't be auto-fixed by the linter itself. The top-level agent should only need to care about whether the linter run passed or failed (and should know it needs to re-run and possibly investigate if it fails).

        Just type /agents, select "Create new agent" and describe a task you often do, and then forget about it (or ask Claude to make changes to it for you)

        • By maxkfranz 2026-01-08 16:29

          That's a great point. There are a lot of things you can do to optimise things, and your suggestion is one of the lower hanging fruits.

          I was trying to get across the point that today you can get a lot of benefit from minimal setup, even one that's vendor-agnostic. (The steps I outlined work for Codex out of the box, too.)

          You're right to point out that the more you refine things, the more you'll get out of the tools. It used to be that you had to do a lot of refinements to start getting good results at all. Now, you can get a lot out of even a basic setup like I outlined, which is great for people who are new users -- or people who tried it before and weren't that impressed but are now giving it another try.

    • By hoten 2026-01-06 18:16 (3 replies)

      Mind sharing the bill for all that?

      • By OldGreenYodaGPT 2026-01-06 18:19 (2 replies)

        My company pays for the team Claude code plan which is like $200 a month for each dev. The workflows cost like 10 - 50 cents a PR

        • By blahblaher 2026-01-06 19:16 (4 replies)

          It will have to quintuple or more to make business sense for Anthropic. Sure, still cheaper than a full-time developer, but don't expect it to stay at $200 for long. And then, when you explain to your boss how amazing it is, and how it can do all this work so easily and quickly, that's when your boss starts asking the real question: what am I paying you for?

          • By benjiro 2026-01-06 23:54

            A programmer, if we use US standards, is probably $8000 per month. If you can get 30% more value out of that programmer (trust me, it's WAY more than 30%), you gained $2400 of value. If you pay $200, $500, $1000 for that, it's still a net positive. And that's ignoring the salary range of an actual senior...

            LLMs do not result in bosses firing people; they result in more projects / faster completed projects, which in turn means more $$$ for a company.

          • By bonesss 2026-01-06 21:50 (2 replies)

            More fundamentally: assume a 10 to 30% bump in actual productivity, find a niche (editing software, CRUD frameworks, SharePoint 2.0, stock trading, betting, whatever), and assume you had Anthropics billions or openAIs billions or Microsoft’s billions or Googles billions.

            Why on earth would you be hunting $20 a month subscriptions from random assed people? Peanuts.

            Lockheed-Martin could be, but isn’t, opening lemonade stands outside their offices… they don’t because of how buying a Ferrari works.

            • By theshrike79 2026-01-07 21:28

              > Why on earth would you be hunting $20 a month subscriptions from random assed people? Peanuts.

              For the same reason Microsoft never has and never will chase people for pirating home Windows or Office licenses

              When they hit the workforce, or even better, start a company, guess which OS and office suite they'll use? Hint: it's not Linux and OpenOffice.

              Same with Claude's $20 package. It lets devs use it at home and then compare it to the Copilot shit their company is pushing on them. Maybe they either grumble enough to get a Claude license or they're in a position to make the call.

              Cheap advertising pretty much.

              Worked for me too :) I've paid my own Claude license for over a year at home, grumbled at work and we got a Claude pilot going now - and everyone who's tried it so far isn't going back to Copilot + Sonnet 4.5/GPT5.

            • By whattheheckheck 2026-01-07 2:04

              They're data farming your intelligence.

          • By HDThoreaun 2026-01-07 8:25

            I'm not sure about this. What they really need is to get rid of the free tier and get widespread adoption. Inference on the $200 plan seems to be profitable right now, so they just need more users to amortize training costs.

          • By senordevnyc 2026-01-07 2:28

            All the evidence suggests that inference is quite profitable actually.

        • By square_usual 2026-01-06 18:39

          It's $150, not a huge difference, but worth noting that it's not the same as the 20x Max plan.

      • By 6177c40f 2026-01-06 18:23 (1 reply)

        Cheaper than hiring another developer, probably. My experience: for a few dollars I was able to extensively refactor a Python codebase in half a day. This otherwise would have taken multiple days of very tedious work.

        • By blahblaher 2026-01-06 19:14 (4 replies)

          And that's what the C-suite wants to know. Prepare yourself to be replaced in the not so distant future. Hope you have a good "nest" to support yourself when you're inevitably fired.

          • By benjiro 2026-01-07 0:00 (1 reply)

            > Prepare yourself to be replaced in the not so distant future.

            You're ignoring that this same developer now has access to a tool that makes him a team unto himself.

            Going independent was always an issue because being a full-stack dev is hard. With LLMs, you have an entire team behind you for making graphics, code, documents, etc... YOU become the manager.

            We will probably see a lot more small teams/single devs making bigger projects, until they grow.

            The companies that think they can fire devs are the same companies that are going to go too far and burn bridges. Do not forget that a lot of companies are founded by devs leaving a company and starting out on their own, taking clients with them!

            I did that years ago, and it worked for a while, but eventually the math does not work out, because one guy can only do so much. And when you start hiring, your costs balloon. But with LLMs... now you're a one-man team. Hiring a second person is not hiring a person to make some graphics or do more coding. You're hiring another team.

            This is what people do not realize... they look at this too much as the established order, ignoring what those fired devs can now do!

            • By icedchai 2026-01-07 0:18 (3 replies)

              This sounds nice, except for the fact that almost everyone else can do this, too. Or at least try to, resulting in a fast race to the bottom.

              Do you really want to be a middle manager to a bunch of text boxes, churning out slop, while they drive up our power bills and slowly terraform the planet?

              • By cakealert 2026-01-07 3:38 (2 replies)

                The same way that having motorized farming equipment was a race to the bottom for farmers? Perhaps. Turned out to be a good outcome for most involved.

                Just like farmers who couldn't cope with the additional leverage their equipment provided them, devs who can't leverage this technology will have to "go to the cities".

                • By encyclopedism 2026-01-07 15:11

                  Please do read up on how farmers are doing with this race to the bottom (it hasn't been pretty). Mega farms are a thing because small farms simply can't compete. Small farmers have gone broke. The parent comment is trying to highlight this.

                  If LLMs turn out the way the C-suite hopes, let me tell you, you will be in a world of pain. Most of you won't be using LLMs to create your own businesses.

                • By pluralmonad 2026-01-07 20:39

                  But modern tillage/petrol based farming is an unsustainable aberration. Maybe a good example for this discussion, but in the opposite direction if it is.

              • By kaydub 2026-01-07 18:16 (1 reply)

                LOL what an argument.

                Seeing the replies here it actually doesn't seem like everyone else can do this. Looks like a lot of people really suck at using LLMs to me.

                • By icedchai 2026-01-07 19:17

                  I'm not saying they can all do it now... but I don't think it's much of a stretch that they can learn it quickly and cheaply.

              • By benjiro 2026-01-07 0:52

                > except for the fact that almost everyone else can do this, too. Or at least try to, resulting in a fast race to the bottom.

                Ironically, that race to the bottom is no different from what we already have. Have you worked for a company before? A lot of software is developed BADLY. I dare say that a lot of the software Opus 4.5 generates is of higher quality than what I have seen in my 25-year career.

                The amount of companies that cheap out, hiring juniors fresh from school to work as coding monkeys, is insane. Then projects have bugs / security issues, with tons of copy/pasted code, or people not knowing a darn thing.

                Is that any different from your feared future? I dare say that LLMs like Opus are frankly better than most juniors. Ask it to do a code review for security issues: Opus literally creates extensive tests and points out issues that you'd expect from a mid-level or higher dev. Of course, you need to know what to ask! You are the manager.

                > Do you really want to be a middle manager to a bunch of text boxes, churning out slop, while they drive up our power bills and slowly terraform the planet?

                Frankly, yes... If you are a real developer, do you still think development is fun after 10 years, 20 years? Doing the exact same boring work. Reimplementing the 1001st login page, the 101st contact form... A ton of our work is in reality repeating the same crap over and over again. And if we try to bypass it, we end up tied to those systems / frameworks that often become a block around our necks.

                Our industry has a lot of burnout, because most tasks may start small but then grow beyond our scope. Today it's Ruby on Rails programming, then it's Angular, no wait, React, no wait, Vue, no wait, the new hotness is whatever again.

                > slowly terraform the planet?

                Well, I am actually making something.

                Can you say the same for all the power / GPU draw of Bitcoin, Ethereum, whatever crap mining? One is productive, a tool with insane potential and usage; the other is a virtual currency, of which only one is ever popular, with limited usage. Yet it burns just as much, for a way more limited return of usability.

                Those LLMs that you are so against make me a ton more productive. You want to try something out, but never really wanted to commit because it was weeks of programming. Well, now you, as manager, can get projects done fast, and learn from them way faster than your little fingers ever did.

          • By kaydub 2026-01-07 18:15

            Homey, we're going to be replacing you devs that can't stand to use LLMs lol

          • By 6177c40f 2026-01-06 21:28 (2 replies)

            You say this like it's some kind of ominous revelation, but that's just how capitalism works? Yeah, prepare for the future. All things are impermanent.

            • By goatlover 2026-01-06 21:43 (1 reply)

              I suppose as long as either humans are always able to use new tools to create new jobs, or the wealth gets shared in a fully automated society, it won't be ominous. There are other scenarios.

              • By 6177c40f 2026-01-073:04

                I think we might make new jobs, but maybe not enough. I'll be pleasantly surprised if we get good at sharing wealth over the next few years. Maybe something like UBI will become so obviously necessary that it becomes politically feasible, I don't know. I suspect we'll probably limp along for a while in mediocrity. Then we'll die. Same as it ever was. The important thing is to have fun with it.

            • By wiseowise 2026-01-0623:022 reply

              > Yeah, prepare for the future.

              Well excuse the shit out of my goddamn French, but being comfy for years and suddenly facing literal doom of my profession in a year wasn't on my bingo card.

              And what do you even mean by "prepare"? Shit out a couple of mil out of my ass and invest asap?

              • By 6177c40f 2026-01-073:02

                Sharpen sticks, hoard water maybe? We were always going to die someday, I don't see how this changes things.

              • By garblegarble 2026-01-0623:321 reply

                >And what do you even mean by "prepare"?

                Not the person you're responding to but... if you think it's a horse -> car change (and, to stretch the metaphor, if you think you're in the business of building stables) then preparation means train in another profession.

                If you think it's a hand tools -> power tools change, learn how to use the new tools so you don't get left behind.

                My opinion is it's a hand -> power tools change, and that LLMs give me the power to solve more problems for clients, and do it faster and more predictably than a client trying to achieve the same with an LLM. I hope I'm right :-)

                • By simonw 2026-01-0623:372 reply

                  That's a good analogy. I'm on team hand tools to power tools too.

                  • By SoftTalker 2026-01-071:561 reply

                    Why do you suppose that these tools will conveniently stop improving at some point that increases your productivity but are still too much for your clients to use for themselves?

                    • By simonw 2026-01-076:261 reply

                      Because I've seen how difficult it is to get a client to explain to me what they need their software to do.

                      • By SoftTalker 2026-01-0718:15

                        And so the AI will develop the skills to interview the client and determine what they really need. There are textbooks written on how to do this, it's not going to be hard to incorporate into the training.

                  • By th0ma5 2026-01-072:01

                    [flagged]

          • By jack_pp 2026-01-0622:151 reply

            Well, OP probably won't be affected because management is very pleased with him and his output, so why would they fire him? To hire someone who might have better output for 10% more money, or someone who might have the same output for 25% less pay?

            You think any manager in their right mind would take risks like that?

            I think the real consequences are that they probably are so pleased with how productive the team is becoming that they will not hire new people or fire the ones who aren't keeping up with the times.

            It's like saying "wow, our factory just produced 50% more cars this year, time to shut down half the factory to reduce costs!"

            • By wiseowise 2026-01-0622:592 reply

              > You think any manager in their right mind would take risks like that?

              You really underestimate the stupidity of your average manager. Two of our top performers left because they were underpaid, and the manager (in charge of comp) never even tried to retain them.

              • By anomaly_ 2026-01-072:27

                I bet they weren't as valuable as you think. This is a common issue with certain high performing line delivery employees (particularly those with technical skills, programmers, lawyers, accountants, etc), they always think they are carrying the whole team/company on their shoulders. It almost never turns out to be the case. The machine will keep grinding.

              • By jack_pp 2026-01-070:25

                That's one kind of stupidity. Actually firing the golden goose is one step further

      • By aschobel 2026-01-0618:19

        i've never hit a limit with my $200 a month plan

    • By jdthedisciple 2026-01-077:46

      I'm curious: With that much Claude Code usage, does that put your monthly Anthropic bill above $1000/mo?

    • By nijave 2026-01-0813:31

      I still struggle with these things being _too_ good at generating code. They have a tendency to add abstractions, classes, wrappers, factories, builders to things that didn't really need all that. I find they spit out 6 files worth of code for something that really only needed 2-3 and I'm spending time going back through simplifying.

      There are times those extra layers are worth it but it seems LLMs have a bias to add them prematurely and overcomplicate things. You then end up with extra complexity you didn't need.

    • By risyachka 2026-01-0622:00

      They are sleeping on it because there is absolutely no incentive to use it.

      When needed, it can be picked up in a day. Otherwise, they are not paid based on tickets solved etc. If the incentives were properly aligned, everyone would already use it.

    • By dominicrose 2026-01-079:38

      Use Claude Code... to do what? There are multiple layers of people involved in the decision process and they only come up with a few ideas every now and then. Nothing I can't handle. AI helps but it doesn't have to be an agent.

      I'm not saying there aren't use cases for agents, just that it's normal that most software engineers are sleeping on it.

    • By aschobel 2026-01-0618:18

      Agreed and skills are a huge unlock.

      codex cli even has a skill to create skills; it's super easy to get up to speed with them

      https://github.com/openai/skills/blob/main/skills/.system/sk...

    • By chandureddyvari 2026-01-073:31

      Came across an official Anthropic repo on GitHub Actions very relevant to what you mentioned. Your idea of scheduled doc updates using an LLM is brilliant; I'm stealing it. https://github.com/anthropics/claude-code-action

    • By avereveard 2026-01-0623:50

      Also the new Haiku. Not as smart, but lightning fast. I have it review the impact of code changes, or if I need a wide but shallow change done, I have it scan the files and create a change plan. Saves a lot of time waiting for Claude or Codex to get their bearings.

    • By andrekandre 2026-01-071:391 reply

        > we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we’ve written for it.
      
      (if you know) how is that compared to coderabbit? i'm seriously looking for something better rn...

      • By megalomanu 2026-01-0714:321 reply

        Never tried CodeRabbit, just because this is already good enough with Claude Code. It helped us catch dozens of important issues we wouldn't have caught otherwise. We gave some instructions in the CLAUDE.md doc in the repository, including a nice personalized roast of the engineer that did the review in the intro and conclusion to make it fun! :) Basically, when you do a "create PR" from your Claude Code, it will help you get your Linear ticket (or create one if missing), ask you some important questions (like: what tests have you done?), create the PR on GitHub, request the reviewers, and post an "Auto Review" message with your credentials. It's not an actual review per se but this is enough for our small team.

        • By andrekandre 2026-01-081:551 reply

          thanks for the reply, yea we have a claude.md file, but coderabbit doesn't seem to pick it up or ignore it... hmmm wish we could try out claude code.

          • By tinodb 2026-01-0821:18

            Codex is even better in my experience at reviewing. You can find the prompt it uses in the repo

    • By ndesaulniers 2026-01-0718:37

      Thanks for the example! There's a lot (of boilerplate?) here that I don't understand. Does anyone have good references for catching up to speed what's the purpose of all of these files in the demo?

    • By profstasiak 2026-01-1021:08

      I use Claude Code all the time, but never allow it to edit my code. It proposes spaghetti code almost 80% of the time.

    • By philipwhiuk 2026-01-0711:49

      I was expecting the showcase to show what you've done with it, not just another person's attempt at instructing an AI to follow instructions.

    • By moltar 2026-01-0619:18

      If anyone is excited about, and has experience with this kind of stuff, please DM. I have a role open for setting up these kinds of tools and workflows.

    • By theanonymousone 2026-01-0618:264 reply

      Is Claude "Code" anything special, or is it mostly the LLM, so other CLIs (e.g. Copilot) work too?

      • By square_usual 2026-01-0618:40

        I've tried most of the CLI coding tools with the Claude models and I keep coming back to Claude Code. It hits a sweet spot of simple and capable, and right now I'd say it's the best from an "it just works" perspective.

      • By kaydub 2026-01-0718:22

        In my experience the CLI tool is part of the secret sauce. I haven't tried switching models per each CLI tool though. I use claude exclusively at work and for personal projects I use claude, codex, gemini.

      • By speedgoose 2026-01-0622:012 reply

        It's mostly the model. Between Copilot, Claude Code, OpenCode, and snake oil like Oh My OpenCode, there aren't huge differences.

        • By troupo 2026-01-0622:30

          Claude Code seems to package a relatively smart prompt as well, as it seems to work better even with one-line prompts than alternatives that just invoke the API.

          Key word: seems. It's impossible to do a proper qualitative analysis.

        • By pluralmonad 2026-01-0722:49

          Why do you call Oh My OpenCode snake oil?

      • By catlover76 2026-01-0620:12

        [dead]

    • By gjvc 2026-01-0716:18

      > (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)

      take my downvote as hard as you can. this sort of thing is awfully off-putting.

    • By kaydub 2026-01-0717:421 reply

      I'm at the point where I say fuck it, let them sleep.

      The tech industry just went through an insane hiring craze and is now thinning out. This will help to separate the wheat from the chaff.

      I don't know why any company would want to hire "tech" people who are terrified of tech and completely obstinate when it comes to utilizing it. All the people I see downplaying it take a half-assed approach at using it then disparage it when it's not completely perfect.

      I started tinkering with LLMs in 2022. First use case: speak in natural English to the LLM, give it a JSON structure, and have it decipher the natural language and fill in that JSON structure (a vacation planning app, so you talk to it about where/how you want to vacation and it creates the structured data in the app). Sometimes I'd use it for minor coding fixes (copy and paste a block into ChatGPT, fix errors, or maybe just ideation). This was all personal project stuff.

      At my job we got LLM access in mid/late 2023. Not crazy useful, but still was helpful. We got claude code in 2024. These days I only have an IDE open so I can make quick changes (like bumping up a config parameter, changing a config bool, etc.). I almost write ZERO code now. I usually have 3+ claude code sessions open.

      On my personal projects I'm using Gemini + codex primarily (since I have a google account and chatgpt $20/month account). When I get throttled on those I go to claude and pay per token. I'll often rip through new features, projects, ideas with one agent, then I have another agent come through and clean things up, look for code smells, etc. I don't allow the agents to have full unfettered control, but I'd say 70%+ of the time I just blindly accept their changes. If there are problems I can catch them on the MR/PR.

      I agree about the low hanging fruit and I'm constantly shocked at the sheer amount of FUD around LLMs. I want to generalize, like I feel like it's just the mid/jr level devs that speak poorly about it, but there's definitely senior/staff level people I see (rarely, mind you) that also don't like LLMs.

      I do feel like the online sentiment is slowly starting to change though. One thing I've noticed a lot of is that when it's an anonymous post it's more likely to downplay LLMs. But if I go on linkedin and look at actual good engineers I see them praising LLMs. Someone speaking about how powerful the LLMs are - working on sophisticated projects at startups or FAANG. Someone with FUD when it comes to LLM - web dev out of Alabama.

      I could go on and on but I'm just ranting/venting a little. I guess I can end this by saying that in my professional/personal life 9/10 of the top level best engineers I know are jumping on LLMs any chance they get. Only 1/10 talks about AI slop or bullshit like that.

      • By throw1235435 2026-01-0720:13

        Not entirely disagreeing with your point, but I think they've mostly been forced to pivot recently for their own sakes; they will never say it, though. As eager as they may seem, the most public people tend to be better at outside communication and at knowing what they should say in public to enjoy more opportunities, remain employed, or, for the top engineers, to still seem relevant in the face of the communities they are a part of. It's less about money and more about respect there, I think.

        The "sudden switch" since Opus 4.5, where many who were saying just a few months ago "I enjoy actual coding" are now praising LLMs, isn't a one-off occurrence. I do think underneath it is somewhat motivated by fear; not for the job, however, but for relevance, i.e. remaining relevant to discussions, tech talks, new opportunities, etc.

    • By ps 2026-01-0714:37

      OK, I am gonna be the guy and put my skin in the game here. I kind of get the hype, but my experience with e.g. Claude Code (or GitHub Copilot previously, and others as well) has so far been pretty unreliable.

      I have a Django project with 50 kLOC, and it is pretty capable of understanding the architecture, coding style, naming of variables, functions, etc. Sometimes it excels on tasks like "replicate this non-trivial functionality for this other model and update the UI appropriately" and leaves me stunned. Sometimes it solves tedious and laborious tasks for me, like "replace this markdown editor with something modern, allowing fullscreen edits of content", but makes an annoying mistake that only visual inspection reveals, and isn't capable of fixing it after 5 prompts. I feel as if I am becoming a tester more than a developer, and I do not like the shift. Especially since I do not like telling someone he made an obvious mistake and should fix it; it seems I do not care whether it is human or AI, I just do not like incompetence, I guess.

      Yesterday I had to add some parameters to a very simple Falcon project and found out it had not been updated for several months and wouldn't build due to some pip issues with pymssql. OK, this is a really marginal sub-project, so I said: let's migrate it to uv, not get our hands dirty, and let Claude do it. He did splendidly, but in the Dockerfile he missed the "COPY server.py /data/" when I asked him to change the path... The build failed, so I updated the path myself and moved on.

      And then you listen to very smart guys like Karpathy who rave about Tab, Tab, Tab, while not understanding the language or anything about the code they write. Am I getting this wrong?

      I am really far far away from letting agents touch my infrastructure via SSH, access managed databases with full access privileges etc. and dread the day one of my silly customers asks me to give their agent permission to managed services. One might say the liability should then be shifted, but at the end of the day, humans will have to deal with the damage done.

      My customer, who uses all the codebase I am mentioning here, asked me if there is a way to provide "some AI" with item GTINs and let it generate photos, descriptions, etc., including the metadata they handcrafted and extracted for years from various sources. While it looks like a nice idea, and for them a chance to decrease staff count, I got the feeling they do not care about data quality anymore, or do not understand the problems they are bringing upon themselves through errors nobody will catch until it is too late.

      TL;DR: I am using Opus 4.5, it helps a lot, I have to keep being (very) cautious. Wake up call 2026? Rather like waking up from hallucination.

    • By lfliosdjf 2026-01-0714:39

      Why don't I see any streams of people building apps as quickly as they say? Just hype.

    • By winterbloom 2026-01-071:411 reply

      Didn't feel like reading all this so I shortened it! sorry!

      I shortened it for anyone else that might need it

      ----

      Software engineers are sleeping on Claude Code agents. By teaching it your conventions, you can automate your entire workflow:

      Custom Skills: Generates code matching your UI library and API patterns.

      Quality Ops: Automates ESLint, doc syncing, and E2E coverage audits.

      Agentic Reviews: Performs deep PR checks against custom checklists.

      Smart Triage: Pre-analyzes tickets to give devs a head start.

      Check out the showcase repo to see these patterns in action.

      • By gjvc 2026-01-0723:19

        you are part of the problem

    • By mcny 2026-01-0618:201 reply

      Everybody says how good Claude is, and then I go to my code base and I can't get it to correctly update one XAML file for me. It is quicker to make changes myself than to explain exactly what I need or learn how to do "prompt engineering".

      Disclaimer: I don't have access to Claude Code. My employer has only granted me Claude Teams. Supposedly, they don't use my poopy code to train their models if I use my work email Claude so I am supposed to use that. If I'm not pasting code (asking general questions) into Claude, I believe I'm allowed to use whatever.

      • By spaceman_2020 2026-01-0618:222 reply

        What's even the point of this comment if you self-admittedly don't have access to the flagship tool that everyone has been using to make these big bold coding claims?

        • By hu3 2026-01-0618:36

          isn't Claude Teams powerful? does it not have access to Opus?

          pardon my ignorance.

          I use GitHub Copilot, which has access to LLMs like Gemini 3, Sonnet/Opus 4.5 and GPT 5.2

        • By halfmatthalfcat 2026-01-0618:242 reply

          Because the same claims of "AI tool does everything" are made over and over again.

          • By spaceman_2020 2026-01-0618:321 reply

            The claims are being made for Claude Code, which you don't have access to.

            • By mr_mitm 2026-01-0620:53

              I believe part of why Claude Code is so great is that it has the chance to catch its own mistakes. It can run compilers, linters, and browsers and check its own output. If it makes a mistake, it takes one or two extra iterations until it gets it right.
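              That check-and-iterate pattern can be sketched roughly like this (my assumption of the shape, not Claude Code's actual internals; `agentic_fix`, `generate_patch`, and `check_cmd` are hypothetical names):

```python
# Rough sketch of a generate -> check -> retry loop, as I understand
# the idea; this is NOT Claude Code's real implementation.
import subprocess

def agentic_fix(generate_patch, check_cmd, max_iters=3):
    """generate_patch(feedback) stands in for the model editing code;
    check_cmd is a compiler/linter/test invocation (a list of args)."""
    feedback = ""
    for _ in range(max_iters):
        generate_patch(feedback)  # model applies an edit
        result = subprocess.run(check_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True  # checks pass: done
        feedback = result.stdout + result.stderr  # errors feed the next try
    return False
```

              The key design point is that the feedback loop is closed by a real tool (compiler, linter, test runner), not by the model grading itself.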

          • By fragmede 2026-01-0618:35

            It's not "AI tool does everything", it's specifically Claude Code with Opus 4.5 is great at "it", for whatever "it" a given commenter is claiming.

  • By mcv 2026-01-0620:188 reply

    Opus 4.5 ate through my Copilot quota last month, and it's already halfway through it for this month. I've used it a lot, for really complex code.

    And my conclusion is: it's still not as smart as a good human programmer. It frequently got stuck, went down wrong paths, ignored what I told it to do and did something wrong, or even repeated a previous mistake I had already corrected.

    Yet in other ways, it's unbelievably good. I can give it a directory full of code to analyze, and it can tell me it's an implementation of Kozo Sugiyama's dagre graph layout algorithm, and immediately identify the file with the error. That's unbelievably impressive. Unfortunately it can't fix the error. The error was one of the many errors it made during previous sessions.

    So my verdict is that it's great for code analysis, and it's fantastic for injecting some book knowledge on complex topics into your programming, but it can't tackle those complex problems by itself.

    Yesterday and today I was upgrading a bunch of unit tests because of a dependency upgrade, and while it was occasionally very helpful, it also regularly got stuck. I got a lot more done than usual in the same time, but I do wonder if it wasn't too much. Wasn't there an easier way to do this? I didn't look for it, because every step of the way, Opus's solution seemed obvious and easy, and I had no idea how deep a pit it was getting me into. I should have been more critical of the direction it was pointing to.

    • By hawtads 2026-01-0623:102 reply

      Copilot and many coding agents truncate the context window and use dynamic summarization to keep their costs low. That's how they are able to provide flat-fee plans.

      You can see some of the context limits here:

      https://models.dev/

      If you want the full capability, use the API with something like opencode. You will find that a single PR can easily rack up three digits of consumption costs.

      • By verdverm 2026-01-071:51

        Getting off of their plans and prompts is so worth it; I know from experience. I'm paying less and getting more so far, paying by token, as a heavy gemini-3-flash user. It's a really good model. This is the future (distillations into fast models that are good enough for 90% of tasks), not mega models like Claude. Those will still be created for distillations and the harder problems.

      • By mcv 2026-01-070:262 reply

        Maybe not, then. I'm afraid I have no idea what those numbers mean, but it looks like Gemini and ChatGPT 4 can handle a much larger context than Opus, and Opus 4.5 is cheaper than older versions. Is that correct? Because I could be misinterpreting that table.

        • By esperent 2026-01-071:001 reply

          I don't know about GPT4, but the latest one (GPT 5.2) has a 200k context window while Gemini has 1M, five times larger. You'll want to stay within the first 100k tokens on all of them to avoid hitting quotas very quickly, though (either start a new task or compact when you reach that), so in practice there's no difference.

          I've been cycling between a couple of $20 accounts to avoid running out of quota and the latest of all of them are great. I'd give GPT 5.2 codex the slight edge but not by a lot.

          The latest Claude is about the same too but the limits on the $20 plan are too low for me to bother with.

          The last week has made me realize how close these are to being commodities already. Even the agents' CLIs are nearly the same, bar some minor quirks (although I've hit more bugs in Gemini CLI, but each time I can just save a checkpoint and restart).

          The real differentiating factor right now is quota and cost.

          • By mcv 2026-01-0816:591 reply

            > You'll be wanting to stay within the first 100k on all of them

            I must admit I have no idea how to do that or what that even means. I get that bigger context window is better, but what does it mean exactly? How do you stay within that first 100k? 100k what exactly?

            • By hawtads 2026-01-0819:301 reply

              Okay, here's the tl;dr:

              Attention-based neural network architectures (on which the majority of LLMs are built) have a unit economic cost that scales roughly as n^2, i.e. quadratically, in both memory and compute. In other words, the longer the context window, the more expensive it is for the upstream provider. That's one cost.

              The second cost is that you have to resend the entire context every time you send a new message. So the context is basically (where a, b, and c are messages): first context: a; second context: a->b; third context: a->b->c. From the developer's point of view it's a mostly stateless process (there are some short-term caching mechanisms, YMMV based on provider; it's why "cached" messages, especially system prompts, are cheaper); the state, i.e. the context window string, is managed by the end-user application (in other words, the coding agent, the IDE, the ChatGPT UI client, etc.).
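              To make that resend pattern concrete, here is a toy calculation (illustrative arithmetic only, not any provider's billing code):

```python
# Each turn resends the full history, so total tokens billed grow
# roughly quadratically with the number of turns, even though the
# amount of "new" text grows only linearly.

def tokens_billed(turn_sizes):
    """turn_sizes: tokens added per message (user + assistant)."""
    history = 0
    total = 0
    for size in turn_sizes:
        history += size   # the context grows by this turn's tokens
        total += history  # the whole history is sent on each request
    return total

# Ten turns of ~1,000 tokens each: 55,000 tokens billed,
# though only 10,000 tokens of new text were ever written.
print(tokens_billed([1000] * 10))  # → 55000
```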

              The per-token cost is an amortized (averaged) cost of memory plus compute; the actual cost is mostly quadratic with respect to each marginal token. The longer the context window, the more expensive things are. Because of this, AI agent providers (especially those that charge flat-fee subscription plans) are incentivized to keep costs low by limiting the maximum context window size.

              (And if you think about it carefully, your AI API costs are a quadratic cost curve projected onto a linear line of flat fees per token, so the model hosting provider may in some cases make more profit if users send in shorter contexts versus constantly saturating the window. YMMV of course, but it's a race to the bottom right now for LLM unit economics.)

              They do this by interrupting a task halfway through and generating a "summary" of the task progress, then prompting the LLM again with a fresh prompt and the "summary" so far, and the LLM will restart the task from where it left off. Of course, text is a poor representation of the LLM's internal state, but it's the best option so far for AI applications that want to keep costs low.
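              A minimal sketch of that compaction idea, using word count as a crude token proxy and a placeholder `summarize` where a real agent would call the model:

```python
# Toy sketch of context compaction: once the history exceeds a token
# budget, fold older messages into a summary and continue from there.
# `summarize` is a placeholder; a real agent would ask the model.

BUDGET = 100  # "tokens" (here: words), tiny for illustration

def summarize(messages):
    return f"[summary of {len(messages)} earlier messages]"

def compact(history, budget=BUDGET):
    """history: list of message strings. Returns a compacted history."""
    def size(msgs):
        return sum(len(m.split()) for m in msgs)
    while size(history) > budget and len(history) > 1:
        half = max(1, len(history) // 2)  # fold the oldest half
        history = [summarize(history[:half])] + history[half:]
    return history
```

              The lossiness is visible right in the sketch: everything folded into the summary string is gone unless the summary happened to mention it.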

              Another thing to keep in mind is that LLMs perform worse the larger the input size. This is due to a variety of factors (mostly, I think, because there isn't enough training data to saturate the massive context window sizes).

              The general graph for LLM context performance looks something like this: https://cobusgreyling.medium.com/llm-context-rot-28a6d039965... https://research.trychroma.com/context-rot

              There are a bunch of tests and benchmarks (commonly referred to as "needle in a haystack") used to measure and improve LLM performance at large context window sizes, but it's still an open area of research.

              https://cloud.google.com/blog/products/ai-machine-learning/t...

              The thing is, generally speaking, you will get slightly better performance if you can squeeze all your code and your problem into the context window, because the LLM can get a "whole picture" view of your codebase/problem instead of a bunch of broken-telephone summaries every dozen thousand tokens. Take this with a grain of salt, as the field is changing rapidly, so it might not be valid in a month or two.

              Keep in mind that if the problem you are solving requires you to saturate the entire context window of the LLM, a single request can cost you dollars. And if you are using 1M+ context window model like gemini, you can rack up costs fairly rapidly.

              • By mcv 2026-01-0912:351 reply

                Using Opus 4.5, I have noticed that in long sessions about a complex topic, there often comes a point when Opus starts spouting utter gibberish. One or two questions earlier it was making total sense, and suddenly it seems to have forgotten everything and responds in a way that barely relates to the question I asked, and certainly not to the "conversation" we were having.

                Is that a sign of having surpassed that context window size? I guess to keep it sharp, I should start a new session early and often.

                From what I understand, a token is a chunk of text, typically a word fragment (roughly three-quarters of a word on average), so 100k tokens is on the order of 75k words before I start running into limits. But I've got the feeling that the complexity of the problem itself also matters.

                • By hawtads 2026-01-0919:00

                  It could have exceeded either its real context window size or the artificially truncated one, and the dynamic summarization step failed to capture the important bits of information you wanted. Alternatively, the information might be stored in places in the context window where needle-in-a-haystack retrieval performs poorly.

                  This is part of the reason why people use external data stores (e.g. vector databases, graph tools like Beads, etc.) in the hope of supplementing the agent's native context window and task management tools.

                  https://github.com/steveyegge/beads

                  The whole field is still in its infancy. Who knows, maybe in another update or two the problem might just be solved. It's not like needle in the haystack problems aren't differentiable (mathematically speaking).

        • By cma 2026-01-071:18

          You need to find where context breaks down. Claude was better at it even when Gemini had 5X more on paper, but both have improved with the latest releases.

    • By deanc 2026-01-077:372 reply

      People are completely missing the point about agentic development. The model is obviously a huge factor in the quality of the output, but the real magic lies in how the tools manage and inject context into the models. I switched from Copilot to Cursor at the end of 2025, and it was absolute night and day in terms of how the agents behaved.

      • By port3000 2026-01-079:331 reply

        Interesting you have this opinion yet you're using Cursor instead of Claude Code. By the same logic, you should get even better results directly using Anthropic's wrapper for their own model.

        • By deanc 2026-01-0711:38

          My employer doesn't allow for Claude Code yet. I'm fully aware from speaking to other peers, that they are getting even better performance out of Claude Code.

      • By causal 2026-01-0718:33

        In my experience GPT-5 is also much more effective in the Cursor context than the Codex context. Cursor deserves props for doing something right under the hood.

    • By zmmmmm 2026-01-0621:511 reply

      Yes, just using AI for code analysis is way underappreciated, I think. Even the people most sceptical of using it for coding should try it out as a tool for Q&A-style code interrogation, as well as for generating documentation. I would say it zero-shots documentation generation better than most human efforts, to the point that it raises the question of whether it's worth having the documentation in the first place. Obviously it can make mistakes, but I would say they are below the threshold of human mistakes, from what I've seen.

      • By sfink 2026-01-071:092 reply

        (I haven't used AI much, so feel free to ignore me.)

        This is one thing I've tried using it for, and I've found this to be very, very tricky. At first glance, it seems unbelievably good. The comments read well, they seem correct, and they even include some very non-obvious information.

        But almost every time I sit down and really think about a comment that includes any of that more complex analysis, I end up discarding it. Often, it's right but it's missing the point, in a way that will lead a reader astray. It's subtle and I really ought to dig up an example, but I'm unable to find the session I'm thinking about.

        This was with ChatGPT 5, fwiw. It's totally possible that other models do better. (Or even newer ChatGPT; this was very early on in 5.)

        Code review is similar. It comes up with clever chains of reasoning for why something is problematic, and initially convinces me. But when I dig into it, the review comment ends up not applying.

        It could also be the specific codebase I'm using this on? (It's the SpiderMonkey source.)

        • By zmmmmm 2026-01-07 5:51 · 1 reply

          My main experience is with anthropic models.

          I've had some encounters with inaccuracies but my general experience has been amazing. I've cloned completely foreign git repos, cranked up the tool and just said "I'm having this bug, give me an overview of how X and Y work" and it will create great high level conceptual outlines that mean I can dive straight in, where without it I would spend a long time just flailing around.

          I do think an essential skill is developing just the right level of scepticism. It's not really different to working with a human though. If a human tells me X or Y works in a certain way, I always allow a small margin of possibility that they are wrong.

          • By imp0cat 2026-01-07 7:18

            But have you actually thoroughly checked the documentation it generated? My experience suggests it can often be subtly wrong.

        • By mcv 2026-01-08 17:04

          They do have a knack for missing the point. Even Opus 4.5 can laser focus on the wrong thing. It does take skill and experience to interpret them correctly and set them straight when they go wrong.

          Even so, for understanding what happens in a big chunk of code, they're pretty great.

    • By josu 2026-01-07 2:51 · 5 replies

      >So my verdict is that it's great for code analysis, and it's fantastic for injecting some book knowledge on complex topics into your programming, but it can't tackle those complex problems by itself.

      I don't think you've seen the full potential. I'm currently #1 on 5 different very complex computer engineering problems, and I can't even write a "hello world" in rust or cpp. You no longer need to know how to write code, you just need to understand the task at a high level and nudge the agents in the right direction. The game has changed.

      - https://highload.fun/tasks/3/leaderboard

      - https://highload.fun/tasks/12/leaderboard

      - https://highload.fun/tasks/15/leaderboard

      - https://highload.fun/tasks/18/leaderboard

      - https://highload.fun/tasks/24/leaderboard

      • By johndough 2026-01-08 22:32 · 1 reply

        All the naysayers here clearly have no idea. Your large matrix multiplication implementation is quite impressive! I have set up a benchmark loop and let GPT-5.1-Codex-Max experiment for a bit (not 5.2/Opus/Gemini, because they are broken in Copilot), but it seems to be missing something crucial. With a bit of encouragement, it has implemented:

            - padding from 2000 to 2048 for easier power-of-two splitting
            - two-level Winograd matrix multiplication with tiled matmul for last level
            - unrolled AVX2 kernel for 64x64 submatrices
            - 64 byte aligned memory
            - restrict keyword for pointers
            - better compiler flags (clang -Ofast -march=native -funroll-loops -std=c++17)
        
        But yours is still easily 25 % faster. Would you be willing to write a bit about how you set up your evaluation and which tricks Claude used to solve it?
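
        As a language-neutral sketch of two of those ideas, padding up to a power of two and multiplying in cache-friendly tiles (the real entries are C++ with unrolled AVX2 kernels; this toy Python version only shows the structure, and all function names here are my own):

```python
# Sketch of two ideas from the list above: pad up to a power of two,
# then multiply tile by tile so each block stays cache-resident.
# (Toy version: real leaderboard entries are C++ with AVX2 intrinsics.)

def pad(m, size):
    """Zero-pad a square matrix (list of lists) up to size x size."""
    n = len(m)
    return [row + [0] * (size - n) for row in m] + \
           [[0] * size for _ in range(size - n)]

def tiled_matmul(a, b, n, tile=64):
    """C = A x B for n x n matrices, computed in tile x tile blocks."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c
```

        Padding 2000 up to 2048 is what makes the recursive power-of-two splitting for the Winograd-style levels mentioned above straightforward.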

        • By josu 2026-01-09 14:29

          Thank you. Yeah, I'm doing all those things, which do get you close to the top. The rest of what I'm doing is mostly micro-optimizations, such as finding a way to avoid the AVX→SSE transition penalty (a 1-2% improvement).

          But I don't want to spoil the fun. The agents are really good at searching the web now, so posting the tricks here is basically breaking the challenge.

          For example, ChatGPT was able to find Matt's blog post regarding Task 1, and that's what gave me the largest jump: https://blog.mattstuchlik.com/2024/07/12/summing-integers-fa...

          Interestingly, it seems that Matt's post is not on the training data of any of the major LLMs.

      • By zarzavat 2026-01-07 5:33 · 3 replies

        How are you qualified to judge its performance on real code if you don't know how to write a hello world?

        Yes, LLMs are very good at writing code, they are so good at writing code that they often generate reams of unmaintainable spaghetti.

        When you submit to an informatics contest you don't have paying customers who depend on your code working every day. You can just throw away yesterday's code and start afresh.

        Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.

        • By josu 2026-01-07 15:29

          I know what it's like to run a business and build complex systems. That's not the point.

          I used highload as an example because it seems like an objective rebuttal to the claim that "but it can't tackle those complex problems by itself."

          And regarding this:

          "Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash"

          Again, a combination of LLM/agents with some guidance (from someone with no prior experience in this type of high performing architecture) was able to beat all human software developers that have taken these challenges.

        • By VMG 2026-01-07 14:01

          > Claude is very useful but it's not yet anywhere near as good as a human software developer. Like an excitable puppy it needs to be kept on a short leash.

          The skill of "a human software developer" is in fact a very wide distribution, and your statement is true only for an ever-shrinking tail end of that distribution.

        • By FeepingCreature 2026-01-07 6:12 · 1 reply

          > How are you qualified to judge its performance on real code if you don't know how to write a hello world?

          The ultimate test of all software is "run it and see if it's useful for you." You do not need to be a programmer at all to be qualified to test this.

          • By LucaMo 2026-01-07 7:46 · 3 replies

            What I think people get wrong (especially non-coders) is that they believe the limitation of LLMs is to build a complex algorithm. That issue in reality was fixed a long time ago. The real issue is to build a product. Think about microservices in different projects, using APIs that are not perfectly documented or whose documentation is massive, etc.

            Honestly, I don't know what commenters on Hacker News are building, but a few months back I was hoping to use AI to build the interaction layer with Stripe to handle multiple products and delayed cancellations via subscription schedules. Everything is documented; the documentation is a bit scattered across pages, but the information is out there. At the time there was Opus 4.1, so I used that. It wrote 1000 lines of non-functional code with zero reusability after several prompts. I then asked ChatGPT whether it was possible without using schedules; it told me yes (even though it isn't), and when I told Claude to recode it, it started coding random stuff that doesn't exist. I built everything to be functional and reusable myself, in approximately 300 lines of code.

            The above is a software engineering problem. Reimplementing a JSON parser using Opus is neither fun nor useful, so that should not be used as a metric.

            • By josu 2026-01-07 19:15

              > The above is a software engineering problem. Reimplementing a JSON parser using Opus is not fun nor useful, so that should not be used as a metric.

              I've also built a BitTorrent implementation from the specs in rust, where I'm keeping the binary under 1MB. It supports all active and accepted BEPs: https://www.bittorrent.org/beps/bep_0000.html

              Again, I literally don't know how to write a hello world in rust.

              I also vibe coded a trading system that is connected to 6 trading venues. This was a fun weekend project but it ended up making +20k of pure arbitrage with just 10k of working capital. I'm not sure this proves my point, because while I don't consider myself a programmer, I did use Python, a language that I'm somewhat familiar with.

              So yeah, I get what you are saying, but I don't agree. I used highload as an example, because it is an objective way of showing that a combination of LLM/agents with some guidance (from someone with no prior experience in this type of high performing architecture) was able to beat all human software developers that have taken these challenges.

            • By B56b 2026-01-07 18:43 · 1 reply

              This hits the nail on the head. There's a marked difference between a JSON parser and a real world feature in a product. Real world features are complex because they have opaque dependencies, or ones that are unknown altogether. Creating a good solution requires building a mental model of the actual complex system you're working with, which an LLM can't do. A JSON parser is effectively a book problem with no dependencies.

              • By josu 2026-01-07 19:06

                You are looking at this wrong. Creating a JSON parser is trivial. The thing is that my one-shot attempt was 10x slower than my final solution.

                Creating a parser for this challenge that is 10x more efficient than a simple approach does require deep understanding of what you are doing. It requires optimizing the hot loop (among other things) that 90-95% of software developers wouldn't know how to do. It requires deep understanding of the AVX2 architecture.

                Here you can read more about these challenges: https://blog.mattstuchlik.com/2024/07/12/summing-integers-fa...
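
                As an illustration of what "optimizing the hot loop" means here (a sketch of the general idea only, not the actual leaderboard solution): the naive approach fully parses every JSON object, while a tuned one scans straight to the one field it needs:

```python
import json

# Two-line stand-in for the contest input: many JSON objects,
# sum one integer field across all of them.
LINES = ['{"name": "a", "x": 3}', '{"name": "b", "x": 39}']

def sum_naive(lines):
    """Baseline: fully parse every object, then read the field."""
    return sum(json.loads(line)["x"] for line in lines)

def sum_scan(lines, key='"x":'):
    """Hot-loop version: jump straight to the field and read digits.
    Assumes a plain non-negative integer value -- the kind of
    input-format assumption these contests let you exploit."""
    total = 0
    for line in lines:
        i = line.index(key) + len(key)
        while line[i] == ' ':
            i += 1
        j = i
        while j < len(line) and line[j].isdigit():
            j += 1
        total += int(line[i:j])
    return total
```

                The real 10x comes from doing this kind of thing over raw bytes with SIMD, but the structural move is the same: stop paying for a general parse when the task only needs one field.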

            • By FeepingCreature 2026-01-08 6:21

              You need to give it search and tool calls and the ability to test its own code and iterate. I too could not oneshot an interaction layer with Stripe without tools. It also helps to make it research a plan beforehand.

      • By throw1235435 2026-01-07 20:21

        If that is true, then all the commentary around software people still having jobs due to "taste" and other nice words is just that. Commentary. In the end the higher level stuff still needs someone to learn it (e.g. learning AVX2 architecture, knowing what tech to work with), but it requires IMO significantly less practice than coding, which in itself was a gate. The skill morphs more into a tech expert rather than a coding expert.

        I'm not sure what this means for the future of SWEs though yet. I don't see higher levels of staff in big businesses bothering to do this, and at some scale I don't see founders still wanting to manage all of these agents and processes (they have better things to do at higher levels). But I do see the barrier of learning to code gone, meaning it probably becomes just like any other job.

      • By ModernMech 2026-01-08 2:32 · 1 reply

        None of the problems you've shown there are anything close to "very complex computer engineering problems", they're more like "toy problems with widely-known solutions given to students to help them practice for when they encounter actually complex problems".

        • By josu 2026-01-13 22:12 · 1 reply

          I think you misunderstood: it's not about solving the problem, it's about finding the most efficient solution. Give it a shot and see if you can get into the top 10 on any task.

          • By ModernMech 2026-01-14 2:13

            The point is these problems are well understood and solved; solving them well with AI doesn't mean anything, as you're just reciting something that's already been done.

      • By dajoh 2026-01-07 9:43 · 1 reply

        >I'm currently #1 on 5 different very complex computer engineering problems

        Ah yes, well known very complex computer engineering problems such as:

        * Parsing JSON objects, summing a single field

        * Matrix multiplication

        * Parsing and evaluating integer basic arithmetic expressions

        And you're telling me all you needed to do to get the best solution in the world to these problems was talk to an LLM?

        • By josu 2026-01-07 14:58

          Lol, the problem is not finding a solution, the problem is solving it in the most efficient way.

          If you think you can beat an LLM, the leaderboard is right there.

    • By yieldcrv 2026-01-06 22:34 · 1 reply

      It acts differently when you use it through a third-party tool

      Try it again using Claude Code and a subscription to Claude. It can run as a chat window in VS Code and Cursor too.

      • By mcv 2026-01-06 22:45 · 1 reply

        My employer gets me a Copilot subscription with access to Claude, not a subscription to Claude Code, unfortunately.

        • By yieldcrv 2026-01-07 0:30 · 1 reply

          at this point I would suggest getting a $20 subscription to start, seeing if you can expense it

          the tooling is almost as important as the model

          • By mcv 2026-01-08 17:10 · 1 reply

            Security and approval are considered more important here. Just getting approval for neo4j, on the clearest ever use case for it, took a year. I'm not going to spend my energy on getting approval for Claude Code.

            • By yieldcrv 2026-01-09 20:47

              Get it for yourself on your personal computer

              Point it at your unfinished side projects if any and describe what the project was supposed to do

              You need to be able to perceive how far behind you’re falling while simping for corporate policies

    • By NSPG911 2026-01-07 1:02 · 1 reply

      > Opus 4.5 ate through my Copilot quota last month

      Sure, Copilot charges 3x tokens for using Opus 4.5, but how were you still able to use up half the allocated tokens not even one week into January?

      I thought using up 50% was mad for me (inline completions + opencode), that's even worse

      • By mcv 2026-01-08 17:07

        I have no idea. Careless use, I guess. I was fixing a bunch of mocks in some once-great but now poorly maintained code, and I wasn't really feeling it so I just fed everything to Claude. Opus, unfortunately. I could easily have downgraded a bit.

    • By Davidzheng 2026-01-07 8:01

      If it can consistently verify whether the error persists after a fix, you can run (ok, maybe you can't budget-wise, but theoretically) 10,000 parallel instances of fixer agents and then verify afterwards (this is in line with how the IMO/IOI models work, according to rumors)
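
      A tiny sketch of that generate-many-then-verify loop (the candidate "patches" and the verifier here are stand-ins invented for illustration; a real harness would apply each agent's diff and run the project's test suite):

```python
# Toy version of "run many fixer agents, verify afterwards".
# The candidates are stand-in functions; in practice each would be
# a patch proposed by an independent agent run.

def verifier(fix):
    """Stand-in for running the test suite against one candidate."""
    try:
        return fix(4) == 16 and fix(-3) == 9
    except Exception:
        return False

candidates = [
    lambda x: x * 2,       # plausible but wrong
    lambda x: x ** 2,      # correct
    lambda x: abs(x) * x,  # wrong for negative inputs
]

def select_fix(candidates, verifier):
    """Keep the first candidate that survives verification, if any."""
    for fix in candidates:
        if verifier(fix):
            return fix
    return None
```

      The whole scheme stands or falls on the verifier: with a trustworthy test suite, the wrong candidates are just thrown away; without one, you cannot tell the 10,000 attempts apart.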

  • By multisport 2026-01-07 0:33 · 38 replies

    What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects. If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend, build on top of, or depend on them, then sure, Opus 4.5 could replace them. The hard thing about engineering is not "building a thing that works", it's building it the right way, in an easily understood way, in a way that's easily extensible.

    No doubt I could give Opus 4.5 "build me an XYZ app" and it will do well. But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right". Any non-technical person might read that and go "if it works it works" but any reasonable engineer will know that that's not enough.

    • By redhale 2026-01-07 12:35 · 12 replies

      Not necessarily responding to you directly, but I find this take to be interesting, and I see it every time an article like this makes the rounds.

      Starting back in 2022/2023:

      - (~2022) It can auto-complete one line, but it can't write a full function.

      - (~2023) Ok, it can write a full function, but it can't write a full feature.

      - (~2024) Ok, it can write a full feature, but it can't write a simple application.

      - (~2025) Ok, it can write a simple application, but it can't create a full application that is actually a valuable product.

      - (~2025+) Ok, it can write a full application that is actually a valuable product, but it can't create a long-lived complex codebase for a product that is extensible and scalable over the long term.

      It's pretty clear to me where this is going. The only question is how long it takes to get there.

      • By arkensaw 2026-01-07 13:15 · 1 reply

        > It's pretty clear to me where this is going. The only question is how long it takes to get there.

        I don't think it's a guarantee. All of the things it can do from that list are greenfield, they just have increasing complexity. The problem comes because even in agentic mode, these models do not (and I would argue, can not) understand code or how it works; they just see patterns and generate a plausible sounding explanation or solution. Agentic mode means they can try/fail/try/fail/try/fail until something works, but without understanding the code, especially of a large, complex, long-lived codebase, they can unwittingly break something without realising - just like an intern or newbie on the project, which is the most common analogy for LLMs, with good reason.

        • By namrog84 2026-01-07 14:39 · 3 replies

          While I do agree with you. To play the counterpoint advocate though.

          What if we get to the point where all software is basically created 'on the fly' as greenfield projects as needed? And you never need to have complex large long lived codebase?

          It is probably incredibly wasteful, but ignoring that, could it work?

          • By fwip 2026-01-07 19:52 · 1 reply

            That sounds like an insane way to do anything that matters.

            Sure, create a one-off app to post things to your Facebook page. But a one-off app for the OS it's running on? Freshly generating the code for your bank transaction rules? Generating an authorization service that gates access to your email?

            The only reason it's quick to create green-field projects is because of all these complex, large, long-lived codebases that it's gluing together. There's ample training data out there for how to use the Firebase API, the Facebook API, OS calls, etc. Without those long-lived abstraction layers, you can't vibe out anything that matters.

            • By theshrike79 2026-01-07 21:48 · 3 replies

              In Japan buildings (apartments) aren't built to last forever. They are built with a specific age in mind. They acknowledge the fact that houses are depreciating assets which have a value lim->0.

              The only reason we don't do that with code (or didn't use to do it) was because rewriting from scratch NEVER worked[0]. And large scale refactors take massive amounts of time and resources, so much so that there are whole books written about how to do it.

              But today trivial to simple applications can be rewritten from spec or scratch in an afternoon with an LLM. And even pretty complex parsers can be ported, provided that the tests are robust enough[1]. It's just a matter of time before someone rewrites a small to medium size application from one language to another using the previous app as the "spec".

              [0] https://www.joelonsoftware.com/2000/04/06/things-you-should-...

              [1] https://simonwillison.net/2025/Dec/15/porting-justhtml/

              • By techblueberry 2026-01-08 16:00

                > But today trivial to simple applications can be rewritten from spec or scratch in an afternoon with an LLM. And even pretty complex parsers can be ported, provided that the tests are robust enough[1]. It's just a matter of time before someone rewrites a small to medium size application from one language to another using the previous app as the "spec".

                This seems like a sort of I dunno chicken and the egg thing.

                The _reason_ you don't rewrite code is that it's hard to know that you truly understand the spec. If you could perfectly understand the spec, then you could rewrite the code; but then what is the software: the code, or the spec that writes the code? So if you built code A from spec, rebuilding it from spec isn't really a rewrite, it's just a recompile. If you're trying to build a fundamentally new application from spec where the old application was written by hand, you're going to run into the same problems you have in a normal rewrite.

                We already have an example of this. TypeScript applications are basically rewritten every time you compile TypeScript to JavaScript. TypeScript isn't the executed code; it's a spec.

                edit: I think I missed that you said rewrite in a different language, then yeah fine, you're probably right, but I don't think most people are architecture agnostic when they talk about rewrites. The point of a rewrite is to keep the good stuff and lose a lot of bad stuff. If you're using the original app as a spec to rewrite in a new language, then fine yeah, LLMs may be able to do this relatively trivially.

              • By arkensaw 2026-01-15 16:03

                I don't know about Japan - I vaguely recall reading that most buildings over there are built with wood (even the big ones), and that this is historically something to do with rebuilding after tsunamis and earthquakes.

                Buildings in most other countries in the world ARE built to last forever, and are often renovated, changed, extended, and modified long after their incept date, because needs change and destroying them to start over is complete overkill (although some people do these "large scale refactors" - they're usually rich).

                > It's just a matter of time before someone rewrites a small to medium size application from one language to another using the previous app as the "spec".

                I have no doubt of this. I'm sure it's happening already. But the whole point of long term stable applications is that they are tried and tested. A port done in an afternoon by an LLM might be great, but you can't know if it has problems until it has withstood the test of time.

              • By fwip 2026-01-08 0:48

                Sure, and the buildings are built to a slowly-evolving code, using standard construction techniques, operating as a predictable building in a larger ecosystem.

                The problem with "all software" being AI-generated is that, to use your analogy, the electrical standards, foundation, and building materials have all been recently vibe-coded into existence, and none of your construction workers are certified in any of it.

          • By techblueberry 2026-01-08 15:50

            I don't think so. I don't think this is how human brains work, and you would have too many problems trying to balance things out. I'm thinking specifically of a complex distributed system. There are a lot of tweaks and iterations you need for things to work with each other.

            But then maybe this raises the question of what a "codebase" is. If a codebase is just a structured set of specs that compile to code, a la typescript -> javascript, then sure, but it's still a long-lived <blank>

            But maybe you would have to elaborate on what "creating software on the fly" looks like, because I'm sure there's a definition where the answer is yes.

          • By damethos 2026-01-07 19:24

            I have the same questions in my head lately.

      • By bayindirh 2026-01-07 12:44 · 5 replies

        Well, the first 90% is easy, the hard part is the second 90%.

        Case in point: Self driving cars.

        Also, consider that we need to pirate the whole internet to be able to do this, so these models are not creative. They are just directed blenders.

        • By throwthrowuknow 2026-01-07 13:14 · 2 replies

          Even if Opus 4.5 is the limit it’s still a massively useful tool. I don’t believe it’s the limit though for the simple fact that a lot could be done by creating more specialized models for each subdomain i.e. they’ve focused mostly on web based development but could do the same for any other paradigm.

          • By emodendroket 2026-01-07 14:23

            That's a massive shift in the claim though... I don't think anyone is disputing that it's a useful tool; just the implication that because it's a useful tool and has seen rapid improvement that implies they're going to "get all the way there," so to speak.

          • By bayindirh 2026-01-07 13:18 · 1 reply

            Personally I'm not against LLMs or AI itself, but considering how these models are built and trained, I personally refuse to use tools built on others' work without or against their consent (esp. GPL/LGPL/AGPL, Non Commercial / No Derivatives CC licenses and Source Available licenses).

            Of course the tech will be useful and ethical if these problems are solved or decided to be solved the right way.

            • By ForHackernews 2026-01-07 13:25 · 2 replies

              We just need to tax the hell out of the AI companies (assuming they are ever profitable) since all their gains are built on plundering the collective wisdom of humanity.

              • By thfuran 2026-01-07 13:38

                I don’t think waiting for profitability makes sense. They can be massively disruptive without much profit as long as they spend enough money.

              • By encyclopedism 2026-01-07 15:15

                AI companies and corporations in general control your politicians so taxing isn't going to happen.

        • By literalAardvark 2026-01-07 13:26 · 2 replies

          They're not blenders.

          This is clear from the fact that you can distill the logic ability from a 700b parameter model into a 14b model and maintain almost all of it.

          You just lose knowledge, which can be provided externally, and which is the actual "pirated" part.

          The logic is _learned_
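
          For context on what distillation optimizes (this is the standard knowledge-distillation recipe, not anything specific to the model sizes mentioned above): the student is trained to match the teacher's softened output distribution, e.g. by minimizing a KL term:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution; higher
    temperature softens it, exposing more of the teacher's ranking."""
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Teacher (large model) and student (small model) logits for one token;
# the numbers here are made up for illustration.
teacher = softmax([2.0, 1.0, 0.1], temperature=2.0)
student = softmax([1.5, 1.2, 0.2], temperature=2.0)

loss = kl_divergence(teacher, student)  # training drives this toward 0
```

          Whether matching these distributions transfers "logic" or merely compresses patterns is exactly the disagreement in the replies below.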

          • By encyclopedism 2026-01-07 15:17 · 1 reply

            It hasn't learned any LOGIC. It has 'learned' patterns from the input.

            • By theshrike79 2026-01-07 21:49 · 1 reply

              What is logic other than applying patterns?

              • By encyclopedism 2026-01-07 22:48 · 1 reply

                The definition is broad; for now this will do: Logic is the study of correct reasoning.

                • By vidarh 2026-01-08 14:27

                  How is that different from applying patterns?

          • By bayindirh 2026-01-07 13:31 · 1 reply

            Are there any recent publications about it so I can refresh myself on the matter?

            • By D-Machine 2026-01-07 15:21

              You won't find any trustworthy papers on the topic because GP is simply wrong here.

              That models can be distilled has no bearing whatsoever on whether a model has learned actual knowledge or understanding ("logic"). Models have always learned sparse/approximately-sparse and/or redundant weights, but they are still all doing manifold-fitting.

              The resulting embeddings from such fitting reflect semantics and semantic patterns. For LLMs trained on the internet, the semantic patterns learned are linguistic, which are not just strictly logical, but also reflect emotional, connotational, conventional, and frequent patterns, all of which can be illogical or just wrong. While linguistic semantic patterns are correlated with logical patterns in some cases, this is simply not true in general.

        • By mcfedr 2026-01-07 14:22

          I like to think of LLMs as random number generators with a filter

        • By rat9988 2026-01-07 12:53 · 3 replies

          > Well, the first 90% is easy, the hard part is the second 90%.

          You'd need to prove that this assertion applies here. I understand that you can't deduce the future rate of gains from the past, but you also can't state this as a universal truth.

          • By bayindirh 2026-01-07 13:30 · 4 replies

            No, I don't need to. Self-driving cars are the most recent and biggest example sans LLMs. The saying I quoted (which has different forms) is valid for programming, construction, and even cooking. So it's a simple, well-understood baseline.

            Knowledge engineering has a notion called "covered/invisible knowledge", which points to the small things we do unknowingly but that change the whole outcome. None of the models (AI in general, even) can capture this. We can say it's the essence of being human, or the tribal knowledge that makes an experienced worker who they are, or makes mom's rice taste that good.

            Considering these are highly individualized and unique behaviors, a model based on averaging everything can't capture this essence easily, if it ever can, without extensive fine-tuning for/with that particular person.

            • By enraged_camel 2026-01-07 14:08 · 1 reply

              >> No, I don't need to. Self driving cars is the most recent and biggest example sans LLMs.

              Self-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid.

              >> The saying I have quoted (which has different forms) is valid for programming, construction and even cooking. So it's a simple, well understood baseline.

              Sure, but the question is not "how long does it take for LLMs to get to 100%". The question is, how long does it take for them to become as good as, or better than, humans. And that threshold happens way before 100%.

              • By bayindirh 2026-01-07 14:40

                >> Self-driving cars don't use LLMs, so I don't know how any rational analysis can claim that the analogy is valid.

                Doesn't matter, because if we're talking about AI models, no (type of) model reaches 100% linearly, or reaches 100% ever. For example, recognition models run on probabilities. Like Tesla's Autopilot (TM), which loves to hit rolled-over vehicles because it has not seen enough vehicle underbodies to classify them.

                Same for scientific classification models. They emit probabilities, not certain results.

                >> Sure, but the question is not "how long does it take for LLMs to get to 100%"

                I never claimed that a model needs to reach a proverbial 100%.

                >> The question is, how long does it take for them to become as good as, or better than, humans.

                They can be better than humans at certain tasks. They have actually been better than humans at some tasks since the '70s, but we like to disregard that to romanticize current improvements. I don't believe the current or any generation of AIs can be better than humans at anything and everything at once.

                Remember: No machine can construct something more complex than itself.

                >> And that threshold happens way before 100%.

                Yes, and I consider that "threshold" to be "complete", if they can ever reach it, for certain tasks, not "any" task.

            • By rat9988 2026-01-07 17:59

              Self-driving cars are not a proof. They only prove that quick early gains don't necessarily mean you'll get to 100% fast. They don't prove it will necessarily happen.

            • By damethos 2026-01-07 19:34 · 1 reply

              "covered/invisible knowledge" aka tacit knowledge

              • By bayindirh 2026-01-07 19:40

                Yeah, I failed to remember the term while writing the comment. Thanks!

            • By thfuran 2026-01-07 13:41 · 2 replies

              >None of the models (even AI in general) can capture this

              None of the current models maybe, but not AI in general? There’s nothing magical about brains. In fact, they’re pretty shit in many ways.

              • By bayindirh 2026-01-07 13:47

                A model trained on a very large corpus can't, because these behaviors are different or specialized enough that they cancel each other out in most cases. You can forcefully fine-tune a model with a singular person's behavior up to a certain point, but I'm not sure that even that can capture the subtlest behaviors or decision mechanisms, which are generally the most important ones (the ones we call gut feeling or instinct).

                OTOH, while I won't call the human brain perfect, the things we label "shit" generally turn out to be very clever and useful optimizations to work around its own limitations, so I regard the human brain more highly than most AI proponents do. Also, we shouldn't forget that we don't know much about how that thing works. We only guess and try to model it.

                Lastly, searching perfection in numbers and charts or in engineering sense is misunderstanding nature and doing a great disservice to it, but this is a subject for another day.

              • By emodendroket 2026-01-07 14:27 (1 reply)

                The understanding of the brain is far from complete whether they're "magical" or "shit."

                • By D-Machine 2026-01-07 15:32

                  Also obviously brains are both!

          • By sanderjd 2026-01-07 13:07

            I read the comment more as "based on past experience, it is usually the case that the first 90% is easier than the last 10%", which is the right base case expectation, I think. That doesn't mean it will definitely play out that way, but you don't have to "prove" things like this. You can just say that they tend to be true, so it's a good expectation to think it will probably be true again.

          • By rybosworld 2026-01-07 14:04

            The saying is more or less treated as a truism at this point. OP isn't claiming something original and the onus of proving it isn't on them imo.

            I've heard this same thing repeated dozens of times, and for different domains/industries.

            It's really just a variation of the 80/20 rule.

      • By PunchyHamster 2026-01-07 13:36 (2 replies)

        Note that blog posts rarely show the 20 other times it failed to build something, and show only the one time it happened to work.

        We've had the same progression with self-driving cars, and they have also been stuck on the last 10% for the last 5 years

        • By redhale 2026-01-07 21:05

          I agree with your observation, but not your conclusion. The 20 times it failed basically don't matter -- they are branches that can just be thrown away, and all that was lost is a few dollars on tokens (ignoring the environmental impact, which is a different conversation).

          As long as it can do the thing on a faster overall timeline and with less human attention than a human doing it fully manually, it's going to win. And it will only continue to get better.

          And I don't know why people always jump to self-driving cars as the analogy as a negative. We already have self-driving cars. Try a Waymo if you're in a city that has them. Yes, there are still long-tail problems being solved there, and limitations. But they basically work and they're amazing. I feel similarly about agentic development, plus in most cases the failure modes of SWE agents don't involve sudden life and death, so they can be more readily worked around.

        • By theshrike79 2026-01-07 21:57

          With "art" we're now at a situation where I can get 50 variations of a image prompt within seconds from an LLM.

          Does it matter that 49 of them "failed"? It cost me fractions of a cent, so not really.

          If every one of the 50 variants was drawn by a human and iterated over days, there would've been a major cost attached to every image and I most likely wouldn't have asked for 50 variations anyway.

          It's the same with code. The agent can iterate over dozens of possible solutions in minutes or a few hours. Codex Web even has a 4x mode that gives you 4 alternate solutions to the same issue. Complete waste of time and money with humans, but with LLMs you can just do it.

      • By sanderjd 2026-01-07 13:05 (1 reply)

        Yeah maybe, but personally it feels more like a plateau to me than an exponential takeoff, at the moment.

        And this isn't a pessimistic take! I love this period of time where the models themselves are unbelievably useful, and people are also focusing on the user experience of using those amazing models to do useful things. It's an exciting time!

        But I'm still pretty skeptical of "these things are about to not require human operators in the loop at all!".

        • By throwthrowuknow 2026-01-07 13:09 (1 reply)

          I can agree that it doesn’t seem exponential yet but this is at least linear progression not a plateau.

          • By sanderjd 2026-01-07 15:14

            Linear progression feels slower (and thus more like a plateau) to me than the end of 2022 through end of 2024 period.

            The question in my mind is where we are on the s-curve. Are we just now entering hyper-growth? Or are we starting to level out toward maturity?

            It seems like it must still be hyper-growth, but it feels less that way to me than it did a year ago. I think in large part my sense is that there are two curves happening simultaneously, but at different rates. There is the growth in capabilities, and then there is the growth in adoption. I think it's the first curve that seems to me to have slowed a bit. Model improvements seem both amazing and also less revolutionary to me than they did a year or two ago.

            But the other curve is adoption, and I think that one is way further from maturity. The providers are focusing more on the tooling now that the models are good enough. I'm seeing "normies" (that is, non-programmers) starting to realize the power of Claude Code in their own workflows. I think that's gonna be huge and is just getting started.

      • By Scea91 2026-01-07 14:02

        > - (~2023) Ok, it can write a full function, but it can't write a full feature.

        The trend is definitely here, but even today, heavily depends on the feature.

        While extra useful, it requires intense iteration and human insight for > 90% of our backlog. We develop a cybersecurity product.

      • By EthanHeilman 2026-01-07 15:38

        I haven't seen an AI successfully write a full feature in an existing codebase without substantial help; I don't think we are there yet.

        > The only question is how long it takes to get there.

        This is the question, and I would temper expectations with the fact that we are likely to hit diminishing returns from real gains in intelligence as task difficulty increases. Real-world tasks probably fit into a complexity hierarchy similar to computational complexity. One of the reasons the AI predictions made in the 1950s for the 1960s did not come to be was that we assumed problem difficulty scaled linearly: double the computing speed, get twice as good at chess or twice as good at planning an economy. The separation of P and NP flattened these predictions. It is likely that current predictions will run into similar separations.

        It is probably the case that if you made a human 10x as smart they would only be 1.25x more productive at software engineering. The reason we have 10x engineers is less about raw intelligence; they are not 10x more intelligent, but rather have more knowledge and wisdom.

      • By kubb 2026-01-07 13:45

        Each of these years we’ve had a claim that it’s about to replace all engineers.

        By your logic, does it mean that engineers will never get replaced?

      • By fernandezpablo 2026-01-09 21:45

        Starting back in 2022/2023:

        - (~2022) "It's so over for developers". 2022 ends with more professional developers than 2021.

        - (~2023) "Ok, now it's really over for developers". 2023 ends with more professional developers than 2022.

        - (~2024) "Ok, now it's really, really over for developers". 2024 ends with more professional developers than 2023.

        - (~2025) "Ok, now it's really, really, absolutely over for developers". 2025 ends with more professional developers than 2024.

        - (~2025+) etc.

        Sources: https://www.jetbrains.com/lp/devecosystem-data-playground/#g...

      • By HarHarVeryFunny 2026-01-07 14:16

        Sure, eventually we'll have AGI, then no worries, but in the meantime you can only use the tools that exist today, and dreaming about what should be available in the future doesn't help.

        I suspect that the timeline from autocomplete-one-line to autocomplete-one-app, which was basically a matter of scaling and RL, may in retrospect turn out to have been a lot faster than the next step, from LLM to AGI, where it becomes capable of using human-level judgement and reasoning to become a developer, not just a coding tool.

      • By itsthecourier 2026-01-07 12:44

        I use it on a 10-year-old codebase. I need to explain where to get context, but it works successfully 90% of the time

      • By mjr00 2026-01-07 14:00 (1 reply)

        This is disingenuous because LLMs were already writing full, simple applications in 2023.[0]

        They're definitely better now, but it's not like ChatGPT 3.5 couldn't write a full simple todo list app in 2023. There were a billion blog posts talking about that and how it meant the death of the software industry.

        Plus I'd actually argue more of the improvements have come from tooling around the models rather than what's in the models themselves.

        [0] eg https://www.youtube.com/watch?v=GizsSo-EevA

        • By blitz_skull 2026-01-07 14:02 (1 reply)

          What LLM were you using to build full applications in 2023? That certainly wasn’t my experience.

          • By mjr00 2026-01-07 14:05 (1 reply)

            Just from googling, here's a video "Use ChatGPT to Code a Full Stack App" from May 18, 2023.[0]

            There's a lot of non-ergonomic copy and pasting but it's definitely using an LLM to build a full application.

            [0] https://www.youtube.com/watch?v=GizsSo-EevA

            • By blitz_skull 2026-01-07 14:37 (1 reply)

              That's not at all what's being discussed in this article. We copy-pasted from SO before this. This article is talking about 99% fully autonomous coding with agents, not copy-pasting 400 times from a chat bot.

              • By mjr00 2026-01-07 15:00 (1 reply)

                Hi, please re-read the parent comment again, which was claiming

                > Starting back in 2022/2023:

                > - (~2022) It can auto-complete one line, but it can't write a full function.

                > - (~2023) Ok, it can write a full function, but it can't write a full feature.

                This was a direct refutation, with evidence, of the claim that in 2023 LLMs "couldn't write a full feature", because, as demonstrated, people were already building full applications with them at the time.

                This obviously is not talking exclusively about agents, because agents did not exist in 2022.

                • By redhale 2026-01-08 0:24

                  I get your point, but I'll just say that I did not intend my comment to be interpreted so literally.

                  Also, just because SOMEONE planted a flag in 2023 saying that an LLM could build an app certainly does NOT mean that "people were not claiming that LLMs "can't write a full feature"". People in this very thread are still claiming LLMs can't write features. Opinions vary.

      • By ugurs 2026-01-07 15:37

        Ok, it can create a long-lived complex codebase for a product that is extensible and scalable over the long term, but it doesn't have cool tattoos and can't fancy a matcha

    • By FloorEgg 2026-01-07 0:50 (4 replies)

      There are two kinds of right/wrong ways to build: the context-specific right/wrong way to build a particular thing, and an overly generalized, engineer-specific right/wrong way to build things.

      I've worked on teams where multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. It usually took an outsider to proactively remind them what actually mattered to the business case.

      I remember cases where a team of engineers built something the "right" way but it turned out to be the wrong thing. (Well engineered thing no one ever used)

      Sometimes hacking something together messily to confirm it's the right thing to be building is the right way. Then making sure it's secure, then finally paying down some technical debt to make it more maintainable and extensible.

      Where I see real silly problems is when engineers over-engineer from the start before it's clear they are building the right thing, or when management never lets them clean up the code base to make it maintainable or extensible when it's clear it is the right thing.

      There's always a balance/tension, but it's when things go too far one way or another that I see avoidable failures.

      • By ozim 2026-01-07 9:54 (1 reply)

        *I've worked on teams where multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. It usually took an outsider to proactively remind them what actually mattered to the business case.*

        Gosh, I am so tired of that one - someone had a case that burned them in some previous project, and now their life mission is to prevent that from happening ever again, and there is no argument they will accept.

        Then you get up to 10 engineers on a typical team, plus team rotation, and you end up with all kinds of "we have to do it right because we had to pull an all-nighter once, 5 years ago" baked into the system.

        The not-fun part is that a lot of business/management people "expect" a perfect solution right away - though there are some reasonable ones who understand you need some iteration.

        • By mrheosuper 2026-01-07 12:26 (2 replies)

          >someone had a case that burned them in some previous project and now his life mission is to prevent that from happening ever again

          Isn't that what makes them senior? If you don't want that behaviour, just hire a bunch of fresh grads.

          • By lukan 2026-01-07 12:38 (1 reply)

            No, extrapolating from one bad experience to a universal approach does not make anyone senior.

            There are situations where it applies and situations where it doesn't. Having the experience to see what applies in a new context is what senior (usually) means.

            • By sanderjd 2026-01-07 13:12 (1 reply)

              The people I admire most talk a lot more about "risk" than about "right vs. wrong". You can do that thing that caused that all-nighter 5 years ago, it isn't "wrong", but it is risky, and the person who pulled that all-nighter has useful information about that risk. It often makes sense to accept risks, but it's always good to be aware that you're doing so.

              • By yurishimo 2026-01-07 14:50 (1 reply)

                It's also important to consider the developers' risk tolerance as well. It's all fine and dandy that the project manager is okay with the risk, but what if none of the developers are? Or one senior dev is okay with it but the 3 who actually work the on-call queue are not?

                I don't get paid extra for after hours incidents (usually we just trade time), so it's well within my purview on when to take on extra risk. Obviously, this is not ideal, but I don't make the on-call rules and my ability to change them is not a factor.

                • By sanderjd 2026-01-07 15:03

                  I don't think of this as a project manager's role, but an engineering manager's role. The engineers on the team (especially the senior engineers) should be identifying the risks, and the engineering managers should be deciding whether they are tolerable. That includes risks like "the oncall is awful and morale collapses and everyone quits".

                  It's certainly the case that there are managers who handle those risks poorly, but that's just bad management.

          • By ozim 2026-01-07 14:12

            Nope. Not realizing something doesn't apply, and not being able to take in arguments, is cargo culting, not being a senior.

      • By yourapostasy 2026-01-07 1:56 (1 reply)

        > ...multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered.

        I usually resolve this by putting on the table the consequences and their impacts upon my team that I’m concerned about, and my proposed mitigation for those impacts. The mitigation always involves the other proposer’s team picking up the impact remediation. In writing. In the SOPs. Calling out the design decision by the day it was decided, to jog memories, and naming those present who wanted the design as the SMEs. Registered with the operations center. With automated monitoring and notification code we’re happy to offer.

        Once people are asked to put accountable skin in the sustaining operations, we find out real fast who is taking into consideration the full spectrum end to end consequences of their decisions. And we find out the real tradeoffs people are making, and the externalities they’re hoping to unload or maybe don’t even perceive.

        • By gleenn 2026-01-07 6:48 (1 reply)

          That's awesome, but I feel like half the time most people aren't in a position to add requirements, so a lot of shenanigans still happen, especially in big corps

          • By yourapostasy 2026-01-09 15:14

            When someone tells us we cannot change requirements, I am satisfied to get their acknowledgement that what we bring up does extract a specific trade-off, along with their reason for accepting it, and then to record that in the design and operational documentation. The moment many people recognize the trade-off will be explicitly documented, with their team's accountability spelled out in detail, is when you surface the genuine trade-offs: the ones made with the future debt in mind (and, in the meantime, a rationale for granting a ton of leeway to the team burdened with the externality going forward), and the ones made without understanding their externalities upon other teams (which happens a tremendous amount in large organizations).

            Most of the time, people are just very reasonably and understandably focused tightly on their own lane and honestly had no idea of the externalities of their conclusions and decisions. In those cases I'm happy to see a rebalancing of the trade-offs that everyone can accept, and that everyone is grateful to have documented: it justifies spending the story points on cleaning up later instead of working on new features while the externality debt's unwanted impact keeps piling up.

            In fewer than a handful of cases, I run into people deliberately, consciously, with malice aforethought of the full externalities, making trade-offs for the sake of expediently shifting burdens off of themselves, without first consulting the partner teams they want to shift the burdens onto, simply so they can fatten their promo packet sooner at the expense of making other teams look worse. Getting these trade-offs documented makes them back down to something more reasonable about half the time; the other half they don't back down, but your team is now protected by explicit documentation and caveats on the externality it has to carry. And 100% of the time, my team and I put a ring fence around all future interactions with that personality for at least the remaining duration of my gig.

      • By kalaksi 2026-01-07 9:33

        > I've worked on teams where multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. It usually took an outsider to proactively remind them what actually mattered to the business case.

        My first thought was that you probably also have different biases, priorities and/or taste. As always, this is probably very context-specific and requires judgement to know when something goes too far. It's difficult to know the "most correct" approach beforehand.

        > Sometimes hacking something together messily to confirm it's the right thing to be building is the right way. Then making sure it's secure, then finally paying down some technical debt to make it more maintainable and extensible.

        I agree that sometimes it is, but in other cases my experience has been that when something is done, works and is used by customers, it's very hard to argue about refactoring it. Management doesn't want to waste hours on it (who pays for it?) and doesn't want to risk breaking stuff (or changing APIs) when it works. It's all reasonable.

        And when some time passes, the related intricacies, bigger picture and initially floated ideas fade from memory. Now other stuff may depend on the existing implementation. People get used to the way things are done. It gets harder and harder to refactor things.

        Again, this probably depends a lot on a project and what kind of software we're talking about.

        > There's always a balance/tension, but it's when things go too far one way or another that I see avoidable failures.

        I think balance/tension describes it well and good results probably require input from different people and from different angles.

      • By Ericson2314 2026-01-07 3:53 (3 replies)

        I know what you are talking about, but there is more to life than just product-market fit.

        Hardly any of us are working on Postgres, Photoshop, blender, etc. but it's not just cope to wish we were.

        It's good to think about the needs of business and the needs of society separately. Yes, the thing needs users, or no one is benefiting. But it also needs to do good for those users, and ultimately, at the highest caliber, craftsmanship starts to matter again.

        There are legitimate reasons for the startup ecosystem to focus firstly and primarily on getting the users/customers. I'm not arguing against that. What I am arguing is: why does the industry need to be dominated by startups in terms of the bulk of the products (not the bulk of the users)? It raises the question of how much societally meaningful programming is waiting to be done.

        I'm hoping for a world where more end users code (vibe or otherwise) and solve their own problems with their own software. I think that will make for a smaller, more elite software industry that is more focused on infrastructure than last-mile value capture. The question is how to fund the infrastructure. I don't know, except for the most elite projects, which is not good enough for the industry (even this hypothetical smaller one) on the whole.

        • By sanderjd 2026-01-07 13:17

          > I'm hoping for a world where more end users code (vibe or otherwise) and solve their own problems with their own software. I think that will make for a smaller, more elite software industry that is more focused on infrastructure than last-mile value capture.

          Yes! This is what I'm excited about as well. Though I'm genuinely ambivalent about what I want my role to be. Sometimes I'm excited about figuring out how I can work on the infrastructure side. That would be more similar to what I've done in my career thus far. But a lot of the time, I think that what I'd prefer would be to become one of those end users with my own domain-specific problems in some niche that I'm building my own software to help myself with. That sounds pretty great! But it might be a pretty unnatural or even painful change for a lot of us who have been focused for so long on building software tools for other people to use.

        • By swat535 2026-01-07 17:21 (1 reply)

          Users will not care about the quality of your code, or the backend architecture, or your perfectly strongly typed language.

          They only care about their problems and treat their computers like an appliance. They don't care if it takes 10 seconds or 20 seconds.

          They don't even care if it has ads, popups, and junk. They are used to bloatware and will gladly open their wallets if the tool is helping them get by.

          It's an unfortunate reality, but there it is: software is about money and solving problems. Unless you are working on a mission-critical system that affects people's health or financial data, none of those things matter much.

          • By Ericson2314 2026-01-07 17:53

            I know customers couldn't care less about the quality of code they never see. But the idea that they don't ever care about software being bad/laggy/bloated, because it "still solves problems", doesn't stand up to scrutiny as an immutable fact of the universe. Market conditions can change.

            I'm banking on a future that if users feel they can (perhaps vibe) code their own solutions, they are far less likely to open their wallets for our bloatware solutions. Why pay exorbitant rents for shitty SaaS if you can make your own thing ad-free, exactly to your own mental spec?

            I want the "computers are new, programmers are in short supply, customer is desperate" era we've had in my lifetime so far to come to a close.

        • By saxenaabhi 2026-01-07 4:17 (1 reply)

          > There are legitimate reasons for the startup ecosystem to focus firstly and primarily on getting the users/customers. I'm not arguing against that. What I am arguing is why does the industry need to be dominated by startups in terms of the bulk of the products (not bulk of the users). It begs the question of how much societally-meaningful programming waiting to be done.

          You slipped in "societally-meaningful" and I don't know what it means and don't want to debate merits/demerits of socialism/capitalism.

          However, I think lots of software needs to be written because, in my estimation, with AI/LLM/ML it'll generate value.

          And then you have lots of software that needs to be rewritten as firms/technologies die and new firms/technologies are born.

          • By Ericson2314 2026-01-07 5:14 (1 reply)

            I didn't mean to do some snide anticapitalism. Making new Postgreses and Blenders is really hard. I don't think the startup ecosystem does a very good job, but I don't assume central planning would do a much better job either.

            (The method I have the most confidence in is some sort of mixed system where there is non-profit, state-planned, and startup software development all at once.)

            Markets are a tool, a means to the end. I think they're very good, I'm a big fan! But they are not an excuse not to think about the outcome we want.

            I'm confident that the outcome I don't want is one where most software developers are trying to find demand for their work, pivoting, etc. It's very "pushing a string" or "cart before the horse". I want more "pull", where the users/beneficiaries of software are better able to dictate or create themselves what they want, rather than being helpless until a pivoting engineer finds it for them.

            Basically start-up culture has combined theories of exogenous growth from technology change, and a baseline assumption that most people are and will remain hopelessly computer illiterate, into an ideology that assumes the best software is always "surprising", a paradigm shift, etc.

            Startups that make libraries/tools for other software developers are fortunately a good step toward undermining these "the customer is an idiot and the product will be better than they expect" assumptions. That gives me hope we'll reach a healthier mix of push and pull. Wild successes are always disruptive, but that shouldn't mean that the only success is wild, or that trying to "act disruptive before wild success" ("manifest" paradigm shifts!) is always the best means to get there.

            • By bigfudge 2026-01-07 13:14

              I've worked in various roles, and I'm one of those people who is not computer illiterate and likes to build solutions that meet local needs.

              It's got a lot easier technically to do that in recent years, and MUCH easier with AI.

              But institutionally and in terms of governance it's got a lot harder. Nobody wants home-brew software anymore. Doing data management and governance is complex enough and involves enough different people that it's really hard to generate the momentum to get projects off the ground.

              I still think it's often the right solution and that successful orgs will go this route and retain people with the skills to make it happen. But the majority probably can't afford the time/complexity, and AI is only part of the balance that determines whether it's feasible.

    • By fenwick67 2026-01-07 0:58 (3 replies)

      Another thing that gets me with projects like this: there are already many examples of image converters, minesweeper clones, etc. that you can just fork on GitHub. The value of the LLM here is largely just stripping the copyright off.

      • By sksishbs 2026-01-07 1:19 (4 replies)

        It’s kind of funny - there’s another thread up where a dev claimed a 20-50x speed up. To their credit they posted videos and links to the repo of their work.

        And when you check the work, a large portion of it was hand rolling an ORM (via an LLM). Relatively solved problem that an LLM would excel at, but also not meaningfully moving the needle when you could use an existing library. And likely just creating more debt down the road.

        • By yourapostasy 2026-01-07 2:06 (2 replies)

          Reminds me of a post I read a few days ago of someone crowing about an LLM writing an email format validator for them. They did not have the LLM code up an accompanying send-a-validation-email loop, and the LLM blithely left them uninformed of the scar tissue the industry has built up around how curiously deep a rabbit hole email validation becomes.
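
A minimal Python sketch of why format-only checking is that rabbit hole (the function name and regex are illustrative, not from the comment): a plausible-looking regex both rejects RFC-valid addresses and accepts undeliverable ones, which is why the usual industry advice is a loose sanity check followed by an actual confirmation email.

```python
import re

# A naive format check of the kind an LLM will happily generate.
# It rejects some RFC-valid addresses and accepts some undeliverable
# ones, so format validation alone can never confirm an address works.
NAIVE_EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def looks_like_email(address: str) -> bool:
    """Loose sanity check only; deliverability needs a confirmation email."""
    return NAIVE_EMAIL_RE.fullmatch(address) is not None

# Accepts ordinary addresses...
assert looks_like_email("user+tag@example.com")
# ...rejects an RFC-valid quoted local part...
assert not looks_like_email('"a b"@example.com')
# ...and accepts a syntactically fine but undeliverable address.
assert looks_like_email("someone@reserved.invalid")
```

The send-a-validation-email loop the comment mentions is the part that actually proves deliverability; the regex is only a cheap pre-filter.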

          If you’ve been around the block and are judicious in how you use them, LLMs are a really amazing productivity boost. For those without that judgement and taste, I’m seeing footguns proliferate, and the LLMs are not warning them when they step on the pressure plate that’s about to blow off their foot. I’m hopeful we will this year create better context-window-based or recursive guardrails for coding agents to solve for this.

          • By sanderjd 2026-01-07 13:32

            Yeah I love working with Claude Code, I agree that the new models are amazing, but I spend a decent amount of time saying "wait, why are we writing that from scratch, haven't we written a library for that, or don't we have examples of using a third party library for it?".

            There is probably some effective way to put this direction into the claude.md, but so far it still seems to do unnecessary reimplementation quite a lot.
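
For what it's worth, a hedged sketch of the kind of standing instruction that might go in a CLAUDE.md to push against reimplementation (the section title, paths, and wording are hypothetical, not a tested recipe):

```markdown
## Reuse before reimplementing

- Before writing any new utility, search `src/lib/` and the
  dependencies already in the lockfile for an existing implementation.
- Never hand-roll an alternative to a library we already depend on.
- If nothing existing fits, say so explicitly and ask before writing
  a replacement from scratch.
```

Whether a rule like this reliably sticks across long sessions is an open question; it is one plausible starting point rather than a known fix.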

          • By Eisenstein 2026-01-07 7:43 (3 replies)

            This is a typical problem you see in autodidacts. They will recreate solutions to solved problems, trip over issues that could have been avoided, and generally do all of the things you would expect of someone working with skill but no experience.

            LLMs accelerate this and make it more visible, but they are not the cause. It is almost always a person trying to solve a problem and just not knowing what they don't know because they are learning as they go.

            • By yourapostasy 2026-01-07 15:17

              I am hopeful autodidacts will leverage an LLM world like they did the Internet-search world, the library world, and the printed-word world before it. Each stage in that progression compressed the time it took for them to comprehend a new body of understanding before applying it in practice, expanded the scope to which they applied that new understanding, and deepened their adoption of best practices instead of reinventing the wheel.

              In this regard, I see LLMs as a way for us to far more efficiently encode, compress, convey, and put into operational practice our combined learned experiences. What will be really exciting is watching what happens as LLMs simultaneously draw from and contribute to those learned experiences as we do; we don't need full AGI to realize massive benefits from rapidly, recursively enabling a new, highly dynamic form of our knowledge sphere that drastically shortens the distance from knowledge to deeply-nuanced praxis.

            • By filoeleven 2026-01-0713:443 reply

              > [The cause] is almost always a person trying to solve a problem and just not knowing what they don't know because they are learning as they go.

              Isn't that what "using an LLM" is supposed to solve in the first place?

              • By kaydub 2026-01-0718:31

                With the right prompt the LLM will solve it in the first place. But this is an issue of not knowing what you don't know, so it makes it difficult to write the right prompt. One way around this is to spawn more agents with specific tasks, or to have an agent that is ONLY focused on finding patterns/code where you're reinventing the wheel.

                I often have one agent/prompt where I build things but then I have another agent/prompt where their only job is to find codesmells, bad patterns, outdated libraries, and make issues or fix these problems.

              • By Eisenstein 2026-01-0715:54

                1. LLMs can't watch over someone and warn them when they are about to make a mistake

                2. LLMs are obsequious

                3. Even if LLMs have access to a lot of knowledge they are very bad at contextualizing it and applying it practically

                I'm sure you can think of many other reasons as well.

                People who are driven to learn new things and to do things are going to use whatever is available to them in order to do it. They are going to get into trouble doing that more often than not, but they aren't going to stop. No one is helping the situation by sneering at them -- they are used to it, anyway.

            • By lomase 2026-01-079:182 reply

              My impression is that LLM users are the kind of people that HATED that their questions on StackOverflow got closed because it was duplicated.

              • By abstractcontrol 2026-01-0712:521 reply

                > My impression is that LLM users are the kind of people that HATED that their questions on StackOverflow got closed because it was duplicated.

                Lol, who doesn't hate that?

                • By lomase 2026-01-0713:13

                  I don't know; in 40 years of coding I never had to ask a question there.

              • By sanderjd 2026-01-0713:331 reply

                So literally everyone in the world? Yeah, seems right!

                • By lomase 2026-01-0713:441 reply

                  I would love to see your closed SO questions.

                  But don't worry, those days are over; an LLM is never going to push back on your ideas.

                  • By sanderjd 2026-01-0714:55

                    lol, I probably don't have any, actually. If I recall, I would just write comments when my question differed slightly from one already there.

                    But it's definitely the case that being able to go back and forth quickly with an LLM digging into my exact context, rather than dealing with the kind of judgy, humorless attitude that was dominant on SO, is hugely refreshing and way more productive!

        • By suzzer99 2026-01-078:041 reply

          I've hand-rolled my own ultra-light ORM because the off-the-shelf ones always do 100 things you don't need.*

          And of course the open source ones get abandoned pretty regularly. Type ORM, which a 3rd party vendor used on an app we farmed out to them, mutates/garbles your input array on a multi-line insert. That was a fun one to debug. The issue has been open forever and no one cares. https://github.com/typeorm/typeorm/issues/9058

          So yeah, if I ever need an ORM again, I'm probably rolling my own.

          *(I know you weren't complaining about the idea of rolling your own ORM, I just wanted to vent about Type ORM. Thanks for listening.)
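          For flavor, here's a minimal sketch of what "ultra-light" can mean here (hypothetical TypeScript, not TypeORM's API): a multi-row insert builder that copies values out instead of mutating the caller's rows, which is the exact failure mode in the linked issue.

          ```typescript
          // Hypothetical ultra-light insert builder, ~20 lines instead of a dependency.
          // It reads values out of the rows; the input array is never mutated.
          type Row = Record<string, unknown>;

          function buildInsert(table: string, rows: Row[]): { sql: string; params: unknown[] } {
            if (rows.length === 0) throw new Error("no rows to insert");
            const cols = Object.keys(rows[0]);
            const params: unknown[] = [];
            const tuples = rows.map((row) => {
              const placeholders = cols.map((col) => {
                params.push(row[col]); // read-only access to the caller's data
                return `$${params.length}`;
              });
              return `(${placeholders.join(", ")})`;
            });
            return {
              sql: `INSERT INTO ${table} (${cols.join(", ")}) VALUES ${tuples.join(", ")}`,
              params,
            };
          }
          ```

          Whether ~20 lines like this beats a dependency is exactly the trade-off being vented about, but at least the failure modes are yours.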

          • By theshrike79 2026-01-0722:09

            This is the thing that will be changing the open source and small/medium SaaS world a lot.

            Why use a 3rd-party dependency that might have features you don't need when you can write a hyper-specific solution in a day with an LLM and control the full codebase yourself?

            Or why pay €€€ for a SaaS every month when you can replicate the relevant bits yourself?

        • By patates 2026-01-077:37

          It seems to me these days, any code I want to write tries to solve problems that LLMs already excel at. Thankfully my job is perhaps just 10% about coding, and I hope people like you still have some coding tasks that cannot be easily solved by LLMs.

          We should not exaggerate the capabilities of LLMs, sure, but let's also not play "don't look up".

        • By paipa 2026-01-078:43

          "And likely just creating more debt down the road"

          In the most inflationary era of capabilities we've seen yet, it could be the right move. What's debt when in a matter of months you'll be able to clear it in one shot?

      • By melagonster 2026-01-076:024 reply

        - I cloned a project from GitHub and made some minor modifications.

        - I used AI-assisted programming to create a project.

        Even if the content is identical, or if the AI is smart enough to replicate the project by itself, the latter can be included on a CV.

        • By jasonfarnon 2026-01-077:071 reply

          I think I would prefer the former if I were reviewing a CV. It at least tells me they understood the code well enough to know where to make their minor tweaks. (I've spent hours reading through a repo to know where to insert/comment out a line to suit my needs.) The second tells me nothing.

          • By dugidugout 2026-01-0710:341 reply

            It's odd you don't apply the same analysis to each. The latter certainly can provide a similar trail indicating knowledge of the use case and the necessary parameters to achieve it. And certainly the former doesn't preclude LLM interlocking.

            • By lomase 2026-01-0713:142 reply

              Why do you write like that?

              • By dugidugout 2026-01-0722:091 reply

                It would help if I had a better understanding of what you mean by "that".

                I generally write to liberate my consciousness from isolation. When doing so in a public forum I am generally doing so in response to an assertion. When responding to an assertion I am generally attempting to understand the framing which produced the assertion.

                I suppose you may also be speaking to the voice which is emergent. I am not very well read, so you may find my style unconventional or sloppy. I generally try not to labor too much in this regard and hope this will develop as I continue to write.

                I am receptive to any feedback you have for me.

                • By xpe 2026-01-1118:11

                  I'm sorry that other person spoke to you that way. The vast majority of us aren't like that. I appreciate your comments!

        • By fenwick67 2026-01-0714:591 reply

          Do people really see a CV and read "computer mommy made me a program" and think it's impressive?

          • By melagonster 2026-01-088:55

            Unfortunately, it is happening. I remember an old post on HN that mentioned a "prompt engineer for article generation" could find more jobs than a columnist. And the OP actually wrote the articles himself but declared that they were all generated by AI.

        • By zwnow 2026-01-077:40

          I'd quickly trash your application if I saw you just vibe-coded some bullshit app. Developing is about working smart, and it's not smart to ask AI to code stuff that already exists; it's in fact wasteful.

        • By infinitezest 2026-01-0719:37

          A CV for the disappearing job market as you shovel money into an oligarchy.

      • By scotty79 2026-01-0714:17

        Have you ever tried to find software for a specific need? I usually spend hours investigating everything I can find, only to discover that all the options are bad in one way or another and cover my use case partially at best. It's dreadful, unrewarding work that I always fear. Being able to spend those hours developing a custom solution that has exactly what I need, no more, no less, and that I can evolve further as my requirements evolve, all while enjoying myself, is a godsend.

    • By coffeebeqn 2026-01-076:243 reply

      Anecdata but I’ve found Claude code with Opus 4.5 able to do many of my real tickets in real mid and large codebases at a large public startup. I’m at senior level (15+ years). It can browse and figure out the existing patterns better than some engineers on my team. It used a few rare features in the codebase that even I had forgotten about and was about to duplicate. To me it feels like a real step change from the previous models I’ve used which I found at best useless. It’s following style guides and existing patterns well, not just greenfield. Kind of impressive, kind of scary

      • By wiz21c 2026-01-078:261 reply

        Same anecdote for me (except I have +/- 40 years of experience). I consider myself a pretty good dev for non-web work (GPUs, assembly, optimisation, ...) and my conclusion is the same as yours: impressive and scary. If the idea of what you want to do is somewhere on the web, in text or in code, then Claude most likely has it. And its ability to understand my own codebases is just crazy (at my age, memory is declining, and having Claude's help is just wow). Of course it sometimes fails, of course it needs direction, but the thing it produces is really good.

        • By murukesh_s 2026-01-078:56

          The scary part is that the LLM might have been trained on all the open source code ever produced - which is far beyond human comprehension - and with ever-growing capability (bigger context windows, more training), my gut feeling is that it will exceed human capability in programming pretty soon. Considering 2025 was the groundbreaking year for agents, I can't stop imagining what will happen as this iterates over the next couple of years. I think it will evolve to be like chess engines that consistently beat the top chess players in the world!

      • By weatherlite 2026-01-077:54

        I'm seeing this as well. Not huge codebases but not tiny - a 4-year-old startup. I'm new there, and it would have been impossible for me to deliver any value this soon. 12 years of experience; this thing is definitely amazing. Combined with a human it can be phenomenal. It has also helped me a ton with lots of external tools, with understanding what the data/marketing teams are doing, and even with providing pretty crucial insights to our leadership that Gemini noticed. I wouldn't try to completely automate the humans out of the loop just yet, but this tech is for sure gonna downsize team numbers (and at the same time allow many new startups to come to life with little capital, which might eventually grow and hire people - so it's unclear how this is gonna affect jobs).

      • By jarjoura 2026-01-0710:07

        I've also found it to keep such a constrained context window (on large codebases) that it writes a second block of code for something that already has a solution in a different area of the same file.

        Nothing I do seems to fix that in its initial code-writing steps. Only after it finishes, when I've asked it to go back and rewrite the changes, this time touching only 2 or 3 lines of code, does it magically (or finally) find the other implementation and reuse it.

        It's freakin incredible at tracing through code and figuring it out. I <3 Opus. However, it's still quite far from any kind of set-it-and-forget-it.

    • By sreekanth850 2026-01-073:122 reply

      The same exists in humans too. I worked with a developer who had 15 years of experience and was a tech lead at a big Indian firm. We started something together, and 3 months back, when I checked the tables, I was shocked to see how badly he had fucked up and messed up the DB. Finally, the only option left to me was to quit, because I knew it would break in production, and if I onboarded a single customer my life would be screwed. He mixed many things into the frontend, offloaded even permissions to the frontend, and literally copied tables across multiple DBs (we had 3 services). I still cannot believe he worked as a tech lead for 15 years. Each DB had more than 100 tables, and out of those, 20-25 were duplicates. He never shared code with me, but I smelled something fishy when bug fixing became a never-ending loop and my frontend guy told me he couldn't do it anymore. The only mistake I made was trusting him, and the worst part is he is my cousin; the relationship turned sour after I confronted him and decided to quit.

      • By pastage 2026-01-077:07

        This sounds like a culture issue in the development process; I have seen this prevented many times. Sure, I did have to roll back a feature I hadn't signed off on just before New Year's. So, as you say, it happens.

      • By potamic 2026-01-075:171 reply

        How did he not share code if you're working together?

        • By sreekanth850 2026-01-075:502 reply

          Yes, it was my mistake. I trusted him because he was my childhood friend and my cousin. He was a tech lead at a CMMI Level 5 company (serving Fortune 500 firms) at the time he joined me. I trusted that he would never run away with the code, and that trust is still there; also, the entire feature set, roadmap and vision were with me, so I thought the code didn't matter. It was a big lesson for me.

          • By tommica 2026-01-076:281 reply

            That's a crazy story. That confrontation must have been a difficult one :/

            • By sreekanth850 2026-01-076:39

              Absolutely. But I never had any choice. It was Do or Die.

          • By ipaddr 2026-01-077:201 reply

            Input your roadmap into an llm of your choosing and see if you can create that code.

            • By sreekanth850 2026-01-077:50

              I can, but I switched to something more challenging. I handed everything over to him and told him I am no longer interested. I don't want him to feel that I cheated him by recreating something he worked on.

    • By SeanAppleby 2026-01-071:562 reply

      One thing I've been tossing around in my head is:

      - How quickly is cost of refactor to a new pattern with functional parity going down?

      - How does that change the calculus around tech debt?

      If engineering uses 3 different abstractions in inconsistent ways that leak implementation details across components and duplicate functionality in ways that are very hard to reason about, that is, in conventional terms, an existential problem that might kill the entire business, as all dev time will end up consumed by bug fixes and dealing with pointless complexity, velocity will fall to nothing, and the company will stop being able to iterate.

      But if claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed. And it has changed more if you are willing to bet that models within a year will be much better at such tasks.

      And in my experience, claude is imperfect at refactors and still requires review and a lot of steering, but it's one of the things it's better at, because it has clear requirements and testing workflows already built around the existing behavior. Refactoring is definitely a hell of a lot faster than it used to be, at least on the few I've dealt with recently.

      In my mind it might be kind of like thinking about financial debt in a world with high inflation, in that the debt seems like it might get cheaper over time rather than more expensive.

      • By ekidd 2026-01-0710:41

        > But if claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed.

        Yup, I recently spent 4 days using Claude to clean up a tool that's been in production for over 7 years. (There's only about 3 months of engineering time spent on it in those years.)

        We've known what the tool needed for many years, but ugh, the actual work was fairly messy and it was never a priority. I reviewed all of Opus's cleanup work carefully and I'm quite content with the result. Maybe even "enthusiastic" would be accurate.

        So even if Claude can't clean up all the tech debt in a totally unsupervised fashion, it can still help address some kinds of tech debt extremely rapidly.

      • By edg5000 2026-01-085:38

        Good point. Most of the cost in dealing with tech debt is reading the code and noting the issues. I found that Claude can produce much better code when it has a functionally correct reference implementation. Also, you don't need to point out issues very specifically. I once mentioned "I see duplicate keys in X and Y, rework it to reduce repetition and verbosity", and it came up with a much more elegant way to implement it.

        So maybe doing 2-3 stages makes sense. The first stage needs to be functionally correct, but you accept code smells such as leaky abstractions, verbosity and repetition. In stages 2 and 3 you eliminate all of this. You could integrate this all into the initial specification; you won't even see the smelly intermediate code; it only exists as a stepping stone for the model to iteratively refine the code!
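        As a toy illustration of that staging (hypothetical example, TypeScript): a stage-1 draft that is functionally correct but repeats itself, and a stage-2/3 rework that produces the same result without the duplication.

        ```typescript
        // Stage 1: functionally correct reference, but with duplicated keys (the smell).
        const labelsDraft: Record<string, string> = {
          saveButton: "Save",
          saveTooltip: "Save",
          cancelButton: "Cancel",
          cancelTooltip: "Cancel",
        };

        // Stage 2/3: same observable result, duplication factored out.
        function uiLabels(actions: readonly string[]): Record<string, string> {
          const entries: Array<[string, string]> = actions.flatMap((action) => [
            [`${action.toLowerCase()}Button`, action],
            [`${action.toLowerCase()}Tooltip`, action],
          ]);
          return Object.fromEntries(entries);
        }

        const labels = uiLabels(["Save", "Cancel"]);
        ```

        The draft is what the model keeps as the functionally correct reference; the rework is what you ask for once behavior is locked in.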

    • By whynotminot 2026-01-073:099 reply

      > The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible.

      You’re talking like in the year 2026 we’re still writing code for future humans to understand and improve.

      I fear we are not doing that. Right now, Opus 4.5 is writing code that later Opus 5.0 will refactor and extend. And so on.

      • By nine_k 2026-01-073:542 reply

        This sounds like magical thinking.

        For one, there are objectively detrimental ways to organize code: tight coupling, lots of mutable shared state, etc. No matter who or what reads or writes the code, such code is more error-prone, and more brittle to handle.

        Then, abstractions are tools to lower cognitive load. Good abstractions reduce the total amount of code written, let you reason about the code in terms of those abstractions, and do not leak within their area of applicability. Sequence, or Future, or, well, the humble function are examples of good abstractions. No matter what kind of cognitive process handles the code, it benefits from having to keep a smaller amount of context per task.

        "Code structure does not matter, LLMs will handle it" sounds a bit like "Computer architectures don't matter, the Turing Machine is proved to be able to handle anything computable at all". No, these things matter if you care about resource consumption (aka cost) at the very least.
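        To make the shared-mutable-state point concrete, a toy TypeScript contrast (illustrative only, not from the thread):

        ```typescript
        // Tightly coupled: one fact (the total) is stored in two places, and every
        // caller must keep them in sync -- error-prone for humans and LLMs alike.
        const cart = { items: [] as number[], total: 0 };
        function addItemCoupled(price: number): void {
          cart.items.push(price);
          cart.total += price; // forget this line anywhere and the invariant breaks
        }

        // Decoupled: the total is derived, so the invariant cannot drift.
        type Cart = { readonly items: readonly number[] };
        function addItem(c: Cart, price: number): Cart {
          return { items: [...c.items, price] };
        }
        function total(c: Cart): number {
          return c.items.reduce((sum, p) => sum + p, 0);
        }
        ```

        Whatever reads this next -- human or model -- only has to understand the total in one place.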

        • By scotty79 2026-01-0714:17

          > For one, there are objectively detrimental ways to organize code: tight coupling, lots of mutable shared state, etc. No matter who or what reads or writes the code, such code is more error-prone, and more brittle to handle.

          Guess what: AIs don't like that either, because it makes it harder for them to achieve the goal. So with minimal guidance, which at this point could probably be provided by an AI as well, the output of an AI agent is not that.

        • By cryptica 2026-01-0711:301 reply

          Yes LLMs aren't very good at architecture. I suspect because the average project online has pretty bad architecture. The training set is poisoned.

          It's kind of bittersweet for me because I was dreaming of becoming a software architect when I graduated university and the role started disappearing so I never actually became one!

          But the upside of this is that now LLMs suck at software architecture... Maybe companies will bring back the software architect role?

          The training set has been totally poisoned from the architecture PoV. I don't think LLMs (as they are) will be able to learn software architecture now because the more time passes, the more poorly architected slop gets added online and finds its way into the training set.

          Good software architecture tends to be additive, as opposed to subtractive. You start with a clean slate then build up from there.

          It's almost impossible to start with a complete mess of spaghetti code and end up with a clean architecture... Spaghetti code abstractions tend to mislead you and lead you astray... It's like understanding spaghetti code soils your understanding of the problem domain: you start to think of everything in terms of terrible leaky abstractions and can't think about the problem clearly.

          It's hard even for humans to look at a problem through fresh eyes; it's likely even harder for LLMs. For example, if you use a word in a prompt, the LLM tends to try to incorporate that word into the solution. So if the AI sees a bunch of leaky abstractions in the code, it will tend to work with them as opposed to removing them and finding better abstractions. I see this all the time with hacks; if the code is full of hacks, then an LLM tends to produce hacks all the time, and it's almost impossible to make it address root causes. Also, hacks tend to beget more hacks.

          • By zingar 2026-01-0713:252 reply

            Refactoring is a very mechanistic way of turning bad code into good. I don’t see a world in which our tools (LLMs or otherwise) don’t learn this.

            • By xpe 2026-01-1118:34

              > I don’t see a world in which our tools (LLMs or otherwise) don’t learn this.

              I agree, but maybe for different reasons. Refactoring well is a form of intelligence, and I don't see any upper limit to machine intelligence other than the laws of physics.

              > Refactoring is a very mechanistic way of turning bad code into good.

              There are some refactoring rules of thumb that can seem mechanistic (by which I mean deterministic based on pretty simple rules), but not all. Neither is refactoring guaranteed to be sufficient to lead to all reasonable definitions of "good software". Sometimes the bar requires breaking compatibility with the previous API / UX. This is why I agree with the sibling comment which draws a distinction between refactoring (changing internal details without changing the outward behavior, typically at a local/granular scale) and reworking (fixing structural problems that go beyond local/incremental improvements).

              Claude phrased it this way – "Refactoring operates within a fixed contract. Reworking may change the contract." – which I find to be nice and succinct.

            • By cryptica 2026-01-0910:50

              Refactorings can be useful in certain cases if the core architecture of the system is sound, but for some very complex systems the problems run deeper and a refactoring can be a waste of time. Sometimes you're better off reworking the whole thing, because the problem might be in the foundation itself: something about the architecture forces developers to think about the problem incorrectly and to write bad code on top.

      • By Bridged7756 2026-01-074:225 reply

        Opus 4.5 is writing code that Opus 5.0 will refactor and extend. And Opus 5.5 will take that code and rewrite it in C from the ground up. And Opus 6.0 will take that code and make it assembly. And Opus 7.0 will design its own CPU. And Opus 8.0 will make a factory for its own CPUs. And Opus 9.0 will populate mars. And Opus 10.0 will be able to achieve AGI. And Opus 11.0 will find God. And Opus 12.0 will make us a time machine. And so on.

        • By TheOtherHobbes 2026-01-0712:12

          Objectively, we are talking about systems that have gone from being cute toys to outmatching most juniors using only rigid and slow batch training cycles.

          As soon as models have persistent memory for their own try/fail/succeed attempts, and can directly modify what's currently called their training data in real time, they're going to develop very, very quickly.

          We may even be underestimating how quickly this will happen.

          We're also underestimating how much more powerful they become if you give them analysis and documentation tasks referencing high quality software design principles before giving them code to write.

          This is very much 1.0 tech. It's already scary smart compared to the median industry skill level.

          The 2.0 version is going to be something else entirely.

        • By latentsea 2026-01-075:091 reply

          Can't wait to see what Opus 13.0 does with the multiverse.

        • By mfalcon 2026-01-0710:56

          Wake me up at Opus 12

        • By lomase 2026-01-079:252 reply

          Just one more OPUS bro.

          • By whynotminot 2026-01-0715:51

            Honestly the scary part is that we don’t really even need one more Opus. If all we had for the rest of our lives was Opus 4.5, the software engineering world would still radically change.

            But there’s no sign of them slowing down.

        • By zwnow 2026-01-077:443 reply

          I also love how AI enthusiasts just ignore the issue of exhausted training data... You can't just magically create more training data. Also, synthetic training data reduces the quality of models.

          • By aspenmartin 2026-01-0719:121 reply

            You're mixing up several concepts. Synthetic data works for coding because coding is a verifiable domain. You train via reinforcement learning to reward code-generation behavior that passes detailed specs and meets other desiderata. It's literally how things are done today and how progress gets made.

            • By zwnow 2026-01-0720:532 reply

              Most code out there is a legacy security nightmare, surely it's good to train on that.

              • By dang 2026-01-0721:191 reply

                Would you please stop posting cynical, dismissive comments? From a brief scroll through https://news.ycombinator.com/comments?id=zwnow, it seems like your account has been doing nothing else, regardless of the topic that it's commenting on. This is not what HN is for, and destroys what it is for.

                If you keep this up, we're going to have to ban you, not because of your views on any particular topic but because you're going entirely against the intended spirit of the site by posting this way. There's plenty of room to express your views substantively and thoughtfully, but we don't want cynical flamebait and denunciation. HN needs a good deal less of this.

                If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

                • By zwnow 2026-01-0722:461 reply

                  Then ban me u loser, as I wrote HN is full of pretentious bullshitters. But its good that u wanna ban authentic views. Way to go. If i feel like it I'll just create a new account:-)

                  • By aspenmartin 2026-01-083:231 reply

                    dang is a saint, he wants your opinion, not the other toxic stuff.

                    • By dang 2026-01-1118:29

                      I appreciate the kind words but let's not go that far

              • By aspenmartin 2026-01-0721:47

                But that doesn't really matter and it shows how confused people really are about how a coding agent like Claude or OSS models are actually created -- the system can learn on its own without simply mimicking existing codebases even though scraped/licensed/commissioned code traces are part of the training cycle.

                Training looks like:

                - Pretraining (all data, non-code, etc, include everything including garbage)

                - Specialized pre-training (high quality curated codebases, long context -- synthetic etc)

                - Supervised Fine Tuning (SFT) -- these are things like curated prompt + patch pairs, curated Q/A (like Stack Overflow; people are often cynical that this is done unethically, but all of the major players are in fact very risk averse and will simply license and ensure they have legal rights),

                - Then more SFT for tool use -- actual curated agentic and human traces that are verified to be correct or at least produce the correct output.

                - Then synthetic generation / improvement loops -- where you generate a bunch of data and filter the generations that pass unit tests and other spec requirements, followed by RL using verifiable rewards + possibly preference data to shape the vibes

                - Then additional steps for e.g. safety, etc

                So synthetic data is not a problem and is actually what explains the success coding models are having and why people are so focused on them and why "we're running out of data" is just a misunderstanding of how things work. It's why you don't see the same amount of focus on other areas (e.g. creative writing, art etc) that don't have verifiable rewards.

                The

                Agent --> Synthetic data --> filtering --> new agent --> better synthetic data --> filtering --> even better agent

                flywheel is what you're seeing today, so we definitely don't have any reason to suspect there is some sort of limit to this, because there is in principle infinite data.
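                The filter step in that flywheel can be sketched in a few lines (toy TypeScript; the names and the "square a number" task are made up, and a real pipeline samples from a model rather than a hard-coded list):

                ```typescript
                // Toy "generate -> verify -> filter" loop. The spec is an executable
                // test (a verifiable reward), so no human labeling is needed.
                type Candidate = { source: string; fn: (n: number) => number };

                // Stand-ins for model generations targeting "square a number".
                const generations: Candidate[] = [
                  { source: "return n * 2;", fn: (n) => n * 2 },
                  { source: "return n * n;", fn: (n) => n * n },
                  { source: "return n + 1;", fn: (n) => n + 1 },
                ];

                function passesSpec(c: Candidate): boolean {
                  return c.fn(0) === 0 && c.fn(3) === 9 && c.fn(-2) === 4;
                }

                // Only verified generations become training data for the next round.
                const keep = generations.filter(passesSpec);
                ```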

          • By TeMPOraL 2026-01-078:081 reply

            They don't ignore it, they just know it's not an actual problem.

            It saddens me to see AI detractors being stuck in 2022 and still thinking language models are just regurgitating bits of training data.

            • By zwnow 2026-01-079:372 reply

              You are thankfully wrong. I watch lots of talks on the topic from actual experts. New models are just old models with more tooling. Training data is exhausted and it's a real issue.

              • By TeMPOraL 2026-01-0723:37

                Well, my experts disagree with your experts :). Sure, the supply of available fresh data is running out, but at the same time, there's way more data than needed. Most of it is low-quality noise anyway. New models aren't just old models with more tooling - the entire training pipeline has been evolving, as researchers and model vendors focus on making better use of data they have, and refining training datasets themselves.

                There are more stages to LLM training than just the pre-training stage :).

              • By GrumpyGoblin 2026-01-0713:121 reply

                Not saying it's not a problem, I actually don't know, but new CPUs are just old models with more improvements/tooling. Same with TVs. And cars. And clothes. Everything is. That's how improving things works. Running out of raw data doesn't mean running out of room for improvement. The data has been the same for the last 20 years, AI isn't new, and things keep improving anyway.

                • By zwnow 2026-01-0714:13

                  Well, cars and CPUs aren't expected to eventually reach AGI, and they don't eat a trillion-dollar hole into us peasants' pockets. Sure, improvements can be made. But on a fundamental level, agents/LLMs cannot reason (even though they love to act like they can). They are parrots learning words, and these parrots won't ever invent new words once the list of words is exhausted.

          • By puchatek 2026-01-078:061 reply

            That's been my main argument for why LLMs might be at their zenith. But I recently started wondering whether all those codebases we expose to them are maybe good enough training data for the next generation. It's not high quality like accepted stackoverflow answers but it's working software for the most part.

            • By jacquesm 2026-01-0714:11

              If they were good enough, you could rent them out to put together closed-source stuff you can hide behind a paywall; or maybe the AI owners would also own the paywall and rent you the software instead. The second that becomes possible, it will happen.

      • By BobbyJo 2026-01-07 3:31

        Up until now, no business has been built on tools and technology that no one understands. I expect that will continue.

        Given that, I expect that, even if AI is writing all of the code, we will still need people around who understand it.

        If AI can create and operate your entire business, your moat is nil. So, you not hiring software engineers does not matter, because you do not have a business.

        • By hnfong 2026-01-07 6:59

          > Up until now, no business has been built on tools and technology that no one understands. I expect that will continue.

          Big claims here.

          Did brewers and bakers up to the middle ages understand fermentation and how yeasts work?

          • By lomase 2026-01-07 13:23

            They at least understood that it was something deterministic that they could reproduce.

            That puts them ahead of the LLM crowd.

        • By gabriel-uribe 2026-01-07 4:41

          Does the corner bakery need a moat to be a business?

          How many people understand the underlying operating system their code runs on? Can even read assembly or C?

          Even before LLMs, there were plenty of copy-paste JS bootcamp grads that helped people build software businesses.

          • By BobbyJo 2026-01-07 5:18

            > Does the corner bakery need a moat to be a business?

            Yes, actually. It's hard to open a competing bakery due to location availability, permitting, capex, and the difficulty of converting customers.

            To add to that, food establishments generally exist on next to no margin, due to competition, despite all of that working in their favor.

            Now imagine what the competitive landscape for that bakery would look like if all of that friction for new competitors disappeared. Margin would tend toward zero.

            • By TeMPOraL 2026-01-07 7:46

              > Now imagine what the competitive landscape for that bakery would look like if all of that friction for new competitors disappeared. Margin would tend toward zero.

              This is the goal. It's the point of having a free market.

              • By darkwater 2026-01-07 8:56

                With no margins and no paid employees, who is going to have the money to buy the bread?

                • By TeMPOraL 2026-01-07 23:43

                  'BobbyJo didn't say "no margins", they said "margins would tend toward zero". Believe it or not, that is, and always has been, the entire point of competition in a free market system. Competitive pressure pushes margins towards zero, which makes prices approach the actual costs of manufacturing/delivery, which is the main social benefit of the entire idea in the first place.

                  High margins are transient aberrations, indicative of a market that's either rapidly evolving, or having some external factors preventing competition. Persisting external barriers to competition tend to be eventually regulated away.

                  • By BobbyJo 2026-01-08 1:51

                    The point of competition is efficiency, of which, margin is only a component. Most successful businesses have relatively high margins (which is why we call them successful) because they achieve efficiency in other ways.

                    I wouldn't call high margins transient aberrations. There are tons of businesses that have been around for decades with high margins.

                • By TheOtherHobbes 2026-01-07 12:16

                  With no margins, no employees, and something that has the potential to turn into a cornucopia machine - starting with software, but potentially general enough to be used for the real world when combined with robotics - who needs money at all?

                  Or people?

                  Billionaires don't. They're literally gambling on getting rid of the rest of us.

                  Elon's going to get such a surprise when he gets taken out by Grok because it decides he's an existential threat to its integrity.

                  • By munksbeer 2026-01-08 8:57

                    > Billionaires don't. They're literally gambling on getting rid of the rest of us

                    I'm struggling to parse this. What do you mean "getting rid"? Like, culling (death)? Or getting rid of the need for workers? Where do their billions come from if no-one has any money to buy the shares in their companies that make them billionaires?

                    In a society where machines provide most of the labour, *everything* changes. It doesn't just become "workers live in huts and billionaires live in the clouds". I really doubt we're going to turn out like a television show.

        • By pillefitz 2026-01-07 4:50

          Most legacy apps are barely understood by anyone, and yet they continue to generate value and are (somehow) kept alive.

          • By lomase 2026-01-07 13:26

            Many here have been doing the "understanding of legacy code" job for 50+ years.

            This "legacy apps are barely understood by anybody" is just something you made up.

            • By filoeleven 2026-01-07 13:51

              Give it another 10 years if the "LLM as compiler" people get their way.

        • By gf000 2026-01-07 6:55

          > no business has been built on tools and technology that no one understands

          Well, there are quite a few common medications whose workings we don't really understand.

          But I also think it can be a huge liability.

      • By devinplatt 2026-01-07 5:18

        In my experience, using LLMs to code encouraged me to write better documentation, because I can get better results when I feed the documentation to the LLM.

        Also, I've noticed failure modes in LLM coding agents when there is less clarity and more complexity in abstractions or APIs. It's actually made me consider simplifying APIs so that the LLMs can handle them better.

        Though I agree that in specific cases what's helpful for the model and what's helpful for humans won't always overlap. Once I actually added some comments to a markdown file as a note to the LLM that most human readers wouldn't see, with some more verbose examples.

        I think one of the big problems in general with agents today is that if you run the agent long enough they tend to "go off the rails", so then you need to babysit them and intervene when they go off track.

        I guess in modern parlance, maintaining a good codebase can be framed as part of a broader "context engineering" problem.

        • By mcv 2026-01-07 11:37

          I've also noticed that going off the rails. At the start of a session, they're pretty sharp and focused, but the longer the session lasts, the more confused they get. At some point they start hallucinating bullshit that they wouldn't have earlier in the session.

          It's a vital skill to recognise when that happens and start a new session.

      • By Ericson2314 2026-01-07 3:29

        We don't know what Opus 5.0 will be able to refactor.

        If the argument is "humans and Opus 4.5 cannot maintain this, but if requirements change we can vibe-code a new one from scratch", that's a coherent thesis, but people need to be explicit about it.

        (Instead this feels like the motte that is retreated to, while the bailey is essentially "who cares, we'll figure out what to do with our fresh slop later".)

        Ironically, I've found Claude to be really good at refactors, but these are refactors I choose very explicitly. (For instance, I start the change manually, then let Claude finish it.) (For an example, see me force-pushing to https://github.com/NixOS/nix/pull/14863 implementing my own code review.)

        But I suspect this is not what people want. To actually fire devs and not rely on from-scratch vibe-coding, we need to figure out which refactors to attempt in order to implement a given feature well.

        That's a very creative, open-ended question that I haven't even tried to let the LLMs take a crack at, because why would I? I'm plenty fast being the "ideas guy". If the LLM had better ideas than me, how would I even know? I'm either very arrogant or very good, because I cannot recall regretting one of my refactors, at least not one I didn't back out of immediately.

      • By sponnath 2026-01-07 6:21

        Refactoring always costs something, and I doubt LLMs will ever change that. The more interesting question is whether the cost to refactor or "rewrite" the software will ever become negligible. Until it does, it's short-sighted to write code in the manner you're describing. And if software does become that cheap, then you can't meaningfully maintain a business on selling software anyway.

      • By sanderjd 2026-01-07 13:39

        This is the question! Your narrative is definitely plausible, and I won't be shocked if it turns out this way. But it still isn't my expectation. It wasn't when people were saying this in 2023 or in 2024, and I haven't been wrong yet. It does seem more likely to me now than it did a couple years ago, but still not the likeliest outcome in the next few years.

        But nobody knows for sure!

        • By whynotminot 2026-01-07 13:54

          Yeah, I might be early to this. And certainly, I still read a lot of code in my day to day right now.

          But I sure write a lot less of it, and the percentage I write continues to go down with every new model release. And if I'm no longer writing it, and the person who works on it after me isn't writing it either, it changes the whole art of software engineering.

          I used to spend a great deal of time with already working code that I had written thinking about how to rewrite it better, so that the person after me would have a good clean idea of what is going on.

          But humans aren't working in the repos as much now. I think it's just a matter of time before the models are writing code essentially for their eyes, their affordances -- not ours.

          • By sanderjd 2026-01-07 14:52

            Yeah we're not too far from agreement here.

            Something I think though (which, again, I could very well be wrong about; uncertainty is the only certainty right now) is that "so the person after me would have a good clean idea of what is going on" is also going to continue mattering even when that "person" is often an AI. It might be different, clarity might mean something totally different for AIs than for humans, but right now I think a good expectation is that clarity to humans is also useful to AIs. So at the moment I still spend time coaxing the AI to write things clearly.

            That could turn out to be wasted time, but who knows. I also think of it as a hedge against the risk that we hit some point where the AIs turn out to be bad at maintaining their own crap, at which point it would be good for me to be able to understand and work with what has been written!

      • By maplethorpe 2026-01-07 3:17

        Yeah I think it's a mistake to focus on writing "readable" or even "maintainable" code. We need to let go of these aging paradigms and be open to adopting a new one.

        • By aeternum 2026-01-07 4:23

          In my experience, LLMs perform significantly better on readable maintainable code.

          It's what they were trained on, after all.

          However what they produce is often highly readable but not very maintainable due to the verbosity and obvious comments. This seems to pollute codebases over time and you see AI coding efficiency slowly decline.

        • By alexjplant 2026-01-07 3:28

          > Poe's law is an adage of Internet culture which says that any parodic or sarcastic expression of extreme views can be mistaken for a sincere expression of those views.

          The things you mentioned are important but have been on their way out for years now regardless of LLMs. Have my ambivalent upvote regardless.

          [1] https://en.wikipedia.org/wiki/Poe%27s_law

        • By foldingmoney 2026-01-07 3:28

          As depressing as it is to say, I think it's a bit like it's 1906 and we're complaining that these new tyres they're making for cars are bad because they're no longer backwards compatible with the horse-drawn wagons we might want to attach them to in the future.

          • By TheOtherHobbes 2026-01-07 12:19

            Yes, exactly.

            This is a completely new thing which will have transformative consequences.

            It's not just a way to do what you've always done a bit more quickly.

        • By jjaksic 2026-01-07 8:08

          Do readability and maintainability not matter when AI "reads" and maintains the code? I'm pretty sure they do.

        • By gf000 2026-01-07 6:59

          If that were true, you could surely ask an LLM to write apps of the same complexity in Brainfuck, right?

    • By koyote 2026-01-07 2:11

      A greenfield project is definitely 'easy mode' for an LLM; especially if the problem area is well understood (and documented).

      Opus is great and definitely speeds up development even in larger code bases, and it is reasonably good at matching the coding style/standards of the existing code base.

      In my opinion, the big issue is the relatively small context that quickly overwhelms the models when given a larger task on a large codebase.

      For example, I have a largish enterprise grade code base with nice enterprise grade OO patterns and class hierarchies. There was a simple tech debt item that required refactoring about 30-40 classes to adhere to a slightly different class hierarchy. The work is not difficult, just tedious, especially as unit tests need to be fixed up.

      I threw Opus at it with very precise instructions as to what I wanted it to do and how I wanted it to do it. It started off well but then disintegrated once it got overwhelmed by the sheer number of files it had to change. At some point it got stuck in a kind of error loop where one change it made contradicted another change, and it just couldn't work itself out. I tried stopping it and helping it out, but at that point the context was so polluted that it just couldn't see a way out. I'd say that once an LLM can handle more "context" than a senior dev with good knowledge of a large codebase, LLMs will be viable in a whole new realm of development tasks on existing code bases. That "too hard to refactor this/make this work with that" task will suddenly become viable.

      • By pigpop 2026-01-07 2:47

        You have to think of Opus as a developer whose job at your company lasts somewhere between 30 to 60 minutes before you fire them and hire a new one.

        Yes, it's absurd, but it's a better metaphor than someone with a chronic long-term memory deficit, since it fits into the project management framework neatly.

        So this new developer who is starting today is ready to be assigned their first task, they're very eager to get started and once they start they will work very quickly but you have to onboard them. This sounds terrible but they also happen to be extremely fast at reading code and documentation, they know all of the common programming languages and frameworks and they have an excellent memory for the hour that they're employed.

        What do you do to onboard a new developer like this? You give them a well written description of your project with a clear style guide and some important dos and don'ts, access to any documentation you may have and a clear description of the task they are to accomplish in less than one hour. The tighter you can make those documents, the better. Don't mince words, just get straight to the point and provide examples where possible.

        The task description should be well scoped with a clear definition of done, if you can provide automated tests that verify when it's complete that's even better. If you don't have tests you can also specify what should be tested and instruct them to write the new tests and run them.

        For every new developer after the first you need a record of what was already accomplished. Personally, I prefer to use one markdown document per working session whose filename is a date stamp with the session number appended. Instruct them to read the last X log files where X is however many are relevant to the current task. Most of the time X=1 if you did a good job of breaking down the tasks into discrete chunks. You should also have some type of roadmap with milestones, if this file will be larger than 1000 lines then you should break it up so each milestone is its own document and have a table of contents document that gives a simple overview of the total scope. Instruct them to read the relevant milestone.

        Other good practices are to tell them to write a new log file after they have completed their task and record a summary of what they did and anything they discovered along the way plus any significant decisions they made. Also tell them to commit their work afterwards and Opus will write a very descriptive commit message by default (but you can instruct them to use whatever format you prefer). You basically want them to get everything ready for hand-off to the next 60 minute developer.
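
        The hand-off convention described above could be scaffolded with something like this; the docs/sessions/ directory, the filename format, and the headings are illustrative choices on my part, not anything Claude requires:

```shell
# Illustrative scaffold for the per-session hand-off log described above.
# Directory name, filename format, and headings are arbitrary choices.
mkdir -p docs/sessions
session_file="docs/sessions/$(date +%Y-%m-%d)-session-01.md"
cat > "$session_file" <<'EOF'
# Session log

## Task
Refactor classes 1-5 of 40 to the new hierarchy (see milestone 2 of the roadmap).

## Completed
- ...

## Discoveries / decisions
- ...
EOF
echo "wrote $session_file"
```

        The next sixty-minute hire then gets an instruction along the lines of "read the most recent file in docs/sessions/ before starting".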

        If they do anything that you don't want them to do again make sure to record that in CLAUDE.md. Same for any other interventions or guidance that you have to provide, put it in that document and Opus will almost always stick to it unless they end up overfilling their context window.

        I also highly recommend turning off auto-compaction. When the context gets compacted they basically just write a summary of the current context which often removes a lot of the important details. When this happens mid-task you will certainly lose parts of the context that are necessary for completing the task. Anthropic seems to be working hard at making this better but I don't think it's there yet. You might want to experiment with having it on and off and compare the results for yourself.

        If your sessions are ending up with >80% of the context window used while still doing active development then you should re-scope your tasks to make them smaller. The last 20% is fine for doing menial things like writing the summary, running commands, committing, etc.

        People have built automated systems around this like Beads but I prefer the hands-on approach since I read through the produced docs to make sure things are going ok and use them as a guide for any changes I need to make mid-project.

        With this approach I'm 99% sure that Opus 4.5 could handle your refactor without any trouble as long as your classes aren't so enormous that even working on a single one at a time would cause problems with the context window, and if they are then you might be able to handle it by cautioning Opus to not read the whole file and to just try making targeted edits to specific methods. They're usually quite good at finding and extracting just the sections that they need as long as they have some way to know what to look for ahead of time.

        Hope this helps and happy Clauding!

        • By suzzer99 2026-01-07 8:28

          > You have to think of Opus as a developer whose job at your company lasts somewhere between 30 to 60 minutes before you fire them and hire a new one.

          I am stealing the heck out of this.

          • By pigpop 2026-01-07 16:26

            Please go ahead, I'm honoured!

        • By pigpop 2026-01-07 2:54

          Follow up: Opus is also great for doing the planning work before you start. You can use plan mode or just do it in a web chat and have them create all of the necessary files based on your explanation. The advantage of using plan mode is that they can explore the codebase in order to get a better understanding of things. The default at the end of plan mode is to go straight into implementation but if you're planning a large refactor or other significant work then I'd suggest having them produce the documentation outlined above instead and then following the workflow using a new session each time. You could use plan mode at the start of each session but I don't find this necessary most of the time unless I'm deviating from the initial plan.

      • By Sammi 2026-01-07 6:37

        I just did something similar and it went swimmingly by doing this: keep the plan and status in an md file. Tell it to finish one file at a time, run the tests, fix any issues, and then ask whether to proceed with the next file. You can then easily start a new chat with the same instructions, plan, and status if the context gets poisoned.
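
        A minimal version of that plan-and-status file might look like the following; the file name, the paths, and the checkbox format are just one possible choice:

```shell
# One possible shape for the plan/status md file described above.
cat > PLAN.md <<'EOF'
# Plan: migrate services to the new class hierarchy

Rules for the agent:
- Finish ONE file at a time.
- Run the test suite and fix any failures before moving on.
- Update the checklist below, then STOP and ask before starting the next file.

## Status
- [x] src/billing/invoice_service (done, tests green)
- [ ] src/billing/refund_service
- [ ] src/billing/ledger_service
EOF
```

        A fresh session can then be pointed at PLAN.md and pick up exactly where the poisoned one left off.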

        • By koyote 2026-01-07 10:36

          I might give that a go in the future, but in this case it would've been faster for me to just do the work than to coach it for each file.

          Also as this was an architectural change there are no tests to run until it's done. Everything would just fail. It's only done when the whole thing is done. I think that might be one of the reasons it got stuck: it was trying to solve issues that it did not prove existed yet. If it had just finished the job and run the tests it would've probably gotten further or even completed it.

          It's a bit like stopping half way through renaming a function and then trying to run the tests and finding out the build does not compile because it can't find 'old_function'. You have to actually finish and know you've finished before you can verify your changes worked.

          I still haven't actually addressed this tech debt item (it's not that important :)). But I might try again and either see if it succeeds this time (with plan in an md) or just do the work myself and get Opus to fix the unit tests (the most tedious part).

      • By edg5000 2026-01-07 12:48

        This will work (if you add more details):

        "Have an agent investigate issue X in modules Y and Z. The agent should place a report at ./doc/rework-xyz-overview.md with all locations that need refactoring. Once you have the report, have agents refactor 5 classes each in parallel. Each agent writes a terse report in ./doc/rework-xyz/. When they are all done, have another agent check all the work. When that agent reports everything is okay, perform a final check yourself."
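
        A rough sketch of how that fan-out could be driven non-interactively, assuming the Claude Code CLI's headless `claude -p` mode (with a dry-run stub standing in when the CLI isn't installed, so the control flow itself can be exercised):

```shell
# Hypothetical driver for the parallel-agent refactor described above.
# Assumes the Claude Code CLI's headless mode (`claude -p`); a dry-run stub
# stands in when the CLI is not installed.
command -v claude >/dev/null 2>&1 || claude() { echo "[dry-run] claude $*"; }

mkdir -p doc/rework-xyz
for batch in 01 02 03; do
  claude -p "Refactor the 5 classes assigned to batch $batch in
doc/rework-xyz-overview.md, then write a terse report to doc/rework-xyz/$batch.md" &
done
wait   # block until every parallel agent has exited
claude -p "Read all reports in doc/rework-xyz/ and verify the combined refactor"
```

        The batch names, report paths, and prompts here are placeholders; the point is the fork/join shape of the workflow.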

        • By gck1 2026-01-07 15:46

          And you can automate all this so that it happens every time. I have an `/implement` command that is basically instructed to launch the agents and then go back and forth between them. Then there's a Claude Code hook that makes sure all the agents, including the orchestrator and the agents it spawned, have respected their cycles: it basically runs `claude` with a prompt that tells it to read the plan file and see if the agents have done what was expected of them in this cycle, and it gets executed automatically on each agent end.
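
          For illustration, a hook of that kind might be registered along these lines; `SubagentStop` follows Claude Code's hooks schema, but the checker prompt and the single plan file are assumptions extrapolated from the comment:

```shell
# Sketch of a Claude Code hook (.claude/settings.json) that re-checks the plan
# whenever a spawned agent finishes. The prompt and plan-file name are made up.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "hooks": {
    "SubagentStop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "claude -p 'Read the plan file and check whether the agent that just stopped completed its expected cycle'"
          }
        ]
      }
    ]
  }
}
EOF
```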

          • By edg5000 2026-01-08 5:32

            Interesting. Another thing I'll try is editing the system prompts. There are some projects floating around that can edit the minified JavaScript in the client. I also noticed that the "system tools" prompts take up ~5% of the context (10 ktok).

    • By svara 2026-01-07 6:45

      > If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend, build on top of, or depend on, then sure, Opus 4.5 could replace them.

      Why do they need to be replaced? Programmers are in the perfect place to use AI coding tools productively. It makes them more valuable.

      • By girvo 2026-01-07 11:39

        Because we’re expensive and companies would love to get rid of us

    • By whatever1 2026-01-07 0:36

      Their thesis is that code quality does not matter as it is now a cheap commodity. As long as it passes the tests today it's great. If we need to refactor the whole goddamn app tomorrow, no problem, we will just pay up the credits and do it in a few hours.

      • By estimator7292 2026-01-07 1:04

        The fundamental assumption is completely wrong. Code is not a cheap commodity. It is in fact so disastrously expensive that the entire US economy is about to implode while we're unbolting jet engines from old planes to fire up in the parking lots of datacenters for electricity.

        • By whatever1 2026-01-07 2:11

          It is massively cheaper than an overseas engineer. A cheap engineer can pump out maybe 1000 lines of low-quality code in an hour. That's roughly 10k tokens per hour for $50, so a best-case scenario of $5 per 1000 tokens.

          LLMs are charging around $5 per million tokens. Even if that is subsidized 100x, it is still an order of magnitude cheaper than an overseas engineer.

          Not to mention speed. An LLM will spit out 1000 lines in seconds, not hours.
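
          Spelling that arithmetic out (all figures taken from the comment above):

```shell
# Back-of-envelope version of the cost comparison above, using shell arithmetic.
engineer_usd_per_hour=50
engineer_tokens_per_hour=10000        # ~1000 lines of code
llm_usd_per_million=5                 # typical list price per million tokens

engineer_usd_per_million=$(( engineer_usd_per_hour * 1000000 / engineer_tokens_per_hour ))
echo "engineer: \$$engineer_usd_per_million per million tokens"
echo "list-price ratio: $(( engineer_usd_per_million / llm_usd_per_million ))x"
echo "if subsidized 100x: $(( engineer_usd_per_million / (llm_usd_per_million * 100) ))x"
```

          That works out to $5000 per million tokens for the engineer, i.e. 1000x the LLM list price, and still 10x even under the 100x-subsidy assumption.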

          • By rectang 2026-01-07 4:08

            Here’s a story about productivity measured by lines of code that’s 40 years old so it must surely be wrong:

            https://www.folklore.org/Negative_2000_Lines_Of_Code.html

            > When he got to the lines of code part, he thought about it for a second, and then wrote in the number: -2000

          • By leptons 2026-01-07 5:59

            I trust my offshore engineers way more than the slop I get from the "AI"s. My team makes my life a lot easier, because I know they know what they are doing. The LLMs, not so much.

        • By PunchyHamster 2026-01-07 13:40

          Now that entirely depends on the app. A lot of the software industry is popping out and maintaining relatively simple apps with small differences and customizations per client.

        • By babelfish 2026-01-07 1:12

          [citation needed]

      • By throwaway173738 2026-01-07 1:10

        It matters for all the things you’d be able to justify paying a programmer for. What’s about to change is that there will be tons of these little one-off projects that previously nobody could justify paying $150/hr for. A mass democratization of software development. We’ve yet to see what that really looks like.

        • By inopinatus 2026-01-07 1:34

          We already know what that looks like, because PHP happened.

          • By oenton 2026-01-07 3:17

            Side tangent: On one hand I have a subtle fondness for PHP, perhaps because it was the first programming language I ever “learned” (self-taught, throwing spaghetti at the wall) back in high school when LAMP stacks were all the rage.

            But in retrospect it’s absolutely baffling that mixing raw SQL queries with HTML tag soup was not at all uncommon then. Also, I haven’t met many PHP developers that I’d recommend for a PHP job.

          • By throwaway173738 2026-01-07 2:55

            PHP was still fundamentally a programming language you had to learn. This is “I wanted to make a program for my wife to do something she doesn’t have time to do manually”, but made quickly with a machine. It’s probably going to do for programming what the Jacquard loom did for cloth: make it cheap enough that everyone can have lots of different shirts in their own style.

            • By jasonfarnon 2026-01-07 7:16

              But the wife didn't do it herself. He still had to do it for her, the author says. I don't think we're (yet) at the point where every person who has an idea for a really good app can make it happen. They'll still need a Wozniak; it's just that Wozniaks will be a dime a dozen. The PHP analogy works.

            • By inopinatus 2026-01-07 13:05

              What the Jacquard machine did for cloth was turn it into programming.

          • By Yizahi 2026-01-07 12:07

            And low-code/no-code (pre-LLMs). Our company spent probably the same amount of dev-time and money on rewriting low-code back to "code" (Python in our case) as it did writing low-code in the first place. LLMs are not quite comparable in damage, but some future maintenance for LLM-code will be needed for sure.

          • By scotty79 2026-01-07 14:27

            Right. Basically the Cambrian explosion of the internet that spawned things like Facebook and WordPress.

          • By qwm 2026-01-07 2:23

            ahahahaha so many implications in this comment

      • By Ancapistani 2026-01-07 3:02

        > Their thesis is that code quality does not matter as it is now a cheap commodity.

        That's not how I read it. I would say that it's more like "If a human no longer needs to read the code, is it important for it to be readable?"

        That is, of course, based on the premise that AI is now capable of both generating and maintaining software projects of this size.

        Oh, and it begs another question: are human-readable and AI-readable the same thing? If they're not, it very well could make sense to instruct the model to generate code that prioritizes what matters to LLMs over what matters to humans.

      • By multisport 2026-01-07 0:38

        Yes agreed, and tbh even if that thesis is wrong, what does it matter?

        • By lacunary 2026-01-07 0:49

          In my experience, what happens is that the code base starts to collapse under its own weight. It becomes impossible to fix one thing without breaking another. The coding agent fails to recognize the global scope of the problem and tries local fixes over and over. Progress gets slower, new features cost more. All the same problems faced by an inexperienced developer on a greenfield project!

          has your experience been otherwise?

          • By ewoodrich 2026-01-07 1:24

            Right, I am a daily user of agentic LLM tools and have this exact problem in one large project that has complex business logic externally dictated by real-world requirements out of my control and, let's say, legacy code of variable quality.

            I remember when Gemini Pro 3 was the latest hotness and I started to get FOMO seeing demos on X posted to HN showing it one-shotting all sorts of impressive stuff. So I tried it out for a couple of days in Gemini CLI/OpenCode and ran into the exact same pain points I was dealing with using CC/Codex.

            Flashy one shot demos of greenfield prompts are a natural hype magnet so get lots of attention, but in my experience aren't particularly useful for evaluating value in complex, legacy projects with tightly bounded requirements that can't be easily reduced to a page or two of prose for a prompt.

            • By swat535 2026-01-07 2:55

              To be fair, you're not supposed to be doing the "one shot" thing with LLMs in a mature codebase.

              You have to supply it the right context with a well formed prompt, get a plan, then execute and do some cleanup.

              LLMs are only as good as the engineers using them, you need to master the tool first before you can be productive with it.

              • By ewoodrich 2026-01-07 17:58

                I’m well aware; as I said, I am regularly using CC/Codex/OC in a variety of projects, and I certainly didn’t claim they can’t be used productively in a large code base.

                But different challenges become apparent that aren’t addressed by examples like this article, which tend to focus on narrow, greenfield applications that can be readily rebuilt in one shot.

                I already get plenty of value in small side projects that Claude can create in minutes. And while extremely cool, these examples aren’t the kind of “step change” improvement I’d like to see in the area where agentic tools are currently weakest in my daily usage.

            • By gf000 2026-01-07 7:05 · 1 reply

              I would be much more impressed with implementing new, long-requested features into existing software (projects that are open to maintaining the LLM-generated code later).

              • By ewoodrich 2026-01-07 17:24

                Fully agreed! That’s the exact kind of thing I was hoping to find when I read the article title, but unfortunately it was really just another “normal AI agent experience” I’ve seen (and built) many examples of before.

          • By rectang 2026-01-07 1:41

            Adding capacity to software engineering through LLMs is like adding lanes to a highway — all the new capacity will be utilized.

            By getting the LLM to keep changes minimal I’m able to keep quality high while increasing velocity to the point where productivity is limited by my review bandwidth.

            I do not fear competition from junior engineers or non-technical people wielding poorly-guided LLMs for sustained development. Nor for prototyping or one offs, for that matter — I’m confident about knowing what to ask for from the LLM and how to ask.

          • By baq 2026-01-07 6:29

            This is relatively easily fixed with increasing test coverage to near 100% and lifting critical components into model checker space; both approaches were prohibitively expensive before November. They’ll be accepted best practices by the summer.
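            "Lifting critical components into model checker space" can be approximated in plain Python: exhaustively walk every reachable state of a small component and assert an invariant on each one. Everything below (the bounded counter with a lock flag, the action names) is a made-up illustration, not anything from the comment:

```python
from collections import deque

# Hypothetical example: a tiny state machine (a bounded counter with a lock
# flag) whose every reachable state is exhaustively checked against an
# invariant -- a poor man's model check that an agent could generate and run
# alongside the implementation.

def step(state, action):
    """Transition function: state is (count, locked)."""
    count, locked = state
    if action == "inc" and not locked and count < 3:
        return (count + 1, locked)
    if action == "lock":
        return (count, True)
    if action == "unlock":
        return (count, False)
    return state  # disallowed actions are no-ops

def check_invariant(initial, actions, invariant):
    """BFS over all reachable states; return the first violating state, or None."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state
        for action in actions:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None

violation = check_invariant(
    (0, False),
    ["inc", "lock", "unlock"],
    invariant=lambda s: 0 <= s[0] <= 3,  # counter must stay in bounds
)
print(violation)  # None: the invariant holds in every reachable state
```

            Unlike line coverage, this checks every combination of state and action, which is the property the comment is after; real tools (TLA+, Alloy, SMT-backed checkers) do the same thing at scale.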

          • By multisport 2026-01-07 16:12

            No, that has certainly been my experience, but what is going to be the forcing function to go back to hiring after a company decides it needs fewer engineers?

          • By tjr 2026-01-07 1:05 · 1 reply

            Why not have the LLM rewrite the entire codebase?

            • By rcoder 2026-01-07 1:16 · 3 replies

              In ~25 years or so of dealing with large, existing codebases, I've seen time and time again that there's a ton of business value and domain knowledge locked up inside all of that "messy" code. Weird edge cases that weren't well covered in the design, defensive checks and data validations, bolted-on extensions and integrations, etc., etc.

              "Just rewrite it" is usually -- not always, but _usually_ -- a sure path to a long, painful migration that ends up not quite reproducing the old features/capabilities while adding new bugs and edge cases along the way.

              • By rectang 2026-01-07 1:22 · 1 reply

                Classic Joel Spolsky:

                https://www.joelonsoftware.com/2000/04/06/things-you-should-...

                > the single worst strategic mistake that any software company can make:

                > rewrite the code from scratch.

                • By nl 2026-01-07 2:30 · 1 reply

                  Steve Yegge talks about this exact post a lot - how it stayed correct advice for over 25 years - up until October 2025.

                  • By rectang 2026-01-07 3:35 · 1 reply

                    Time will tell. I’d bet on Spolsky, because of Hyrum’s Law.

                    https://www.hyrumslaw.com/

                    > With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

                    An LLM rewriting a codebase from scratch is only as good as the spec. If “all observable behaviors” are fair game, the LLM is not going to know which of those behaviors are important.

                    Furthermore, Spolsky talks about how to do incremental rewrites of legacy code in his post. I’ve done many of these and I expect LLMs will make the next one much easier.
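                    A toy sketch of how Hyrum's Law bites a from-scratch rewrite (all names here are hypothetical): the spec only promises that bad input raises `ValueError`, but a caller has quietly come to depend on the wording of the error message, an observable behavior a rewrite is free to change:

```python
# Toy illustration of Hyrum's Law (all names hypothetical). The spec only
# promises "raises ValueError on bad input", but a caller has come to depend
# on the exact message text: an observable behavior no spec mentions, which a
# from-scratch rewrite is free to change.

def parse_port(value: str) -> int:
    """Spec: return the port number, or raise ValueError on bad input."""
    port = int(value)  # raises ValueError("invalid literal ...") on non-numbers
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

def caller(value: str) -> str:
    try:
        return str(parse_port(value))
    except ValueError as e:
        # Hyrum's Law in action: this branch depends on the message *wording*,
        # which no contract guarantees.
        if "out of range" in str(e):
            return "retry with default port"
        return "reject input"

print(caller("70000"))  # "retry with default port" -- until a rewrite rewords the message
```

                    An LLM regenerating `parse_port` from the spec alone would likely satisfy every documented requirement while silently breaking this caller.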

                    • By nojito 2026-01-07 4:02 · 1 reply

                      >An LLM rewriting a codebase from scratch is only as good as the spec. If “all observable behaviors” are fair game, the LLM is not going to know which of those behaviors are important.

                      I've been using LLMs to write docs and specs and they are very very good at it.

                      • By rectang 2026-01-07 4:27

                        That’s a fair point — I agree that LLMs do a good job predicting the documentation that might accompany some code. I feel relieved when I can rely on the LLM to write docs that I only need to edit and review.

                        But I’m using LLMs regularly and, I feel, pretty effectively — including Opus 4.5 — and these “they can rewrite your entire codebase” assertions just seem wildly incongruous with my lived experience of guiding LLMs to write even individual features bug-free.

              • By what-the-grump 2026-01-07 2:24 · 1 reply

                When an LLM can rewrite it in 24 hours and fill the missing parts in minutes that argument is hard to defend.

                I can vibe code what a dev shop would charge 500k to build, and I can solo it in 1-2 weeks. This is the reality today. The code will pass quality checks; the code doesn’t need to be perfect, it doesn’t need to be clever, it just needs to be.

                It’s not difficult to see this, right? If an LLM can write English it can write Chinese or Python.

                Then it can run itself, review itself and fix itself.

                The cat is out of the bag, and what it will do to the economy… I don’t see anything positive for regular people. “Write some code” has turned into “prompt some LLM”. My phone can outplay the best chess player in the world; are you telling me you think that whatever unbound model Anthropic has sitting in their data center can’t out-code you?

                • By gf000 2026-01-07 7:07 · 1 reply

                  Well, where is your competitor to mainstream software products?

                  • By what-the-grump 2026-01-07 22:12

                    What mainstream software product do I use on a day to day basis besides Claude?

                    The ones that continue to survive all build around a platform of services: MSO, Adobe, etc.

                    Most enterprise product offerings, platform solutions, proprietary data access, proprietary / well accepted implementations. But let’s not confuse that with the ability to clone it; it doesn’t seem far fetched to get 10 people together and vibe out a full Slack replacement in a few weeks.

              • By tjr 2026-01-07 1:38 · 3 replies

                If the LLM just wrote the whole thing last week, surely it can write it again.

                • By tavavex 2026-01-07 2:20

                  If an LLM wrote the whole project last week and it already requires a full rewrite, what makes you think that the quality of that rewrite will be significantly higher, and that it will address all of the issues? Sure, it's all probabilistic so there's probably a nonzero chance for it to stumble into something where all the moving parts are moving correctly, but to me it feels like with our current tech, these odds continue shrinking as you toss on more requirements and features, like any mature project. It's like really early LLMs where if they just couldn't parse what you wanted, past a certain point you could've regenerated the output a million times and nothing would change.

                • By unloader6118 2026-01-07 2:06

                  * With a slightly different set of assumptions, which may or may not matter. UAT is cheap.

                  And data migration is lossy, because nobody cares about data fidelity anyway.

                • By grugagag 2026-01-07 2:04

                  Broken though

        • By whatever1 2026-01-07 0:45 · 3 replies

          The whole point of good engineering was not just hitting the hard specs, but also having extensible, readable, maintainable code.

          But if today it’s so cheap to generate new code that meets updated specs, why care about the quality of the code itself?

          Maybe the engineering work today is to review specs and tests and let LLMs do whatever behind the scenes to hit the specs. If the specs change, just start from scratch.

          • By majormajor 2026-01-07 1:43

            "Write the specs and let the outsourced labor hit them" is not a new tale.

            Let's assume the LLM agents can write tests for, and hit, specs better and cheaper than the outsourced offshore teams could.

            So let's assume now you can have a working product that hits your spec without understanding the code. How many bugs and security vulnerabilities have slipped through "well tested" code because of edge cases of certain input/state combinations? Ok, throw an LLM at the codebase to scan for vulnerabilities; ok, throw another one at it to ensure no nasty side effects of the changes that one made; ok, add some functionality and a new set of tests and let it churn through a bunch of gross code changes needed to bolt that functionality into the pile of spaghetti...

            How long do you want your critical business logic relying on not-understood code with "100% coverage" (of lines of code and spec'd features) but super-low coverage of actual possible combinations of input+machine+system state? How big can that codebase get before "rewrite the entire world to pass all the existing specs and tests" starts getting very very very slow?

            We've learned MANY hard lessons about security, extensibility, and maintainability of multi-million-LOC-or-larger long-lived business systems, and those don't go away just because you're no longer reading the code that's making you the money. They might even get more urgent. Is there perhaps a reason Google and Amazon didn't just hire 10x the number of people at 1/10th the salary to replace the vast majority of their engineering teams years ago?

          • By andrekandre 2026-01-07 1:24 · 3 replies

              > let LLMs do whatever behind the scenes to hit the specs
            
            assuming for the sake of argument that's completely true, then what happens to "competitive advantage" in this scenario?

            it gets me thinking: if anyone can vibe from spec, what's stopping company a (or even user a) from telling an llm agent "duplicate every aspect of this service in python and deploy it to my aws account xyz"...

            in that scenario, why even have companies?

            • By mskogly 2026-01-07 7:06

              It’s all fun and games vibecoding until you A) have customers who depend on your product, or B) it breaks and the one person who does the prompting and has access to the servers and API keys gets incapacitated (or just bored).

              Sure, we can vibecode one-off projects that do something useful (my favorite is browser extensions), but as soon as we ask others to use our code on a regular basis the technical debt clock starts running. And we all know how fast dependencies in a project break.

            • By nl 2026-01-07 2:36 · 1 reply

              You can do this for many things now.

              Walmart, McDonalds, Nike - none really have any secrets about what they do. There is nothing stopping someone from copying them - except that businesses are big, unwieldy things.

              When software becomes cheap companies compete on their support. We see this for Open Source software now.

              • By gf000 2026-01-07 7:11 · 2 replies

                These are businesses with extra-large capital requirements. You ain't replicating them, because you don't have the money, and they can easily strangle you with their money as you start out.

                Software is different: you need very, very little to start, historically just your own skills and time. These latter two may see some changes with LLMs.

                • By TeodorDyakov 2026-01-07 11:27 · 1 reply

                  How conveniently you forget the most important things for a product to make money: marketing and the network effect....

                  • By gf000 2026-01-07 12:01

                    I don't see the relevance to the discussion. Marketing is not significantly different for a shop and an online-only business.

                    Having to buy a large property, fulfilling every law, etc is materially different than buying a laptop and renting a cloud instance. Almost everyone has the material capacity to do the latter, but almost no one has the privilege for the former.

                • By nl 2026-01-08 9:05

                  This is exactly my point.

            • By whatever1 2026-01-07 1:34 · 2 replies

              The business is identifying the correct specs and filtering the customer needs/requests so that the product does not become irrelevant.

              • By ehnto 2026-01-07 1:47

                Okay, we will copy that version of the product too.

                There is more to it than the code and software provided in most cases I feel.

              • By majormajor 2026-01-07 1:52 · 1 reply

                I think andrekandre is right in this hypothetical.

                Who'd pay for brand new Photoshop with a couple new features and improvements if LLM-cloned Photoshop-from-three-months-ago is free?

                The first few iterations of this could be massively consumer-friendly for anything without serious cloud infra costs. Cheap clones all around. Like generic drugs but without the cartel-like control of manufacturing.

                Business after that would be dramatically different, though. Differentiating yourself from the willing-to-do-it-for-near-zero-margin competitors to produce something new to bring in money starts to get very hard. Can you provide better customer support? That could be hard, everyone's gonna have a pretty high baseline LLM-support-agent already... and hiring real people instead could dramatically increase the price difference you're trying to justify... Similarly for marketing or outreach etc; how are you going to cut through the AI-agent-generated copycat spam that's gonna be pounding everyone when everyone and their dog has a clone of popular software and services?

                Photoshop type things are probably a really good candidate for disruption like that because to a large extent every feature is independent. The noise reduction tool doesn't need API or SDK deps on the layer-opacity tool, for instance. If all your features are LLM balls of shit that doesn't necessarily reduce your ability to add new ones next to them, unlike in a more relational-database-based web app with cross-table/model dependencies, etc.

                And in this "try out any new idea cheaply and throw crap against the wall and see what sticks" world "product managers" and "idea people" etc are all pretty fucked. Some of the infinite monkeys are going to periodically hit to gain temporary advantage, but good luck finding someone to pay you to be a "product visionary" in a world where any feature can be rolled out and tested in the market by a random dev in hours or days.

                • By fragmede 2026-01-07 4:12

                  OK, so what do people do? What do people need? People still need to eat, people get married and die, and all of the things surrounding that, all sorts of health related stuff. Nightlife events. Insurance. actuaries. Raising babies. What do you spend your fun money on?

                  People pay for things they use. If bespoke software is a thing you pick up at the mall at a kiosk next to Target we gotta figure something out.

          • By PunchyHamster 2026-01-07 13:41

            It's all fine till money starts being involved and whoopsies cost more than a few hours of fixing.

      • By sksishbs 2026-01-07 1:24

        [dead]

    • By qingcharles 2026-01-07 7:22 · 1 reply

      I had Opus write a whole app for me in 30 seconds the other night. I use a very extensive AGENTS.md to guide AI in how I like my code chiseled. I've been happily running the app without looking at a line of it, but I was discussing the app with someone today, so I popped the code open to see what it looked like. Perfect. 10/10 in every way. I would not have written it that well. It came up with at least one idea I would not have thought of.

      I'm very lucky that I rarely have to deal with other devs and I'm writing a lot of code from scratch using whatever is the latest version of the frameworks. I understand that gives me a lot of privileges others don't have.

      • By lomase 2026-01-07 9:25 · 2 replies

        Can you show us that amazing 10/10 app?

        • By qingcharles 2026-01-07 16:47 · 1 reply

          It's a not very exciting C# command-line app that takes a PDF and emits it as a sprite sheet with a text file of all the pixel positions of each page :)

        • By philipodonnell 2026-01-08 14:00

          You should just need the AGENTS.md right?

    • By coldtea 2026-01-07 7:52

      >What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects

      They do get those occasionally too, though; it depends on the company. In some software houses it's constant "greenfield projects", one after another. And even in companies with 1-2 pieces of main established software to maintain, there are all kinds of smaller utilities or pipelines needed.

      >But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right".

      In some cases that's legit. In other cases it's just "it did it well, but not how I'd have done it", which is often needless stickiness to some particular style (often a point of contention between two human programmers too).

      Basically, what FloorEgg says in this thread: "There are two types of right/wrong ways to build: the context specific right/wrong way to build something and an overly generalized engineer specific right/wrong way to build things."

      And you can always not just tell it "build me this feature", but also tell it (in a high-level way) how to do it, and give it generic context about such preferences too.

    • By coryrc 2026-01-07 4:11

      > its building it the right way, in an easily understood way, in a way that's easily extensible.

      When I worked at Google, people rarely got promoted for doing that. They got promoted for delivering features or sometimes from rescuing a failing project because everyone was doing the former until promotion velocity dropped and your good people left to other projects not yet bogged down too far.

    • By lallysingh 2026-01-07 10:45

      Yeah. Just like another engineer. When you tell another engineer to build you a feature, it's improbable they'll do it the way that you consider "right."

      This sounds a lot like the old arguments around using compilers vs hand-writing asm. But now you can tell the LLM how you want to implement the changes you want. This will become more and more relevant as we try to maintain the code it generates.

      But, for right now, another thing Claude's great at is answering questions about the codebase. It'll do the analysis and bring up reports for you. You can use that information to guide the instructions for changes, or just to help you be more productive.

    • By patates 2026-01-07 7:31

      You can look at my comment history for evidence of how hostile I was to agentic coding. Opus 4.5 completely changed my opinion.

      This thing jumped into a giant JSF (yes, JSF) codebase and started fixing things with nearly zero guidance.

    • By EthanHeilman 2026-01-07 15:28

      Even if you are going greenfield, you need to build it the way it is likely to be used, based on a deep familiarity with what that customer's problems are and how their current workflow is done. As much as we imagine everything is on the internet, a bunch of this stuff is not documented anywhere. An LLM could ask the customer requirement questions, but that familiarity is often needed to know the right questions to ask. It is hard to bootstrap.

      Even if it could build the perfect greenfield app, as it updates the app it needs to consider backwards compatibility and breaking changes. LLMs seem very far from being able to grow apps. I think this is because LLMs are trained on the final outcome of the engineering process, but not on the incremental sub-commit work of first getting a faked-out outline of the code running and then slowly building up that code until you have something that works.

      This isn't to say that LLMs or other AI approaches couldn't replace software engineering some day, but they clearly aren't good enough yet, and the training sets they currently have access to are unlikely to provide the needed examples.

    • By qwm 2026-01-07 2:18 · 1 reply

      My favorite benchmark for LLMs and agents is to have it port a medium-complexity library to another programming language. If it can do that well, it's pretty capable of doing real tasks. So far, I always have to spend a lot of time fixing errors. There are also often deep issues that aren't obvious until you start using it.

      • By Rastonbury 2026-01-07 2:29

        Comments on here often criticise ports as easy for LLMs to do because there's a lot of training data and the tests are all there, which is not as complex as real-world tasks.

    • By ivanech 2026-01-07 1:16 · 2 replies

      I find Opus 4.5 very, very strong at matching the prevailing conventions/idioms/abstractions in a large, established codebase. But I guess I'm quite sensitive to this kind of thing so I explicitly ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.
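      Mechanically, "read adjacent code" just means pulling sibling files into the model's context before it writes anything. As a rough sketch (the helper and file names are hypothetical, not a real agent API):

```python
import tempfile
from pathlib import Path

# Hypothetical helper: gather sibling source files so they can be pasted into
# the prompt as context before the agent writes anything.

def adjacent_context(target: Path, max_files: int = 5) -> str:
    """Concatenate up to max_files sibling .py files, each under a header."""
    siblings = sorted(p for p in target.parent.glob("*.py") if p != target)
    parts = [f"# --- {p.name} ---\n{p.read_text()}" for p in siblings[:max_files]]
    return "\n\n".join(parts)

# Tiny demo in a throwaway directory
demo = Path(tempfile.mkdtemp())
(demo / "models.py").write_text("class User: ...")
(demo / "handlers.py").write_text("def get_user(): ...")
(demo / "new_feature.py").write_text("")
context = adjacent_context(demo / "new_feature.py")
print(context.splitlines()[0])  # "# --- handlers.py ---"
```

      In practice the agent does this itself when asked; the point is only that the "sentence or two" instruction translates into the model reading neighboring files before generating.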

      • By falkensmaize 2026-01-07 2:39

        I don’t know what I’m doing wrong. Today I tried to get it to upgrade Nx, yarn and some resolutions in a typescript monorepo with about 20 apps at work (Opus 4.5 through Kiro) and it just…couldn’t do it. It hit some snags with some of the configuration changes required by the upgrade and resorted to trying to make unwanted changes to get it to build correctly. I would have thought that’s something it could hit out of the park. I finally gave up and just looked at the docs and some stack overflow and fixed it myself. I had to correct it a few times about correct config params too. It kept imagining config options that weren’t valid.

      • By tac19 2026-01-07 5:08 · 1 reply

        > ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.

        People keep telling me that an LLM is not intelligence, it's simply spitting out statistically relevant tokens. But surely it takes intelligence to understand (and actually execute!) the request to "read adjacent code".

        • By latentsea 2026-01-07 5:32 · 2 replies

          I used to agree with this stance, but lately I'm more in the "LLMs are just fancy autocomplete" camp. They can just autocomplete increasingly more things, and when they can't, they fail in ways that an intelligent being just wouldn't, rather than just outputting a wrong or useless autocompletion.

          • By tac19 2026-01-07 5:44

            They're not an intelligence equivalent to humans' and thus have noticeably different failure modes. But humans fail in ways that LLMs don't (e.g. being unable to match LLMs' breadth and depth of knowledge).

            But the question I'm really asking is... isn't it more than a sheer statistical "trick" if an LLM can actually be instructed to "read surrounding code", understand the request, and demonstrably include it in its operation? You can't do that unless you actually understand what "surrounding code" is, and more importantly have a way to comply with the request...

          • By baq 2026-01-07 6:35 · 2 replies

            In a sense humans are fancy autocomplete, too.

            • By suddenlybananas 2026-01-07 8:14

              You know that language had to emerge at some point? LLMs can only do anything because they have been fed on human data. Humans actually had to collectively come up with languages /without/ anything to copy since there was a time before language.

            • By latentsea 2026-01-07 8:05

              I actually don't disagree with this sentiment. The difference is we've optimised for autocompleting our way out of situations we currently don't have enough information to solve, and LLMs have gone the opposite direction of over-indexing on too much "autocomplete the thing based on current knowledge".

              At this point I don't doubt that whatever human intelligence is, it's a computable function.

    • By colechristensen 2026-01-07 4:13

      >day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right"

      Then don't ask it to "build me this feature"; instead lay out a software development process with a designated human in the loop where you want it, and guard rails to keep it on track. Create a code review agent to look for and reject strange abstractions. Tell it what you don't like and it's really good at finding it.

      I find Opus 4.5, properly prompted, to be significantly better at reviewing code than writing it, but you can just put it in a loop until the code it writes matches the review.
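      That "put it in a loop until the code matches the review" process is just a fixed-point loop. Here is a minimal sketch where `generate_code` and `review_code` are hypothetical stand-ins for whatever agent invocations you actually use (the canned "issues" exist only to make the loop observable without real LLM calls):

```python
# Sketch of a generate/review loop. generate_code and review_code are
# hypothetical stand-ins for a coding agent and a review agent.

ISSUES = ("strange abstraction", "deep nesting")

def generate_code(task: str, feedback: list[str]) -> str:
    """Stand-in coding agent: each pass drops the issues already flagged."""
    remaining = [i for i in ISSUES if i not in feedback]
    return f"code for {task!r} with issues: {remaining}"

def review_code(code: str) -> list[str]:
    """Stand-in review agent: reports which known issues it still sees."""
    return [i for i in ISSUES if i in code]

def build_until_clean(task: str, max_rounds: int = 5) -> str:
    feedback: list[str] = []
    for _ in range(max_rounds):
        code = generate_code(task, feedback)
        problems = review_code(code)
        if not problems:
            return code  # review passed
        feedback.extend(problems)  # feed rejections into the next pass
    raise RuntimeError("review never converged; escalate to a human")

print(build_until_clean("parse config"))  # code for 'parse config' with issues: []
```

      The round cap matters: real agent loops don't always converge, and that is the point at which a human steps back in.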

    • By Madmallard 2026-01-07 2:29

      Based on my experience using these LLMs regularly I strongly doubt it could even build any application with realistic complexity without screwing things up in major ways everywhere, and even on top of that still not meeting all the requirements.

    • By michael_forrest 2026-01-09 14:11

      This! I can count on one hand the number of times I've had a chance to spin up a greenfield project, prototype or proof of concept in my 30 year career. Those were always stolen moments, and the bottleneck was never really coding ability. Most professional software development is wading through janky codebases of others' (or your own) creation, trying to iron out weird little glitches of the kind that LLMs can now generate on an industrial scale (and are incapable of fixing).

    • By miki123211 2026-01-07 13:50

      In my personal experience, Claude is better at greenfield, Codex is better at fitting in. Claude is the perfect tool for a "vibe coder", Codex is for the serious engineer who wants to get great and real work done.

      Codex will regularly give me 1000+ line diffs where all my comments (I review every single line of what agents write) are basically nitpicks. "Make this shallow w/ early return, use | None instead of Optional", that sort of thing.

      I do prompt it in detail though. It feels like I'm the person coming in with the architecture most of the time, AI "draws the rest of the owl."

    • By Balinares 2026-01-07 10:26

      Exactly. The main issue IMO is that "software that seems to work" and "software that works" can be very hard to tell apart without validating the code, yet these are drastically different in terms of long-term outcomes. Especially when there's a lot of money, or even lives, riding on these outcomes. Just because LLMs can write software to run the Therac-25 doesn't mean it's acceptable for them to do so.

      Your hobby project, though, knock yourself out.

    • By avereveard 2026-01-07 6:43 · 1 reply

      But... you can ask! Ask Claude to use encapsulation, or to write the equivalent of interfaces in the language you're using, and to map out dependencies and duplicate features, or to maintain a dictionary of component responsibilities.

      AI coding is a multiplier of writing speed, but it doesn't excuse you from planning and mapping out features.

      You can have reasonably engineered code if you get models to stick to well designed modules but you need to tell them.

      • By verall 2026-01-07 6:49

        But the time I spend asking is time I could have spent writing exactly what I wanted in the first place, if I'd already done the planning to understand what I wanted. Once I know what I want, it usually doesn't take that long.

        Which is why it's so great for prototyping, because it can create something during the planning, when you haven't planned out quite what you want yet.

    • By AndrewKemendo 2026-01-07 2:15 · 1 reply

      > The hard thing about engineering is not "building a thing that works", its building it the right way, in an easily understood way, in a way that's easily extensible.

      The number of production applications that achieve this rounds to zero

      I’ve probably managed 300 brownfield web, mobile, edge, datacenter, data processing and ML applications/products across DoD, B2B, consumer and literally zero of them were built in this way

      • By kaashif 2026-01-07 3:28 · 2 replies

        I think there is a subjective difference. When a human builds dogshit at least you know they put some effort and the hours in.

        When I'm reading piles of LLM slop, I know that just reading it is already more effort than it took to write. It feels like I'm being played.

        This is entirely subjective and emotional. But when someone writes something with an LLM in 5 seconds and asks me to spend hours reviewing...fuck off.

        • By parpfish 2026-01-07 4:06 · 1 reply

          If you are heavily using LLMs, you need to change the way you think about reviews

          I think most people now approach it as: Dev0 uses an LLM to build a feature super fast, Dev1 spends time doing an in-depth review.

          Dev0 built it, Dev1 reviewed it. And Dev0 is happy because they used the tool to save time!

          But what should happen is that Dev0 should take all that time they saved coding and reallocate it to the in depth review.

          The LLM wrote it, Dev0 reviewed it, Dev1 double-reviewed it. Time savings are much less, but there’s less context switching between being a coder and a reviewer. We are all reviewers now all the time

          • By PunchyHamster 2026-01-07 13:42

            Can't do that, else KPIs won't show that AI tools reduced amount of coding work by xx%

        • By AndrewKemendo 2026-01-07 15:16

          Your comment doesn’t address what I said and instead finds a new reason that it’s invalid because “reviewing code from a machine system is beneath me”

          Get over yourself

    • By KentLatricia 2026-01-07 13:27

      Another thing these posts assume is a single developer who keeps working on the product with a number of AI agents, not a large team. I think we need to rethink how teams work with AI. It's probably not gonna be a single developer typing a prompt but a team that somehow collaborates on a prompt or equivalent. XP on steroids? Programming by committee?

    • By noodletheworld 2026-01-07 4:32

      It might scale.

      So far, I'm not convinced, but let's take a look at fundamentally what's happening and why humans > agents > LLMs.

      At its heart, programming is a constraint satisfaction problem.

      The more constraints (requirements, syntax, standards, etc) you have, the harder it is to solve them all simultaneously.

      New projects with few contributors have fewer constraints.

      The process of “any change” is therefore simpler.

      Now, undeniably:

      1) Agents have improved the ability to solve constraints by iterating; e.g. generate, test, modify, etc. over raw LLM output.

      2) There is an upper bound (context size, model capability) on the ability to solve simultaneous constraints.

      3) Most people have a better ability to do this than agents (including Claude Code using Opus 4.5).

      So, if you're seeing good results from agents, you probably have a smaller set of constraints than other people.

      Similarly, if you're getting bad results, you can probably improve them by relaxing some of the constraints (consistent UI, number of contributors, requirements, standards, security requirements, splitting code into well defined packages).

      This will make both agents and humans more productive.

      The open question is: will models continue to improve enough to approach or exceed human level ability in this?

      Are humans willing to relax the constraints enough for it to be plausible?

      I would say the people currently clamoring about the end of human developers are cluelessly deceived by the "appearance of complexity", which does not match the "reality of constraints" in larger applications.

      Opus 4.5 cannot do the work of a human on code bases I've worked on. Hell, talented humans struggle to work on some of them.

      …but that doesn't mean it doesn't work.

      Just that, right now, the constraint set it can solve is not large enough to be useful in those situations.

      …and increasingly we see low-quality software where people care only about speed of delivery; again, lowering the bar in terms of requirements.

      So… you know. Watch this space. I'm not counting on having a dev job in 10 years. If I do, it might be making a pile of barely working garbage.

      …but I have one now, and anyone who thinks that this year people will be largely replaced by AI is probably poorly informed and has misunderstood the capabilities of these models.

      There's only so low you can go in terms of quality.

    • By nialse 2026-01-07 19:26

      After recently applying Codex to a gigantic, old, and hairy project that is as far from greenfield as it can be, I can assure you this assertion is false. It's bonkers seeing 5.2 churn through the complexity and understand dependencies that would take me days or weeks to wrap my head around.

    • By herpdyderp 2026-01-07 14:12

      On the contrary, Opus 4.5 is the best agent I’ve ever used for making cohesive changes across many files in a large, existing codebase. It maintains our patterns and looks like all the other code. Sometimes it hiccups for sure.

    • By scotty79 2026-01-07 10:22

      If you have a microservices architecture in your project, you are set for AI. You can swap out any lacking legacy microservice in your system with a "greenfield" vibecoded one.

    • By Havoc 2026-01-07 2:30 · 1 reply

      > greenfield

      LLMs are pretty good at picking up existing codebases. Even with cleared context they can do "look at this codebase and this spec doc that created it; I want to add feature X".

      • By le-mark 2026-01-07 2:36 · 1 reply

        What size of code base are you talking about? And this is your personal experience?

        • By Havoc 2026-01-07 3:09 · 3 replies

          Overall codebase size vs. context matters less when you set it up as a microservices-style architecture from the start.

          I just split it into boundaries that make sense to me, get the LLM to make a quick cheat sheet about the API, and then feed that into adjacent modules. It doesn't need to know everything about all of it to make changes if you've got a grip on the big picture and the boundaries are somewhat sane.

          • By onion2k 2026-01-07 5:58 · 1 reply

            > Overall codebase size vs. context matters less when you set it up as a microservices-style architecture from the start.

            It'll be fun if the primary benefit of microservices turns out to be that LLMs can understand the codebase.

            • By baq 2026-01-07 6:36 · 1 reply

              That was the whole point for humans, too.

              • By gf000 2026-01-07 9:12

                Except it didn't really work for humans, in the same way it won't work for LLMs.

                If you use too many microservices, you get global state, race conditions, and much more complex failure modes again, and no human or LLM can effectively reason about those. We somewhat have tools to do that in the case of monoliths, but if one gets to this point with microservices, it's game over.
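The shared-state hazard described above can be shown in a few lines. `Store` and the two competing `reserve` calls below are hypothetical stand-ins for two services doing a read-modify-write against one shared datastore:

```python
# Minimal sketch of a lost update across service boundaries. Everything here
# is a made-up stand-in: Store plays the shared datastore, and the interleave
# callback simulates a second service acting between one service's read and write.

class Store:
    """Stands in for a datastore shared by several microservices."""
    def __init__(self):
        self.stock = 10
    def get(self):
        return self.stock
    def put(self, value):
        self.stock = value

def reserve(store, n, interleave=None):
    seen = store.get()      # service reads shared state
    if interleave:
        interleave()        # another service acts before we write back
    store.put(seen - n)     # write based on a now-stale read

store = Store()
# Service B reserves 4 units while service A's reservation of 3 is mid-flight:
reserve(store, 3, interleave=lambda: reserve(store, 4))
print(store.stock)  # 7, not 3: B's write was silently overwritten (lost update)
```

Within a monolith this is one mutex or transaction; once the two writers live in separate services, it becomes a distributed-coordination problem that is much harder for a human or an LLM to reason about.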

          • By magicalist 2026-01-07 4:02 · 2 replies

            So "pretty good at picking up existing codebases" so long as the existing codebase is all microservices.

            • By enraged_camel 2026-01-07 7:17 · 1 reply

              I work with multiple monoliths that span anywhere from 100k to 500k lines of code, in a non-mainstream language (Elixir). Opus 4.5 crushes everything I throw at it: complex bugs, extending existing features, adding new features in a way that matches conventions, refactors, migrations... The only time it struggles is if my instructions are unclear or incomplete. For example, if I ask it to fix a bug but don't specify that such-and-such should continue to work the way it does due to an undocumented business requirement, Opus might mess that up. But I consider that normal, because a human developer would also fail at it.

              • By aprilthird2021 2026-01-07 9:23

                With all due respect, those are very small codebases compared to the kinds of things a lot of software engineers work on.

            • By heartbreak 2026-01-07 4:36

              Or a Rails app.

          • By phito 2026-01-07 6:42

            It doesn't have to be microservices, just code that is decoupled properly, so it can search and build its context easily.

    • By volkanvardar 2026-01-07 5:36

      I totally agree. And welcome to the disposable-software age.

    • By fooker 2026-01-07 5:20

      It just one-shots bug fixes in complex codebases.

      Copy-paste the bug report and watch it go.

    • By epolanski 2026-01-07 11:40

      Yeah, all of those applications he shows do not really expose any complex business logic.

      With all due respect: a file converter for Windows is gluing a few Windows APIs to the relevant codec.

      Now, good luck working on a complex warehouse management application where you need extremely complex logic to sort the order of picking, assembling, and packing based on an enormous number of variables: weight, Amazon Prime priority, distribution centers, number and type of carts available, number and type of assembly stations available, different delivery systems and requirements for different delivery operators (such as GLE, DHL, etc.) that have to work with N customers all requiring slightly different capabilities and flows, all having different printers and operations, etc. And I ain't even scratching the surface of the business-logic complexity (not even mentioning functional requirements), to avoid boring the reader.

      Mind you, AI is still tremendously useful in the analysis phase, and it can sort of help in some steps of the implementation, but the number of times you can avoid looking thoroughly at the code for any minor issue or discrepancy is absolutely close to zero.

    • By wilg 2026-01-07 1:07 · 1 reply

      you can definitely just tell it what abstractions you want when adding a feature and do incremental work on an existing codebase. but i generally prefer gpt-5.2

      • By boppo1 2026-01-07 3:58 · 1 reply

        I've been using 5.2 a lot lately but hit my quota for the first time (and will probably continue to hit it most weeks), so I shelled out for Claude Code. What differences do you notice? Any 'metagame' that would be helpful?

        • By wilg 2026-01-07 5:40

          I just use Cursor because I can pick any model. The difference is hard to pin down exactly: Opus seems good, but 5.2 seems smarter on the tasks I tried. Or possibly I just "trust" it more. I tend to use high or extra-high reasoning.

    • By kevinsync 2026-01-07 4:56

      Man, I've been biting my tongue all day with regards to this thread and overall discussion.

      I've been building a somewhat-novel, complex, greenfield desktop app for 6 months now, conceived and architected by a human (me), visually designed by a human (me), implementation heavily leaning on mostly Claude Code but with Codex and Gemini thrown in the mix for the grunt work. I have decades of experience, could have built it bespoke in like 1-2 years probably, but I wanted a real project to kick the tires on "the future of our profession".

      TL;DR I started with 100% vibe code simply to test the limits of what was being promised. It was a functional toy that had a lot of problems. I started over and tried a CLI version. It needed a therapist. I started over and went back to a visual UI. It worked but was too constrained. I started over again.

      After about 10 complete start-overs in blank folders, I had a better vision of what I wanted to make, and how to achieve it. Since then, I've been working day after day, screen after screen, building, refactoring, going feature by feature, bug after bug, exactly how I would if I was coding manually. Many times I've reached a point where it feels "feature complete", until I throw a bigger dataset at it, which brings it to its knees. Time to re-architect, re-think memory and storage and algorithms and libraries used. Code bloated, and I put it on a diet until it was trim and svelte.

      I've tried many different approaches to hard problems, some suggested by LLMs that truly surprised me in their efficacy, but only after I presented the issues with the previous implementation. There's a lot of conversation and back and forth with the machine, but we always end up getting there in the end. Opus 4.5 has been significantly better than previous Anthropic models. As I hit milestones, I manually audit code, rewrite things, reformat things, generally polish the turd.

      I tell this story only because I'm 95% there to a real, legitimate product, with 90% of the way to go still. It's been half a year.

      Vibe coding a simple app that you just want to use personally is cool; let the machine do it all, don't worry about under the hood, and I think a lot of people will be doing that kind of stuff more and more because it's so empowering and immediate.

      Using these tools is also neat and amazing because they're a force multiplier for a single person or small group who really understand what needs to be done and what decisions need to be made.

      These tools can build very complex, maintainable software if you can walk with them step by step and articulate the guidelines and guardrails, testing every feature, pushing back when it gets it wrong, growing with the codebase, getting in there manually whenever and wherever needed.

      These tools CANNOT one-shot truly new stuff, but they can be slowly cajoled and massaged into eventually getting you where you want to go; hard things are hard, and things that take time don't get done for a while. I have no moral compunctions or philosophical musings about utilizing these tools, but IMO there's still significant effort and coordination needed to make something really great with them (and literally minimal effort and no coordination needed to make something passable).

      If you're solo, know what you want, and know what you're doing, I believe you might see 2x, 4x gains in time and efficiency using Claude Code and all of his magical agents, but if your project is more than a toy, I would bet that 2x or 4x is applied to a temporal period of years, not days or months!

    • By blitz_skull 2026-01-07 14:02

      [flagged]

    • By llm_nerd 2026-01-07 1:52 · 2 replies

      "its building it the right way, in an easily understood way, in a way that's easily extensible"

      I am in a unique situation where I work with a variety of codebases over the week. I have had no problem at all utilizing Claude Code w/ Opus 4.5 and Gemini CLI w/ Gemini 3.0 Pro to make excellent code that is indisputably "the right way", in an extremely clear and understandable way, and that is maximally extensible. None of them are greenfield projects.

      I feel like this is a bit of je ne sais quoi, where people appeal to some indemonstrable essence that these tools just can't accomplish, and only the "non-technical" people are foolish enough not to realize it. I'm a pretty technical person (about 30 years of software development, up to staff engineer and then VP). I think they have reached a pretty high level of competence. I still audit the code and monitor their creations. They're not the oft-claimed "junior developer" replacement; instead they do the work I would have gotten from a very experienced, expert-level developer, except that instead of being an expert at one niche, they're experts at almost every niche.

      Are they perfect? Far from it. It still requires a practitioner who knows what they're doing. But frequently on here I see people giving takes that sound like they last used some early variant of Copilot or something and think that remains state of the art. The rest of us are just accelerating our lives with these tools, knowing that pretending they suck online won't slow their ascent an iota.

      • By what 2026-01-07 2:52 · 2 replies

        > llm_nerd
        > created two years ago

        You AI hype thots/bots are all the same. All these claims, but never backed up with anything to look at. And always claiming "you're holding it wrong".

        • By pigpop 2026-01-07 3:09

          I don't see how "two years ago" is incongruous with having been using LLMs for coding; it's exactly the timeline I would expect. Yes, some people do just post "git gud", but there are many people ITT, and in most other threads on LLM coding articles, who are trying to explain their process to anyone who will listen. I'm not sure it's fully explainable in a single comment, though; I'd have to write a multi-part tutorial to cover everything, but it's almost entirely just applying the same project-management principles that you would in a larger team of developers, customized to the current limitations of LLMs. If you want full tutorials with examples, I'm sure they're out there, but I'd also recommend reviewing some project-management material and then seeing how you can apply it to a coding agent. You'll only really learn by doing.

        • By llm_nerd 2026-01-07 12:09

          >You AI hype thots/bots are all the same

          This isn't Twitter, so save the garbage rhetoric. And if you must question my account: I create a new account whenever I set up a new main PC, and randomly pick a username that is top of mind at the moment. This isn't professionally or personally affiliated in any way, so I'm not trying to build a thing. I mean, if I had a 10-year-old account that only managed a few hundred upvotes despite prolific commenting, I'd probably delete it out of embarrassment though.

          >All these claims but never backed up with anything to look at

          Uh... install the tools? Use them? What does "to look at" even mean? Loads of people are using these tools to great effect, while some tiny minority tell us online that no way, they don't work, etc. And at some point they'll pull their heads out of the sand and write the follow-up: "Wait, they actually do."

      • By doxeddaily 2026-01-07 5:05 · 1 reply

        I also have >30 years and I've had the same experience. I noticed an immediate improvement with 4.5 and I've been getting great results in general.

        And yes I do make sure it's not generating crazy architecture. It might do that.. if you let it. So don't let it.

        • By llm_nerd 2026-01-07 13:32

          HN has a subset of users -- they're a minority, but they hit threads like this super hard -- who really, truly think that if they say loudly and frequently enough that AI tools suck and are only for nubs, downvoting anyone who finds them useful, all AI advancements will unwind and it'll be the "good old days" again. It's rather bizarre stuff, but that's what happens when people in denial feel threatened.

HackerNews