The Codex app illustrates the shift left of IDEs and coding GUIs

2026-02-0420:2381202benshoemaker.us

The Codex desktop app doesn't change everything - but it's part of a larger trend worth paying attention to. Where IDEs are headed and why specs matter more than code.

Show article

The Codex desktop app doesn't change everything - but it's part of a larger trend worth paying attention to. Where IDEs are headed and why specs matter more than code.

No, it doesn’t. The Codex desktop app dropped yesterday. You’ll see breathless Twitter posts and YouTube videos about how it changes everything. It doesn’t. But it is pretty cool, and it’s part of a larger trend worth paying attention to. I’m going to talk briefly about how it’s changing my workflow, and then zoom out to what it means that this app exists at all.

My Workflow (For Now)

I’ll write a longer post on this, but the quick version:

My primary driver is Claude Code in the terminal. I think it has the best features, the most hooks, and the most ability to create a clean development workflow with all the checks I want.
The Codex app is my parallelization layer. The thing that’s cool about it (and Conductor, which is similar) is that it makes Git worktrees easy to use. That means real parallelization.

Here’s how I’m experimenting with it:

I have my main feature or project running in Claude Code in a terminal window
Whenever I come up with changes, bug fixes, or investigations outside the scope of that feature, I spin up a worktree in the Codex app. I can chat with it separately. It lets me know when it needs input. It’s totally isolated, and I can merge it back whenever I want.

The TLDR: Codex app is OpenAI’s supported UI for multi-agent parallelized development. In my workflow, I use it to develop small features in parallel while I’m working on the main thing in Claude Code.

The Bigger Picture

The reason I find this interesting isn’t the app itself, but what it says about where things are headed. I think about IDEs a lot because they’re a lens into where software development is going. I’ve said this before: software development will be unrecognizable in two to three years. And what’s happening with IDEs is proof.

“IDE” stands for integrated development environment. The name doesn’t imply it has to be about reading and writing code - but that’s what it’s always been. That’s changing.

Here’s the thing: I don’t read code anymore. I used to write code and read code. Now when something isn’t working, I don’t go look at the code. I don’t question the code. I either ask one of my coding agents, or - more often - I ask myself: what happened with my system? What can I improve about the inputs that led to that code being generated?

The code isn’t the thing I’m debugging. The system that produced the code is. The people really leading AI coding right now (and I’d put myself near the front, though not all the way there) don’t read code. They manage the things that produce code.

The Continuum

The image above illustrates how I think about this landscape. There’s a spectrum with three major zones: Code, Agents, and Specs. The further left you move, the higher up the stack you get.

Code (right side): Traditional IDEs. VS Code, JetBrains. You read code, you write code.

Code + AI: AI-assisted features. Autocomplete, inline suggestions. GitHub Copilot lives here. The human is still driving.

Agentic IDEs: Cursor, Windsurf. Code and agents combined. The AI makes autonomous multi-file edits, runs terminal commands, iterates on its own work. But you’re still looking at code.

Multi-Agent Orchestration: Claude Code, Codex CLI, Codex app, Conductor. The whole interface is about managing agents. You’re not staring at code - you’re dispatching tasks, watching progress, reviewing PRs. Agent inbox.

Specs (left side): Kiro, GitHub Spec Kit, Vibe Scaffold. The spec is the primary artifact. Requirements → design → tasks → implementation. Code is an output, not the thing you manage.

Where This Is Going

I think the industry is moving left. Toward specs. The code is becoming an implementation detail. What matters is the system that produces it - the requirements, the constraints, the architecture. Get those right, and the code follows. I’m actually building something in this area, focused on specs (not Vibe Scaffold 🙂). Hopefully I have some details in the next few weeks.

Read the original article

Comments

By kace91 2026-02-0421:5919 reply

>The people really leading AI coding right now (and I’d put myself near the front, though not all the way there) don’t read code. They manage the things that produce code.

I can’t imagine any other example where people voluntarily move for a black box approach.

Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?

By eikenberry 2026-02-0423:173 reply

I think many people are missing the overall meaning of these sorts of posts.. that is they are describing a new type of programmer that will only use agents and never read the underlying code. These vibe/agent coders will use natural(-ish) language to communicate with the agents and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly. It is not the level of abstraction they are working on. There are many use cases where this type of coding will work fine and it will let many people who previously couldn't really take advantage of computers to do so. This is great but in no way will do anything to replace the need for code that requires humans to understand (which, in turn, requires participation in the writing).

By jkhdigital 2026-02-0423:406 reply

Your analogy to PHP developers not reading assembly got me thinking.

Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.

There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.

By ytoawwhra92 2026-02-0423:571 reply

For a great many software projects no formal spec exists. The code is the spec, and it gets modified constantly based on user feedback and other requirements that often appear out of nowhere. For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.

Put another way, if you don't know what correct is before you start working then no tradeoff exists.

By majormajor 2026-02-053:271 reply

> Put another way, if you don't know what correct is before you start working then no tradeoff exists.

This goes out the window the first time you get real users, though. Hyrum's Law bites people all the time.

"What sorts of things can you build if you don't have long-term sneaky contracts and dependencies" is a really interesting question and has a HUGE pool of answers that used to be not worth the effort. But it's largely a different pool of software than the ones people get paid for today.

By ytoawwhra92 2026-02-056:032 reply

> This goes out the window the first time you get real users, though.

Not really. Many users are happy for their software to change if it's a genuine improvement. Some users aren't, but you can always fire them.

Certainly there's a scale beyond which this becomes untenable, but it's far higher than "the first time you get real users".

By majormajor 2026-02-0517:31

But that's not what this is about:

> For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.

Some version of the software exists and now that's your spec. If you don't have a formal copy of that and rigorous testing against that spec, you're gonna get mutations that change unintended things, not just improvements.

Users are generally ok with - or at least understanding of - intentional changes, but now people are talking about no-code-reading workflows, where you just let the agents rewrite stuff on the fly to build new things until all the tests pass again. The in-code tests and the expectations/assumptions about the product that your users have are likely wildly different - they always have been, and there's nothing inherent about LLM-generated code or about code test coverage percentages that change this.

By just6979 2026-02-0517:05

"Some users will _accept_ "improvements" IFF it doesn't break their existing use cases."

Fixed that for you.

By andai 2026-02-052:251 reply

> So the tradeoffs involved are not only about developer time vs. performance, but also correctness.

The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.

But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!

For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.

(In the case of Windows 11 recently.. not so much ;)

By majormajor 2026-02-053:25

> The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.

It's certainly hard to find, in consumer-tech, an example of a product that was displaced in the market by a slower moving competitor due to buggy releases. Infamously, "move fast and break things" has been the rule of the land.

In SaaS and B2B deterministic results becomes much more important. There's still bugs, of course, but showstopper bugs are major business risks. And combinatorial state+logic still makes testing a huge tarpit.

The world didn't spend the last century turning customer service agents and business-process-workers into script-following human-robots for no reason, and big parts of it won't want to reintroduce high levels of randmoness... (That's not even necessarily good for any particular consumer - imagine an insurance company with a "claims agent" that got sweet talked into spending hundreds of millions more on things that were legitimate benefits for their customers, but that management wanted to limit whenever possible on technicalities.)

By bandrami 2026-02-052:11

OK but, I've definitely read the assembly listings my C compiler produced when it wasn't working like I hoped. Even if that's not all that frequent it's something I expect I have to do from time to time and is definitely part of "programming".

By drawnwren 2026-02-051:52

It's also important to remember that vibe coders throw away the natural language spec each time they close the context window.

Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.

By HansHamster 2026-02-050:17

> which is faithfully translated by the (hopefully bug-free) compiler.

"Hey Claude, translate this piece of PHP code into Power10 assembly!"

By QuadmasterXLII 2026-02-050:381 reply

Imagine if high level coding worked like: write a first draft, and get assembly. All subsequent high level code is written in a repl and expresses changes to the assembly, or queries the state of the assembly, and is then discarded. only the assembly is checked into version control.

By 6510 2026-02-054:19

Or the opposite, all applications are just text files with prompts in them and the assembly lives as ravioli in many temp files. It only builds the code that is used. You can extend the prompt while using the application.

By re-thc 2026-02-0423:341 reply

> that is they are describing a new type of programmer that will only use agents and never read the underlying code

> and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly

This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I as a PHP developer assume.

Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?

By 6510 2026-02-054:20

That is true for all languages. Very high quality until you use a lib, a module or an api.

By straydusk 2026-02-0423:28

I'm glad you wrote this comment because I completely agree with it. I don't think that there is not a need for software engineers to deeply consider architecture; who can fully understand the truly critical systems that exist at most software companies; who can help dream up the harness capabilities to make these agents work better.

I just am describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.

By csallen 2026-02-0422:135 reply

> Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.

The output of code isn't just the code itself, it's the product. The code is a means to an end.

So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.

By kace91 2026-02-0422:241 reply

>The output of code isn't just the code itself, it's the product. The code is a means to an end.

I’ll bite. Is this person manually testing everything that one would regularly unit test? Or writing black box tests that he does know are correct because of being manually written?

If not, you’re not reviewing the product either. If yes, it’s less time consuming to actually read and test the damn code

By CuriouslyC 2026-02-0422:333 reply

I mostly ignore code, I lean on specs + tests + static analysis. I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions. I push very high test coverage on all my projects (85%+), and part of the way I build is "testing ladders" where I have the agent create progressively bigger integration tests, until I hit e2e/manual validation.

By kace91 2026-02-0422:451 reply

>I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions

So a percentage of your code, based on your gut feeling, is left unseen by any human by the moment you submit it.

Do you agree that this rises the chance of bugs slipping by? I don’t see how you wouldn’t.

And considering the fact that your code output is larger, the percentage of it that is buggy is larger, and (presumably) you write faster, have you considered the conclusion in terms of the compounding likelihood of incidents?

By CuriouslyC 2026-02-050:49

There's definitely a class of bugs that are a lot more common, where the code deviates from the intent in some subtle way, while still being functional. I deal with this using benchmarking and heavy dogfooding, both of these really expose errors/rough edges well.

By just6979 2026-02-0517:12

"I push very high test coverage on all my projects (85%+)"

Coverage doesn't matter if the tests aren't good. If you're not verifying the tests are actually doing something useful, talking about high coverage is just wanking.

"have the agent create progressively bigger integration tests, until I hit e2e/manual validation."

Same thing. It doesn't matter how big the tests are if they're not testing the right thing. Also why is e2e slashed with manual? Those are orthogonal. E2E tests can [and should] be fully automated for many [most?] systems. And manual validation doesn't have to wait for full e2e.

By straydusk 2026-02-0422:56

"Testing ladders" is a great framing.

My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.

By straydusk 2026-02-0422:552 reply

Exactly this. The code is an intermediate artifact - what I actually care about is: does the product work, does it meet the spec, do the tests pass?

I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.

By nubg 2026-02-053:06

People miss this a lot. Coding is just a (small) part of building a product. You get a much better bang for the buck if you focus your time on talking to the user, dogfooding, and then vibecoding. It also allows you to do many more iterations with even large changes because since your didn't "write" the code, you don't care about throwing it away.

By fastasucan 2026-02-0720:50

> the product work, does it meet the spec, do the tests pass

How is this decoupled from the code?

By add-sub-mul-div 2026-02-0423:04

A photo isn't going to fail next week or three months from now because it's full of bugs no one's triggered yet.

Specious analogies don't help anything.

By alanbernstein 2026-02-0422:18

Right, it seems the appropriate analogy is the shift from analog-photograph-developers to digital camera photographers.

By 6510 2026-02-054:23

The product is: solving a problem. Requirements vary.

By vidarh 2026-02-0511:38

Your product managers most likely are not reading your code. Your CEO is not. The vast majority of your company is unlikely to ever look at a line of code.

If the process becomes reliable enough, then there is no reason. For now, that still requires developers to pay attention for important projects, but there are also a lot of AI written tools I rely on day to day that I don't, because the opportunity cost of spending time reading them is lower than the cost of accepting the risk that they do something wrong.

There are also a whole lot of tools I do read thoroughly, because the risk profile is different.

But that category is getting smaller day by day, not just with model improvements, but with improved harnesses.

By weikju 2026-02-0422:241 reply

Don’t read the code, test for desired behavior, miss out on all the hidden undesired behavior injected by malicious prompts or AI providers. Brave new world!

By thefz 2026-02-0422:323 reply

You made me imagine AI companies maliciously injecting backdoors in generated code no one reads, and now I'm scared.

By gibsonsmog 2026-02-0422:54

My understanding is that it's quite easy to poison the models with inaccurate data, I wouldn't be surprised if this exact thing has happened already. Maybe not an AI company itself, but it's definitely in the purview of a hostile actor to create bad code for this purpose. I suppose it's kind of already happened via supply chain attacks using AI generated package names that didn't exist prior to the LLM generating them.

By djeastm 2026-02-051:272 reply

One mitigation might be to use one company's model to check the work of another company's code and depend on market competition to keep the checks and balances.

By just6979 2026-02-0517:15

Then how many models deep do you go before it's more cost effective to just hire a junior dev, supply them with a list of common backdoors, and have them scan the code?

By thefz 2026-02-056:351 reply

What about writing the actual code yourself

By just6979 2026-02-0517:17

Nah, more fun to burn money.

By bandrami 2026-02-052:09

Already happening in the wild

By andyferris 2026-02-051:51

The output is the program behavior. You use it, like a user, and give feedback to the coding agent.

If the app is too bright, you tweak the settings and build it again.

Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.

Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.

By CharlesW 2026-02-0422:11

AI-assisted coding is not a black box in the way that managing an engineering team of humans is. You see the model "thinking", you see diffs being created, and occasionally you intervene to keep things on track. If you're leveraging AI professionally, any coding has been preceded by planning (the breadth and depth of which scale with the task) and test suites.

By Aeolun 2026-02-0422:151 reply

> What is the logic here?

It is right often enough that your time is better spent testing the functionality than the code.

Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).

By kace91 2026-02-0422:30

I can’t imagine retesting all the functionality of a well established product for possible regressions not being stupidly time consuming. This is the very reason why we have unit tests in the first place, and why they are far more numerous in tests than end-to-end ones.

By manmal 2026-02-0422:053 reply

> I can’t imagine any other example where people voluntarily move for a black box approach.

Anyone overseeing work from multiple people has to? At some point you have to let go and trust people‘s judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I‘m not one of them btw) must rely on higher level reports. Maybe drilling into this or that piece of code occasionally.

By kace91 2026-02-0422:202 reply

>At some point you have to let go and trust people‘s judgement.

Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.

>Reading and understanding the whole output of 9 concurrently running agents is impossible.

I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurring agents?

I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It’s already a giant advancement, why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.

By wtetzner 2026-02-052:121 reply

Is reviewing outputs really more efficient than writing the code? Especially if it's a code base you haven't written code in?

By kuschku 2026-02-058:56

It is not. To review code you need to have an understanding of the problem that can only be built by writing code. Not necessarily the final product, but at least prototypes and experiments that then inform the final product.

By re-thc 2026-02-0422:131 reply

> Anyone overseeing work from multiple people has to?

That's not a black box though. Someone is still reading the code.

> At some point you have to let go and trust people‘s judgement

Where's the people in this case?

> People who do that (I‘m not one of them btw) must rely on higher level reports.

Does such a thing exist here? Just "done".

By manmal 2026-02-0422:442 reply

> Someone is still reading the code.

But you are not. That’s the point?

> Where's the people in this case?

Juniors build worse code than codex. Their superiors also can‘t check everything they do. They need to have some level of trust for doing dumb shit, or they can’t hire juniors.

> Does such a thing exist here? Just "done".

Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info vs when you read the output, but it won’t be zero info.

By just6979 2026-02-0517:292 reply

You: "Why did you build this?"

LLM: "Because the embeddings in your prompt are close to some embeddings in my training data. Here's some seemingly explanatory text with that is just similar embeddings to other 'why?' questions."

You: "What could be improved?"

LLM: "Here's some different stuff based on other training data with embeddings close to the original embeddings, but different.

---

It's near zero useful information. Example imformation might be "it builds" (baseline necessity, so useless info), "it passes some tests" (fairly baseline, more useful, but actually useless if you don't know what the tests are doing), or "it's different" (duh).

By manmal 2026-02-0520:051 reply

If I asked you, about a piece of code that you didn’t build, „What would you improve?“, how would that be fundamentally different?

By just6979 2026-02-1622:15

I would start with asking what kind of improvment, why, how, etc.

Or I could just start changing things to more closely line up with whatever textbook or "Clean Code" clone book I has last read, and hope it still passes the tests and that those tests are as thorough as possible.

The latter would: eventually get me fired, is stupid, and basically what the LLMs do.

By re-thc 2026-02-0512:321 reply

> You can definitely ask the agent what it built, why it built it, and what could be improved.

If that was true we’d have what they call AGI.

So no, it doesn’t actually give you those since it can’t reason and logic in such a way.

By manmal 2026-02-0520:001 reply

What does that have to do with AGI?

By re-thc 2026-02-0520:43

What you asked for is AGI. How else does it think, reason and logic to answer your "why"?

It doesn't do that currently even if you think it does.

By ink_13 2026-02-0422:261 reply

An AI agent cannot be held accountable

By manmal 2026-02-0422:45

Neither can employees, in many countries.

By Xirdus 2026-02-0422:181 reply

> I can’t imagine any other example where people voluntarily move for a black box approach.

I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.

By ink_13 2026-02-0422:251 reply

So... things where the producer doesn't respect the audience? Because any such analysis would be worth as much as a 4.5 hour atonal bass solo.

By sroerick 2026-02-050:471 reply

You can get an AI to listen to that bass solo for you

By just6979 2026-02-0517:34

But can you get an AI to zone out on a fluffy couch at the center point of a dank hi-fi setup with the volume cranked to 11, while chillin' on 50mg of THC?

And will you enjoy paying someone else to let the AI to do that?

By straydusk 2026-02-0422:46

No pun intended but - it's been more "vibes" than science that I've done this. It's more effective. When I focus my attention on the harness layer (tests, hooks, checks, etc), and the inputs, my overall velocity improves relative to reading & debugging the code directly.

To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.

My workflow just focuses much more on the final product, and the initial input layer, not the code - it's becoming less consequential.

By AlexCoventry 2026-02-0423:121 reply

> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

It's producing seemingly working code faster than you can closely review it.

By kace91 2026-02-0423:19

Your car can also move faster than what you can safely control. Knowing this, why go pedal to the metal?

By laterium 2026-02-060:51

You generate 20k LOC in a few hours. How long will it take you to read it? One week? You just keep going instead. I don't think this works great at large scale production codebases yet, but it's an approach that will have more and more applications going forward. It doesn't have to fit every use case.

By bloomca 2026-02-050:18

I think this is the logical next step -- instead of manually steering the model, just rely on the acceptance criteria and some E2E test suite (that part is tricky since you need to verify that part).

I personally think we are not that far from it, but it will need something built on top of current CLI tools.

By raincole 2026-02-0422:371 reply

> Because if you can read code, I can’t imagine poking the result with black box testing being faster.

I don't know... it depends on the use case. I can't imagine even the best front-end engineer ever can read HTML faster than looking at the rendered webpage to check if the layout is correct.

By nubg 2026-02-053:08

Good analogy.

By seanmcdirmid 2026-02-0422:381 reply

> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.

The AI also writes the black box tests, what am I missing here?

By kace91 2026-02-0423:101 reply

>The AI also writes the black box tests, what am I missing here?

If the AI misinterpreted your intentions and/or missed something in productive code, tests are likely to reproduce rather than catch that behavior.

In other words, if “the ai is checking as well” no one is.

By seanmcdirmid 2026-02-0423:26

That's true. Never let the AI know about the code it wrote when writing the test for sure. Write multiple tests, have an arbitrator (also AI) figure out if implementation or tests are wrong when tests fail. Have the AI heavily comment code and heavily comment tests in the language of your spec so you can manually verify if the scenarios/parts of the implementations make sense when it matters.

etc...etc...

> In other words, if “the ai is checking as well” no one is.

"I tried nothing, and nothing at all worked!"

By llamajams 2026-02-0512:53

Same. I stopped reading after that. I get the sense that most of these people thing all code is web or mobile or something non critical. Granted im not a web or mobile guy so I cant presume the complexity, risk, cost of such things. But I assume its in a different category than safety/mission critical things. I do dev tools for ASIL-B systems devs now and even then I cant say im comfortable not reading the generated code. Some of my junior peers are though, and im very frustrated that I feel like I keep having to play AI janitor, dont think the bosses care.

By hjoutfbkfd 2026-02-0422:411 reply

your metaphor is wrong.

code is not the output. functionality is the output, and you do look at that.

By kace91 2026-02-0423:13

Explain then how testing the functionality (not the new one; regressions included, this is not a school exercise) is faster than checking the code.

Are you writing black box testing by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is.

By ForHackernews 2026-02-0422:222 reply

>Imagine taking a picture on autoshot mode

Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.

There dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?

By sigseg1v 2026-02-059:36

Hey it's me! I shoot with manual focus lenses in RAW and drive a standard. There are dozens of us!

By weikju 2026-02-0422:26

You missed out on the rest of the analogy though, which is the part where the photo is not reviewed before handing it over to the client.

By notepad0x90 2026-02-0422:14

people care about results. Better processes need to produce better results. this is programming not a belief system where you have to adhere to some view or else.

By teecha 2026-02-051:543 reply

I find so many of these comments and debates fascinating as a lay person. I'm more tech savy than mostI meet, built my own PCs, know my way around some more 'advanced' things like terminal a bit and have a deeper understanding of computer systems, software, etc. than most people I know. It has always been more of a hobby for me. People look at me as the 'tech' guy even though I'm actually not.

Something I know very little about is coding. I know there are different languages with pros and cons to each. I know some work across operating systems while others don't but other than that I don't know too much.

For the first time I just started working on my own app in Codex and it feels absolutely amazing and magical. I've not seen the code, would have basically no idea how to read it, but i'm working on a niche application for my job that it is custom tailored to my needs and if it works I'll be thrilled. Even better is that the process of building is just feels so special and awesome.

This really does feel like it is on the precipice of something entirely different. I think back to computers before a GUI interface. I think back to even just computers before mobile touch interfaces. I am sure there are plenty of people who thought some of these things wouldn't work for different reasons but I think that is the wrong idea. The focus should be on who this will work for and why and there, I think, there are a ton of possibilities.

For reference, I'm a middle school Assistant Principal working on an app to help me with student scheduling.

By chasd00 2026-02-053:23

Keep building and keep learning, I think you are the kind of user that stands to benefit the most from this technology.

By panny 2026-02-0511:58

My observation is that "AI" makes easy things easier and hard things impossible. You'll get your niche app out of it, you'll be thrilled, then you'll need it to do more. Then you will struggle to do more, because the AI created a pile of technical debt.

Programmers dream of getting a green field project. They want to "start it the right way this time" instead of being stuck unwinding technical debt on legacy projects. AI creates new legacy projects instantly.

By grigri907 2026-02-055:011 reply

After 10+ years of stewing on an idea, I started building an app (for myself) that I've never had the courage or time to start until now.

I really wanted to learn the coding, the design patterns, etc, but truthfully, it was never gonna happen without a Claude. I could never get past the unknown-unknowns (and I didn't even grasp how broad is the domain of knowledge it actually requires.) Best case I would have started small chunks and abandoned it countless times, piling on defeatism and disappointment each time.

Now in under two weeks of spare time and evenings, I've got a working prototype that's starting to resemble my dream. Does my code smell? Yes. Is it brittle? Almost certainly. Is it a security risk? I hope not. (It's not.)

I want to be intentional about how I use AI; I'm nervous about how it alters how we think and learn. But seeing my little toy out in the real world is flippin incredible.

By thepasch 2026-02-0513:371 reply

> Is it a security risk? I hope not. (It's not.)

It very probably is, but if it's a personal project you're not planning on releasing anywhere, it doesn't matter much.

You should still be very cognizant that LLMs will currently fairly reliably implement massive security risks once a project grows beyond a certain size, though.

By jason_oster 2026-02-0517:23

They can also identify and fix vulnerabilities when prompted. AI is being used heavily by security researchers for this purpose.

It’s really just a case of knowing how to use the tools. Said another way, the risk is being unaware of what the risks are. And awareness can help one get out of the bad habits that create real world issues.

By GalaxyNova 2026-02-0421:4919 reply

> I don’t read code anymore

Never thought this would be something people actually take seriously. It really makes me wonder if in 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.

By subsection1h 2026-02-0423:542 reply

> Never thought this would be something people actually take seriously

The author of the article has a bachelor's degree in economics[1], worked as a product manager (not a dev) and only started using GitHub[2] in 2025 when they were laid off[3].

[1] https://www.linkedin.com/in/benshoemaker000/

[2] https://github.com/benjaminshoemaker

[3] https://www.benshoemaker.us/about

By zipy124 2026-02-051:03

Whilst I won't comment on this specific person, one of the best programmers I've met has a law degree, so I wouldn't use their degree against them. People can have many interests and skills.

By straydusk 2026-02-057:45

I've written code since 2012, I just didn't put it online. It was a lot harder, so all my code was written internally, at work.

But sure, go with the ad hominem.

By sho_hn 2026-02-0422:463 reply

> Never thought this would be something people actually take seriously.

You have to remember that the number of software developers saw a massive swell in the last 20 years, and many of these folks are Bootcamp-educated web/app dev types, not John Carmack. They typically started too late and for the wrong reasons to become very skilled in the craft by middle age, under pre-AI circumstances and statistically (of course there are many wonderful exceptions; one of my best developers is someone who worked in a retail store for 15 years before pivoting).

AI tools are now available to everyone, not just the developers who were already proficient at writing code. When you take in the excitement you always have to consider what it does for the average developer and also those below average: A chance to redefine yourself, be among the first doing a new thing, skip over many years of skill-building and, as many of them would put it, focus on results.

It's totally obvious why many leap at this, and it's even probably what they should do, individually. But it's a selfish concern, not a care for the practice as-is. It also results in a lot of performative blog posting. But if it was you, you might well do the same to get ahead in life. There's only to so many opportunities to get in on something on the ground floor.

I feel a lot of senior developers don't keep the demographics of our community of practice into account when they try to understand the reception of AI tools.

By jofla_net 2026-02-051:14

This is gold.

I have rarely had the words pulled out of my mouth.

The percentage of devs in my career that are from the same academic background, show similar interests, and approach the field in the same way, is probably less than %10, sadly.

By arjie 2026-02-0423:41

Well, there are programmers like Karpathy in his original coinage of vibe coding

> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Notice "don't read the diffs anymore".

In fact, this is practically the anniversary of that tweet: https://x.com/karpathy/status/2019137879310836075?s=20

By famouswaffles 2026-02-0514:222 reply

Ahh Bulverism, with a hint of ad-hominem and a dash no No True Scotsman. I think the most damning indictment here is the seeming inability to make actual arguments and not just cheap shots at people you've never even met.

Please tell me, "Were people excited about high-level languages just programmers who 'couldn't hack it' with assembly? Maybe you are one of those? Were GUI advocates just people who couldn't master the command line?"

By sho_hn 2026-02-0516:28

Thanks for teaching me about Bulverism, I hadn't heard of that fallacy before. I can see how my comment displays those characteristics and will probably try to avoid that pattern more in the future.

Honestly, I still think there's truth to what I wrote, and I don't think your counter-examples prove it wrong per-se. The prompt I responded to ("why are people taking this seriously") also led fairly naturally down the road of examining the reasons. That was of course my choice to do, but it's also just what interested me in the moment.

By BrouteMinou 2026-02-0516:421 reply

I think he's a cook, watching people putting frozen "meals" in the microwave and telling himself: "hey! That's not cooking!".

And I totally agree with him. Throwing some kind of fallacy in the air for the show doesn't make your argument, or lack of, more convincing.

By famouswaffles 2026-02-0517:561 reply

>I think he's a cook, watching people putting frozen "meals" in the microwave and telling himself: "hey! That's not cooking!".

It's the equivalent of saying anyone excited about being able to microwave Frozen meals is a hack who couldn't make it in the kitchen. I'm sorry, but if you don't see how ridiculous that assertion is then I don't know what to tell you.

>And I totally agree with him. Throwing some kind of fallacy in the air for the show doesn't make your argument, or lack of, more convincing.

A series of condescending statements meant to demean with no objective backing whatsoever is not an argument. What do you want me to say ? There's nothing worth addressing, other than pointing out how empty it is.

You think there aren't big shots, more accomplished than anyone in this conversation who are similarly enthusiastic?

You and OP have zero actual clue. At any advancement, regardless of how big or consequential, there are always people like that. It's very nice to feel smart and superior and degrade others, but people ought to be better than that.

So I'm sorry but I don't really care how superior a cook you think you are.

By sho_hn 2026-02-0518:501 reply

> You think there aren't big shots, more accomplished than anyone in this conversation who are similarly enthusiastic?

I think both things can be true simultaneously.

You're arguing against a straw man.

By famouswaffles 2026-02-0614:03

Pointing out that your argument relies on an unverifiable (and easily countered) generalization isn't a straw man.

By sixdimensional 2026-02-0422:001 reply

Half serious - but is that really so different than many apps written by humans?

I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today (things like green-screen apps written in Pick/Basic, Cobol, etc.). Some of them were written once and subsystems replaced, but some of it is original code.

In systems written in the last.. say, 10 to 20 years, I've seen them undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand-in-hand with the rise of agile development (not condemning nor approving of it) - where rapid rates of change were expected.. and often the tech the system was written in was changing rapidly also.

In hardware engineering, I personally also saw a huge move to more frequent design and implementation refreshes to prevent obsolescence issues (some might say this is "planned obsolescence" but it also is done for valid reasons as well).

I think not reading the code anymore TODAY may be a bit premature, but I don't think it's impossible to consider that someday in the nearer than further future, we might be at a point where generative systems have more predictability and maybe even get certified for safety/etc. of the generated code.. leading to truly not reading the code.

I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20 year timeframe either, it might be sooner.

By sixdimensional 2026-02-0422:24

I would enjoy discussion with whoever voted this down - why did you?

What is your opinion and did you vote this down because you think it's silly, dangerous or you don't agree?

By strken 2026-02-0423:59

I'm torn between running away to be an electrician or just waiting three years until everyone realises they need engineers who can still read.

Sometimes it feels like pre-AI education is going to be like low-background steel for skilled employees.

By Aeolun 2026-02-0422:18

> 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.

That happens just as often without AI. Maybe the people that like it all thave experience with trashing multiple sets of products over the course of their life?

By binsquare 2026-02-0421:531 reply

Reading and understanding code is more important than writing imo

By eikenberry 2026-02-0422:412 reply

It’s pretty well established that you cannot understand code without having thought things through while writing it. You need to know why things are written the way the are to understand what is written.

By tomjakubowski 2026-02-0423:06

Yeah, just reading code does little to help me understand how a program works. I have to break it apart and change it and run it. Write some test inputs, run the code under a debugger, and observe the change in behavior when changing inputs.

By fragmede 2026-02-056:291 reply

If that were true, then only the person who wrote the code could ever understand it enough to fix bugs, which is decidedly not true.

By jason_oster 2026-02-0517:45

I’ll grant you that there are many trivial software defects that can be identified by simply reading the code and making minor changes.

But for architectural issues, you need to be able to articulate how you would have written the code in the first place, once you understand the existing behavior and its problems. That is my interpretation of GP’s comment.

By Hamuko 2026-02-0422:12

I've seen software written and architected by Claude and I'd say that they're already ready to be thrown out. Security sucks, performance will probably suck, maintainability definitely sucks, and UX really fucking sucks.

By j_bizzle 2026-02-050:11

The coincidental timing between the rapid increase in the number of emergency fixes coming out on major software platforms and the proud announcement of the amount of code that's being produced by AI at the same companies is remarkable.

I think 2-3 years is generous.

Don't get me wrong, I've definitely found huge productivity increases in using various LLM workflows in both development as well as operational things. But removing a human from the loop entirely at this point feels reckless bordering on negligent.

By straydusk 2026-02-0423:24

I actually think this is fair to wonder about.

My overall stance on this is that it's better to lean into the models & the tools around them improving. Even in the last 3-4 months, the tools have come an incredible distance.

I bet some AI-generated code will need to be thrown away. But that's true of all code. The real questions to me are - are the velocity gains be worth it? Will the models be so much better in a year that they can fix those problems themselves, or re-write it?

I feel like time will validate that.

By Computer0 2026-02-0421:57

I have wondered the same but for the projects I am completely "hands off" on, the model improvements have overcome this issue time and time again.

By bloomca 2026-02-050:271 reply

If the models don't get to the point where they can correct fixes on their own, then yeah, everything will be falling apart. There is just no other way around increasing entropy.

The only way to harness it is to somehow package code producing LLMs into an abstraction and then somehow validate the output. Until we achieve that, imo doesn't matter how closely people watch out the output, things will be getting worse.

By esperent 2026-02-050:44

> If the models don't get to the point where they can correct fixes on their own

Depending on what you're working on, they are already at that point. I'm not into any kind of AI maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly expensive web app to manage my business using Astro + React and I have yet to find any bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build out, in a month, a fully TDD app that would have conservatively taken me a year by myself.

(+) Except for making the UI beautiful. It's crap at that.

The key that made it click is exactly what the person describes here: using specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and lots more for each section of functionality in the app. If I'm doing work that touches ui, I reference layout and ui-components so that the agent doesn't invent a custom button component. If I'm doing database work, reference database.md so that it knows we're using drizzle + libsql, etc.

This extends up to higher level components where the spec also briefly explains the actual goal.

Then each feature building session follows a pattern: brainstorm and create design doc + initial spec (updates or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (often, I'll identify problems and request changes here) -> automated testing (much stricter linting, knip etc. than I would use for myself) -> finally, update the spec docs again based on the actual work that was done.

My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.

By rustyhancock 2026-02-0422:143 reply

I'm 2-3 years from now if coding AI continues to improve at this pace I reckon people will rewrite entire projects.

I can't imagine not reading the code I'm responsible for any more than I could imagine not looking out the windscreen in a self driving Tesla.

But if so many people are already there, and mostly highly skilled programmers imagine in 2 years time with people who've never programmed!

By nullsanity 2026-02-0422:23

If I keep getting married at the same pace I have, then in a few years I'll have like 50 husbands.

By weakfish 2026-02-0422:22

Well, Tesla has been nearly at FSD for how long? The analogy you make sorta makes it sound less likely

By GalaxyNova 2026-02-0422:421 reply

Seems dangerous to wager your entire application on such an uncertainty

By geraneum 2026-02-050:07

Some people are not aware that they are one race condition away from a class action lawsuit.

By abrookewood 2026-02-053:24

The proponents of Spec Driven Development argue that throwing everything out completely and rebuilding from scratch is "totally fine". Personally, I'm not comfortable with the level of churn.

By well_ackshually 2026-02-0423:591 reply

Also take something into account: absolutely _none_ of the vibe coding influencer bros make anything more complicated than a single-feature, already implemented 50 times webapp. They've never built anything complicated either, or maintained something for more than a few years with all the warts that it entails. Literally, from his bio on his website:

> For 12 years, I led data and analytics at Indeed - creating company-wide success metrics used in board meetings, scaling SMB products 6x, managing organizations of 70+ people.

He's a manager that made graphs on Power BI.

They're not here because they want to build things, they're here to shit a product out and make money. By the time Claude has stopped being able to pipe together ffmpeg commands or glue together 3 JS libraries, they've gone on to another project and whoever bought it is a sucker.

It's not that much different from the companies of the 2000s promising a 5th generation language with a UI builder that would fix everything.

And then, as a very last warning: the author of this piece sells AI consulting services. It's in his interest to make you believe everything he has to say about AI, because by God is there going to be suckers buying his time at indecently high prices to get shit advice. This sucker is most likely your boss, by the way.

By fragmede 2026-02-056:322 reply

No true programmer would vibecode an app, eh?

By well_ackshually 2026-02-0510:33

Oh no, they would. I would.

I'd have the decency to know and tell people that it's a steaming pile of shit and that I have no idea how it works though, and would not have the shamelessness to sell a course on how to put out LLM vomit in public though.

Engineering implies respect for your profession. Act like it.

By sebastos 2026-02-0515:11

But invoking No True Scotsman would imply that the focus is on gatekeeping the profession of programming. I don’t think the above poster is really concerned with the prestige aspect of whether vibe bros should be considered true programmers. They’re more saying that if you’re a regular programmer worried about becoming obsolete, you shouldn’t be fooled by the bluster. Vibe bros’ output is not serious enough to endanger your job, so don’t fret.

By farnsworth 2026-02-0422:08

Yes, and you can rebuild them for free

By RA_Fisher 2026-02-0421:571 reply

Claude, Codex and Gemini can read code much faster than we can. I still read snippets, but mostly I have them read the code.

By GalaxyNova 2026-02-0422:111 reply

Unfortunately they're still too superficial. 9 times out of 10 they don't have enough context to properly implement something and end up just tacking it on in some random place with no regard for the bigger architecture. Even if you do tell it something in an AGENT.md file or something, it often just doesn't follow it.

By RA_Fisher 2026-02-0423:16

I use them to probabilistically program. They’re better than me and I’ve been at it for 16 years now. So I wouldn’t say they’re superficial at all.

What have you tried to use them for?

By ekidd 2026-02-0422:031 reply

I have a wide range of Claude Code based setups, including one with an integrated issue tracker and parallel swarms.

And for anything really serious? Opus 4.5 struggles to maintain a large-scale, clean architecture. And the resulting software is often really buggy.

Conclusion: if you want quality in anything big in February 2026, you still need to read the code.

By manmal 2026-02-0422:08

Opus is too superficial for coding (great at bash though, on the flipside), I‘d recommend giving Codex a try.

By cdfuller 2026-02-0422:04

As LLMs advance so rapidly I think that all the AI slop code written today will be easily digestible by the LLMs a few generations down the line. I think there will be a lot of improvements in making user intent clearer. Combined with a bad codebase and larger context windows, refactoring wont be a challenge.

By joriJordan 2026-02-0422:121 reply

Remember though this forum is full of people who consider code objects when it's just state in a machine.

We have been throwing away entire pieces of software forever. Where's Novell? Who runs 90s Linux kernels in prod?

Code isn't a bridge or car. Preservation isn't meaningful. If we aren't shutting the DCs off we're still burning the resources regardless if we save old code or not.

Most coders are so many layers of abstraction above the hardware at this point anyway they may as well consider themselves syntax artists as much as programmers, and think of Github as DeviantArt for syntax fetishists.

Am working on a model of /home to experiment with booting Linux to models. I can see a future where Python in my screen "runs" without an interpreter because the model is capable of correctly generating the appropriate output without one.

Code is ethno objects, only exists socially. It's not essential to computer operations. At the hardware level it's arithmetical operations against memory states.

Am working on my own "geometric primitives" models that know how to draw GUIs and 3D world primitives, text; think like "boot to blender". Rather store data in strings, will just scaffold out vectors to a running "desktop metaphor".

It's just electromagnetic geometry, delta sync between memory and display: https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...

By geraneum 2026-02-050:03

Wie bitte?