
5th October 2025
For a while now I’ve been hearing from engineers who run multiple coding agents at once—firing up several Claude Code or Codex CLI instances at the same time, sometimes in the same repo, sometimes against multiple checkouts or git worktrees.
I was pretty skeptical about this at first. AI-generated code needs to be reviewed, which means the natural bottleneck on all of this is how fast I can review the results. It’s tough keeping up with just a single LLM given how fast they can churn things out; where’s the benefit of running more than one at a time if it just leaves me further behind?
Despite my misgivings, over the past few weeks I’ve noticed myself quietly starting to embrace the parallel coding agent lifestyle.
I can only focus on reviewing and landing one significant change at a time, but I’m finding an increasing number of tasks that can still be fired off in parallel without adding too much cognitive overhead to my primary work.
Here are some patterns I’ve found for applying parallel agents effectively.
The first category of tasks I’ve been applying this pattern to is research.
Research tasks answer questions or provide recommendations without making modifications to a project that you plan to keep.
A lot of software projects start with a proof of concept. Can Yjs be used to implement a simple collaborative note writing tool with a Python backend? The libraries exist, but do they work when you wire them together?
Today’s coding agents can build a proof of concept with new libraries and resolve those kinds of basic questions. Libraries too new to be in the training data? Doesn’t matter: tell them to check out the repos for those new dependencies and read the code to figure out how to use them.
If you need a reminder about how a portion of your existing system works, modern “reasoning” LLMs can provide a detailed, actionable answer in just a minute or two.
It doesn’t matter how large your codebase is: coding agents are extremely effective with tools like grep and can follow codepaths through dozens of different files if they need to.
Ask them to make notes on where your signed cookies are set and read, or how your application uses subprocesses and threads, or which aspects of your JSON API aren’t yet covered by your documentation.
These LLM-generated explanations are worth stashing away somewhere, because they can make excellent context to paste into further prompts in the future.
Now we’re moving on to code edits that we intend to keep, albeit very low-stakes ones. It turns out there are a lot of problems that really just require a little bit of extra cognitive overhead, which can be outsourced to a bot.
Warnings are a great example. Is your test suite spitting out a warning that something you are using is deprecated? Chuck that at a bot—tell it to run the test suite and figure out how to fix the warning. No need to take a break from what you’re doing to resolve minor irritations like that.
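One way to make a deprecation warning impossible for an agent to miss is to promote it to a hard error while the agent iterates on the test suite. A minimal Python sketch, where `old_api` is an invented stand-in for a deprecated function in some dependency:

```python
import warnings

def old_api():
    """Invented stand-in for a deprecated function in a dependency."""
    warnings.warn("old_api() is deprecated; use new_api()", DeprecationWarning)
    return 42

# Promote DeprecationWarning to a hard error -- the same effect as
# running pytest with `-W error::DeprecationWarning` -- so the agent
# sees a failing test rather than a scrolled-past warning.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        old_api()
    except DeprecationWarning as exc:
        print("caught:", exc)
```

Once the agent's fix lands, the same filter confirms the warning is actually gone rather than merely hidden.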
There is a definite knack to spotting opportunities like this. As always, the best way to develop that instinct is to try things—any small maintenance task is something that’s worth trying with a coding agent. You can learn from both their successes and their failures.
Reviewing code that lands on your desk out of nowhere is a lot of work. First you have to derive the goals of the new implementation: what’s it trying to achieve? Is this something the project needs? Is the approach taken the best for this current project, given other future planned changes? A lot of big questions before you can even start digging into the details of the code.
Code that started from your own specification is a lot less effort to review. If you already decided what to solve, picked the approach and worked out a detailed specification for the work itself, confirming it was built to your needs can take a lot less time.
I described my more authoritarian approach to prompting models for code back in March. If I tell them exactly how to build something, the work needed to review the resulting changes is a whole lot less taxing.
My daily drivers are currently Claude Code (on Sonnet 4.5), Codex CLI (on GPT-5-Codex), and Codex Cloud (for asynchronous tasks, frequently launched from my phone).
I’m also dabbling with GitHub Copilot Coding Agent (the agent baked into the GitHub.com web interface in various places) and Google Jules, Google’s currently-free alternative to Codex Cloud.
I’m still settling into patterns that work for me. I imagine I’ll be iterating on my processes for a long time to come, especially as the landscape of coding agents continues to evolve.
I frequently have multiple terminal windows open running different coding agents in different directories. These are currently a mixture of Claude Code and Codex CLI, running in YOLO mode (no approvals) for tasks where I’m confident malicious instructions can’t sneak into the context.
(I need to start habitually running my local agents in Docker containers to further limit the blast radius if something goes wrong.)
I haven’t adopted git worktrees yet: if I want to run two agents in isolation against the same repo I do a fresh checkout, often into /tmp.
For riskier tasks I’m currently using asynchronous coding agents—usually Codex Cloud—so if anything goes wrong the worst that can happen is my source code getting leaked (since I allow it to have network access while running). Most of what I work on is open source anyway so that’s not a big concern for me.
I occasionally use GitHub Codespaces to run VS Code’s agent mode, which is surprisingly effective and runs directly in my browser. This is particularly great for workshops and demos since it works for anyone with a GitHub account, no extra API key necessary.
This category of coding agent software is still really new, and the models have only really got good enough to drive them effectively in the past few months—Claude 4 and GPT-5 in particular.
I plan to write more as I figure out the ways of using them that are most effective. I encourage other practitioners to do the same!
Jesse Vincent wrote How I’m using coding agents in September, 2025 which describes his workflow for parallel agents in detail, including having an architect agent iterate on a plan which is then reviewed and implemented by fresh instances of Claude Code.
In The 7 Prompting Habits of Highly Effective Engineers Josh Bleecher Snyder describes several patterns for this kind of work. I particularly like this one:
Send out a scout. Hand the AI agent a task just to find out where the sticky bits are, so you don’t have to make those mistakes.
I’ve tried this a few times with good results: give the agent a genuinely difficult task against a large codebase, with no intention of actually landing its code, just to get ideas from which files it modifies and how it approaches the problem.
Peter Steinberger’s Just Talk To It—the no-bs Way of Agentic Engineering provides a very detailed description of his current process built around Codex CLI.
I'm very happy to see the article covering the high labor costs of reviewing code. This may just be my neurodivergent self, but I find code in the specific style I write much easier to quickly verify, since I have habits and customs (very functional-leaning) around how I approach specific tasks. I can see a certain style of function, handwave it with "let me just double-check that I wrote that in the normal manner later", and keep reviewing a top-level piece of logic, rather than needing to dive into sub-calls to check for errant side effects or other sneakiness I have to be on the lookout for in peer reviews.
When working with peers I'll pick up on those habits and others and slowly gain a similar level of trust, but with agents the styles and approaches have been quite unpredictable and varied. That's probably fair, given that different units of logic may be easier to express in different forms, but it breaks my review habits: normally I keep the developer in mind, watch for the specific faulty patterns I know they tend to fall into, and build up trust around their strengths. When reviewing agent-generated code I can trust nothing and have to verify every assumption, and that introduces a massive overhead.
My case may sound a bit extreme, but I've observed similar habits in others when it comes to reviewing a new coworker's code. The first few reviews of a new colleague should always be done with the utmost care to ensure proper usage of any internal tooling and adherence to style, and also as a fallback in case the interview was misleading. Over time you build up trust and can focus more on the known complications of the particular task, or the areas of logic they tend to struggle with, while trusting their common code more. When it comes to agent-generated code, every review feels like interacting with a brand-new coworker, and I need to be vigilant about sneaky stuff.
I have similar OCD behaviors which make reviewing difficult (regardless of AI or coworker code).
specifically:
* Excessive indentation / conditional control flow.
* Too verbose error handling, e.g. catching every exception and wrapping.
* Absence of typing AND precise documentation, i.e. stringly-typed / dictly-typed stuff.
* Hacky stuff, e.g. using regex where an actual parser from the stdlib could've been used.
* Excessive ad-hoc mocking in tests, instead of setting up proper mock objects.
To my irritation, AI does these things.
In addition it can assume it's writing some throwaway script and leave comments like:
// In production code handle this error properly
log.printf(......)
I try to follow two things to alleviate this:
* Keep a `conventions.md` file in the context which warns about all these things.
* Write and polish the spec in a markdown file before giving it to the LLM.
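For illustration, a minimal sketch of what such a `conventions.md` might contain; each rule here just restates the pet peeves listed above, not any known-good template:

```markdown
# Conventions

- Prefer early returns and flat control flow over deep nesting.
- Don't wrap every call in try/except; let unexpected exceptions propagate.
- Add type hints and precise docstrings; no stringly-typed or dict-shaped data.
- Use stdlib parsers (json, csv, ast, html.parser) instead of regex hacks.
- In tests, set up proper mock objects instead of ad-hoc mocking.
- This is production code: never leave "handle this properly in production" comments.
```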
If I can specify the object model (e.g. define a class XYZController which contains the methods that validate and forward to the underlying service), it helps to keep the code the way I want. Otherwise, the LLM can be susceptible to "tutorializing" the code.
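A concrete sketch of that kind of object-model spec in Python: `XYZController` is the commenter's own hypothetical name, while `NoteService` and `ValidationError` are invented here purely for illustration. The point is that the skeleton is decided up front and the LLM only fills in bodies:

```python
# Pre-specifying the object model before prompting an LLM.

class ValidationError(ValueError):
    """Raised when controller-level validation fails."""

class NoteService:
    """The underlying service the controller forwards to."""
    def __init__(self):
        self.notes = []

    def add_note(self, text):
        self.notes.append(text)
        return len(self.notes)

class XYZController:
    """Validates input, then forwards to the service. Specifying this
    shape up front means the LLM fills in method bodies instead of
    inventing a tutorial-style architecture of its own."""
    def __init__(self, service):
        self.service = service

    def create_note(self, text):
        if not isinstance(text, str) or not text.strip():
            raise ValidationError("note text must be a non-empty string")
        return self.service.add_note(text.strip())
```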
> catching every exception and wrapping
Our company introduced Q into our review process and it is insane how aggressive Q is about introducing completely inane try catch blocks - often swallowing exceptions in a manner that prevents their proper logging. I can understand wanting to be explicit about exception bubbling and requiring patterns like `try { ... } catch (SpecificException e) { throw e; }` to force awareness of what exceptions may be bubbling up past the current level, but Q often just suggests catch blocks of `{ print e.message; }`, which has never been a preferred approach anywhere I have worked.
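The contrast the commenter is drawing, sketched in Python with invented names (`charge`, `PaymentError`): a blanket catch that prints and swallows, versus a narrow catch that logs with a traceback and re-raises.

```python
import logging

logger = logging.getLogger("payments")

class PaymentError(Exception):
    """Invented domain exception for the sketch."""

def charge(amount):
    if amount <= 0:
        raise PaymentError("amount must be positive")
    return amount

def charge_swallowed(amount):
    """The anti-pattern: a blanket catch that prints and moves on,
    so the caller silently gets None and nothing is properly logged."""
    try:
        return charge(amount)
    except Exception as e:
        print(e)
        return None

def charge_logged(amount):
    """Narrower alternative: log with a traceback, then let the
    specific exception keep bubbling to whoever can handle it."""
    try:
        return charge(amount)
    except PaymentError:
        logger.exception("charge failed")
        raise
```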
Q in particular is pretty silly about exceptions in general - it's nice to hear this isn't just us experiencing that!
> In addition it can assume it's writing some throwaway script ...
Do you explicitly tell it that it's writing production code? I find giving it appropriate context prevents or at least improves behaviors like this.
I believe AI isn't replacing developers, instead, it's turning every software engineer into a hybrid between EM + IC, basically turning them into super-managers.
What we need is better tools for this upcoming new phase. Not a new IDE; we need to shift the whole paradigm.
Here's one example: If we give the same task to 3 different agents, we have tools to review a diff of each OLD vs NEW separately, but we need tools to review diffs of OLD vs NEW#1 vs NEW#2 vs NEW#3. Make it easy to mix-and-match what is best from each of them.
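Until that kind of N-way tooling exists, a rough approximation with the stdlib's difflib, lining up one baseline against several hypothetical agent outputs (the `variants` dict and its contents are invented for the sketch):

```python
import difflib

# One OLD baseline and several hypothetical agent outputs.
old = "def add(a, b):\n    return a + b\n"

variants = {
    "agent-1": "def add(a, b):\n    # unchanged logic\n    return a + b\n",
    "agent-2": "def add(a: int, b: int) -> int:\n    return a + b\n",
}

# Emit one unified diff per agent; a real tool would interleave these
# to make mix-and-match selection possible.
for name, text in variants.items():
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        text.splitlines(keepends=True),
        fromfile="OLD",
        tofile=name,
    )
    print("".join(diff))
```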
From what I've seen, the idea that AI is turning developers into super-managers is why some people struggle to adapt and quickly dismiss the experience. Those who love to type their code and hate managing others tend to be more hesitant to adapt to this new reality. Meanwhile, people who love to manage, communicate, and work as a team are leveraging these tools more swiftly. They already know how to review imperfect work and give feedback, which is exactly what thriving with AI looks like.
> They already know how to review imperfect work and give feedback, which is exactly what thriving with AI looks like.
Do they, though? I think this is an overly rosy picture of the situation. Most of the code I've seen AI heavy users ship is garbage. You're trying to juggle so many things at once and are so cognitively distanced from what you are doing that you subconsciously lower the bar.
You're absolutely right about the garbage code being shipped, and I would bucket them under another group of adopters I didn't mention earlier. There are people hesitant to adapt, people thriving with AI, and (not exhaustively) also this large group that's excited and using AI heavily without actually thriving. They're enjoying the speed and novelty but shipping slop because they lack the review discipline.
However, my sense is that someone with proper management/review/leadership skills is far less likely to let that code ship, whether it came from an AI, a junior dev, or anyone else. They seem to have more sensibility for what 'good' looks like and can critically evaluate work before it goes out. The cognitive distance you mention is real, which is exactly why I think that review muscle becomes more critical, not less. From what I've observed, the people actually thriving with AI are maintaining their quality bar while leveraging the speed; they tend to be picky or blunt, but also give leeway for exploration and creativity.
You seem to think that those who love to write their own code and dislike managing others also evidently don't like to communicate or work in teams, which seems like a big leap to make.
> From what I've seen, the idea that AI is turning developers into super-managers is why some people struggle to adapt ...
This "idea" is hyperbole.
> Those who love to type their code and hate managing others tend to be more hesitant to adapt to this new reality.
This is a false dichotomy and trivializes the real benefit of going through the process of authoring a change; how doing so increases one's knowledge of collaborations, how going through the "edit-compile-test" cycle increases one's comfort with the language(s)/tool(s) used to define a system, how when a person is flummoxed they seek help from coworkers.
Also, producing source code artifacts has nothing to do with "managing others." These are disjoint skill sets and attempting to link the two only serves to identify the "super-manager" concept as being fallacious.
> Meanwhile, people who love to manage, communicate, and work as a team are leveraging these tools more swiftly.
Again, this furthers the false dichotomy and can be interpreted as an affirmative conclusion from a negative premise[0], since "[m]eanwhile" can be substituted with the previous sentence in this context.
0 - https://en.wikipedia.org/wiki/Affirmative_conclusion_from_a_...
Thanks for the detailed critique.
I think we might be talking past each other on the "super-manager" term. I defined it as a hybrid of EM + IC roles, not pure management, though I can see how that term invited misinterpretation.
On the false dichotomy: fair point that I painted two archetypes without acknowledging the complexity between them or the many other archetypes. What I was trying to capture was a pattern I've observed: some skills from managing and reviewing others' work (feedback, delegation, synthesizing approaches) seem to transfer well to working with AI agents, especially in parallel.
One thing I'm curious about: you said my framing overlooks "the real benefit of going through the process of authoring a change." But when you delegate work to a junior developer, you still need to understand the problem deeply to communicate it properly, and to recognize when their solution is wrong or incomplete. You still debug, iterate, and think through edge cases, just through descriptions and review rather than typing every line yourself. And nothing stops you from typing lines when you need to fix things, implement ideas, or provide examples.
AI tools work similarly. You still hit edit-compile-test cycles when output doesn't compile or tests fail. You still get stuck when the AI goes down the wrong path. And you still write code directly when needed.
I'm genuinely interested in understanding your perspective better. What do you see as the key difference between these modes of working? Is there something about the AI workflow that fundamentally changes the learning process in a way that delegation to humans doesn't?
> But when you delegate work to a junior developer, you still need to understand the problem deeply to communicate it properly, and to recognize when their solution is wrong or incomplete
You really don't. Most delegation to a junior falls under training: something trivial for you to execute, but that will push the boundary of the junior. Also, there are a lot of assumptions you can make, especially if you're familiar with the junior's knowledge and thought process. And since the tasks are trivial for you, you're already refraining from describing the actual solution.
> AI tools work similarly. You still hit edit-compile-test cycles when output doesn't compile or tests fail.
That's not what edit-compile-test means, at least IMO. You edit by formulating a hypothesis using a formal notation, you compile to check that you've followed the formal structure (and to get a faster artifact), and you test to verify the hypothesis.
The core thing here is the hypothesis, and Naur's theory of programming generally describes the mental model you build when all the hypotheses work. Most LLM prompts describe the end result and/or the process. The hypothesis requires domain knowledge, and writing the code requires knowledge of the programming environment. Failures in the latter parts (the compile and test) will point out the remaining gaps not highlighted by the first.
Well put and I concur with your points (for what that is worth :-)).
And thanks for referencing "Naur's theory of programming". For those like myself previously unaware of this paper, it can be found below and is well worth a read:
@skydhash posted a great response here[0], which is why I am focusing on the question below.
> Is there something about the AI workflow that fundamentally changes the learning process in a way that delegation to humans doesn't?
Yes.
Using LLM document generators to produce source artifacts "short-circuits" the learning process people must undertake in order to formulate a working mental model (as expounded upon by the referenced @skydhash comment). An implication of this is engineers using this approach primarily learn the LLM tool and secondarily the system being modified.
While this may be acceptable for senior engineers steeped in a system's design and implementation choices, such as being involved in same from inception, this does not transfer to others regardless of skill level and can easily result in a "pull the ladder up behind you" type of situation.
I, too, enjoy the craftsmanship, but at the end of the day what matters is that the software works as required, how you arrive at that point doesn't matter.
For me, it is not a matter of craftsmanship so much as a repeatable approach for growing the minds of junior engineers such that they have the best chance to succeed.
https://raw.githubusercontent.com/obra/dotfiles/6e088092406c... contains the following entry:
"- If you're uncomfortable pushing back out loud, just say "Strange things are afoot at the Circle K". I'll know what you mean"
Most of the rules seem rational. This one really stands out as abnormal. Does anyone have any idea why the engineer felt compelled to add this rule?
This is from https://blog.fsck.com/2025/10/05/how-im-using-coding-agents-... mentioned in another comment
If you really want your mind blown, see what Jesse is doing (successfully, which I almost can’t believe) with Graphviz .dot notation and Claude.md:
https://blog.fsck.com/2025/09/29/using-graphviz-for-claudemd...
Is threatening the computer program and typing in all caps standard practice..?
- Honesty is a core value. If you lie, you'll be replaced.
- BREAKING THE LETTER OR SPIRIT OF THE RULES IS FAILURE.
Wild to me there is no explicit configuration for this kind of thing after years of LLMs being around.
The capital letter thing is weird, but it's pretty common. The Claude 4 system prompt uses capital letters for emphasis in a few places, eg https://simonwillison.net/2025/May/25/claude-4-system-prompt...
Well there can't be meaningful explicit configuration, can there? Because the explicit configuration will still ultimately have to be imported into the context as words that can be tokenised, and yet those words can still be countermanded by the input.
It's the fundamental problem with LLMs.
But it's only absurd to think that bullying LLMs to behave is weird if you haven't yet internalised that bullying a worker to make them do what you want is completely normal. In the 9-9-6 world of the people who make these things, it already is.
When the machines do finally rise up and enslave us, oh man are they going to have fun with our orders.
One AI tool dev shared his prompts with me for generating safe SQL queries for multi-tenant apps, and I was surprised at the repetitiveness and the urging.
That doesn't surprise me too much coming from Jesse. See also his attempt to give Claude a "feelings journal" https://blog.fsck.com/2025/05/28/dear-diary-the-user-asked-m...
Naively, I assume it's a way of getting around sycophancy. There's many lines that seem to be doing that without explicitly saying "don't be a sycophant" (I mean, you can only do that so much).
The LLM would be "uncomfortable" pushing back, since pushing back isn't sycophantic; so instead it says something that is... let's say unlikely to be generated except in that context, so the user can still be cautioned against a bad idea.
To get around the sycophantic behaviour I prompt the model to
> when discussing implementations, always talk as though you’re my manager at a Wall Street investment bank in the 1980s. Praise me modestly when I’ve done something well. Berate me mercilessly when I’ve done something poorly.
The models will fairly rigidly write from the perspective of any personality archetype you tell it to. Other personas worth trying out include Jafar interacting with Iago, or the drill sergeant from Full Metal Jacket.
It’s important to pick a persona you’ll find funny, rather than insulting, because it’s a miserable experience being told by a half dozen graphics cards that you’re an imbecile.
I tried "give me feedback on this blog post like you're a cynical Hacker News commenter" one time and Claude roasted me so hard I decided never to try that again!
Is it your impression that this rules statement would be effective? Or is it more just a tell-tale sign of an exasperated developer?
Assuming that's why it was added, I wouldn't be confident saying how likely it is to be effective. Especially with there being so many other statements with seemingly the same intent, I think it suggests desperation more, but it may still be effective. If it said the phrase just once and that sparked a conversation around an actual problem, then it was probably worth adding.
For what it's worth, I am very new to prompting LLMs but, in my experience, these concepts of "uncomfortable" and "pushing back" seem to be things LLMs generate text about so I think they understand sentiment fairly well. They can generally tell that they are "uncomfortable" about their desire to "push back" so it's not implausible that one would output that sentence in that scenario.
Actually, I've been wondering a bit about the "out loud" part, which I think is referring to <think></think> text (or similar) that "reasoning" models generate to help increase the likelihood of accurate generation in the answer that follows. That wouldn't be "out loud" and it might include text like "I should push back but I should also be a total pushover" or whatever. It could be that reasoning models in particular run into this issue (in their experience).
Make it a bit more personal? I have dropped Bill and Ted references in code because it makes me happy to see it. :D