Beyond agentic coding

2026-02-08 1:55 · haskellforall.com

AI dev tooling can do better than chat interfaces

I'm generally pretty pro-AI with one major exception: agentic coding. My consistent impression is that agentic coding does not actually improve productivity and deteriorates the user's comfort and familiarity with the codebase. I formed that impression from:

  • my own personal experiences

    Every time I use agentic coding tools I'm consistently unimpressed with the quality of the results.

  • my experiences interviewing candidates

    I allow interview candidates to use agentic coding tools, and candidates who do so consistently perform worse than other candidates, failing to complete the challenge or producing incorrect results1. This was a huge surprise to me at first because I expected agentic coding to confer an unfair advantage, but … nope!

  • research studies

    Studies like the Becker study and Shen study show that users of agentic coding perform no better and sometimes worse when you measure productivity in terms of fixed outcomes rather than code velocity/volume.

I don't believe agentic coding is a lost cause, but I do believe agentic coding in its present incarnation is doing more harm than good to software development. I also believe it is still worthwhile to push on the inadequacies of agentic coding so that it empowers developers and improves code quality.

However, in this post I'm taking a different tack: I want to present other ways to leverage AI for software development. I believe that agentic coding has so captured the cultural imagination that people are sleeping on other good and underexplored solutions to AI-assisted software development.

The master cue

I like to design tools and interfaces from first principles rather than reacting to industry trends/hype, and I've accrued quite a few general design principles from over a decade of working in DevProd as well as an even longer history of open source projects and contributions.

One of those design principles is my personal "master cue", which is:

A good tool or interface should keep the user in a flow state as long as possible

This principle isn't even specific to AI-assisted software development, and yet still highlights why agentic coding sometimes misses the mark. Both studies and developer testimonials show that agentic coding breaks flow and keeps developers in an idle/interruptible holding pattern more than ordinary coding.

For example, the Becker study took screen recordings and saw that idle time approximately doubled.

I believe we can improve AI-assisted coding tools (agentic or not) if we set our north star to “preserve flow state”.

Calm technology

Calm technology is a design discipline that promotes flow state in tools that we build. The design principles most relevant to coding are:

  • tools should minimize demands on our attention

    Interruptions and intrusions on our attention break us out of flow state.

  • tools should be built to be “pass-through”

    A tool is not meant to be the object of our attention; instead, it should reveal the true object of our attention (the thing the tool acts upon) rather than obscuring it. The more we use the tool, the more it fades into the background of our awareness while still supporting our work.

  • tools should create and enhance calm (thus the name: calm technology)

    A state of calm helps users enter and maintain flow state.

Non-LLM examples of calm technology

Engineers already use “calm” tools and interfaces as part of our work. Here are a couple of examples you're probably already familiar with:

Inlay hints

IDEs (like VSCode) can support inlay hints that sprinkle the code with useful annotations for the reader, such as inferred type annotations.

These types of inlay hints embody calm design principles because:

  • they minimize demands on our attention

    They exist on the periphery of our attention, available for us if we're interested but unobtrusive if we're not interested.

  • they are built to be “pass-through”

    They don't replace or substitute the code that we are editing. They enhance the code editing experience but the user is still in direct contact with the edited code. The more we use type hints the more they fade into the background of our awareness and the more the code remains the focus of our attention.

  • they create and enhance calm

    They promote a sense of calm by informing our understanding of the code passively. As one of the Calm Technology principles puts it: “Technology can communicate, but doesn't need to speak”.

File tree previews

Tools like VSCode or GitHub's pull request viewer let you preview changes to the file tree at a glance.

You might think to yourself “this is a very uninteresting thing to use as an example” but that's exactly the point. The best tools (designed with the principles of calm technology) are pervasive and boring things that we take for granted (like light switches) and that have faded so strongly into the background of our attention that we forget they even exist as a part of our daily workflow (also like light switches).

File tree previews:

  • minimize demands on our attention

    They're there if we need the information, but easy to ignore (or even forget they exist) if we don't use them.

  • are built to be “pass-through”

    When we interact with the file tree viewer we are interacting directly with the filesystem and the interaction between the representation (the viewer) and the reality (the filesystem) feels direct, snappy, and precise. The more we use the viewer the more the representation becomes indistinguishable from the reality in our minds.

  • create and enhance calm

    We do not need to constantly interact with the file tree to gather up-to-date information about our project structure. It passively updates in the background as we make changes to the project and those updates are unobtrusive and not attention-grabbing.

Chat-based coding agents are not calm

We can think about the limitations of chat-based agentic coding tools through this same lens:

  • they place high demands on our attention

    The user has to either sit and wait for the agent to report back or do something else and run the LLM in a semi-autonomous manner. However, even semi-autonomous sessions prevent the user from entering flow state because they have to remain interruptible.

  • they are not built to be “pass-through”

    Chat agents are a highly mediated interface to the code which is indirect (we interact more with the agent than the code), slow (we spend a lot of time waiting), and imprecise (English is a blunt interface).

  • they undermine calm

    The user needs to constantly stimulate the chat to gather new information or update their understanding of the code (the chat agent doesn't inform the user's understanding passively or quietly). Chat agents are also fine-tuned to maximize engagement.

Prior art for calm design

Inline suggestions from GitHub Copilot

One of the earliest examples of an AI coding assistant that begins to embody calm design principles is the OG AI assistant: GitHub Copilot's support for inline suggestions, with some caveats I'll get into.

This does one thing really well:

  • it's built to be “pass-through”

    The user is still interacting directly with the code and the suggestions are reasonably snappy. The user can also ignore or type through the suggestion.

However, by default these inline suggestions violate other calm technology principles:

  • they demand our attention

    By default Copilot presents the suggestions quite frequently and the user has to pause what they're doing to examine the output of the suggestion. After enough times the user begins to condition themselves into regularly pausing and waiting for a suggestion which breaks them out of a flow state. Now instead of being proactive the user's been conditioned by the tool to be reactive.

  • they undermine calm

    GitHub Copilot's inline suggestion interface is visually busy and intrusive. Even if the user ignores every suggestion the effect is still disruptive: suggestions appear on the user's screen in the center of their visual focus and the user has to decide on the spot whether to accept or ignore them before proceeding further. The user also can't easily passively absorb information presented in this way: understanding each suggestion requires the user's focused attention.

buuuuut these issues are partially fixable by disabling the automatic suggestions and requiring them to be explicitly triggered with Alt + \. Unfortunately, that also disables the next feature, which I like even more:

Next edit suggestions (also from GitHub Copilot)

Next edit suggestions are a related GitHub Copilot feature that displays related follow-up edits throughout the file/project and lets the user cycle between them, accepting each suggested change if desired. They behave like a “super-charged find and replace”.

These suggestions do an amazing job of keeping the user in a flow state:

  • they minimize demands on the user's attention

    The cognitive load on the user is smaller than inline suggestions because the suggestions are more likely to be bite-sized (and therefore easier for a human to review and accept).

  • they're built to be “pass-through”

    Just like inline suggestions, next edit suggestions still keep the user in close contact with the code they are modifying.

  • they create and enhance calm

    Suggestions are presented in an unobtrusive way: they aren't dumped in the dead center of the user's attention and they don't demand immediate review. They exist on the periphery of the user's attention as code suggestions that the user can ignore or focus on at their leisure.

AI-assisted calm technology

I believe there is a lot of untapped potential in AI-assisted coding tools and in this section I'll sketch a few small examples of how we can embody calm technology design principles in building the next generation of coding tools.

Facet-based project navigation

You could browse a project by a tree of semantic facets. For example, if you were editing the Haskell implementation of Dhall the tree viewer might look like this prototype I hacked up2.

The goal here is to not only provide a quick way to explore the project by intent, but to also improve the user's understanding of the project the more they use the feature. "String interpolation regression" is so much more informative than dhall/tests/format/issue2078A.dhall3.

Also, the above video is based on a real tool and not just a mock. You can find the code I used to generate that tree of semantic facets here, and I'll write up another post soon walking through how that code works.
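The core data transformation behind such a tree view is simple enough to sketch. Assuming an LLM has already labeled each file with human-readable facets (the labels and helper below are hypothetical illustrations, not the author's actual tool), the tree is just the inverted mapping:

```python
from collections import defaultdict

def build_facet_tree(labels):
    """Invert a file -> facets mapping into a facet -> files tree.

    `labels` would come from an LLM annotating each file with
    human-readable intents; everything here is illustrative.
    """
    tree = defaultdict(list)
    for path, facets in labels.items():
        for facet in facets:
            tree[facet].append(path)
    # Sort facets and paths for a stable, navigable tree view
    return {facet: sorted(paths) for facet, paths in sorted(tree.items())}

# Labels of the kind an LLM might produce (file names are examples):
labels = {
    "dhall/tests/format/issue2078A.dhall": ["String interpolation regression"],
    "dhall/tests/format/issue2078B.dhall": ["String interpolation regression"],
    "dhall/src/Dhall/Parser.hs": ["Parsing", "String interpolation regression"],
}

tree = build_facet_tree(labels)
```

The expensive part is producing good labels; once they exist, rendering the facet tree is a one-liner inversion like this.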

Automated commit refactor

You could take an editor session, a diff, or a pull request and automatically split it into a series of more focused commits that are easier for people to review. This is one of the cases where the AI can reduce human review labor (most agentic coding tools create more human review labor).

There is some prior art here but this is still a nascent area of development.
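As a rough sketch of the mechanical half of this idea (the semantic grouping is the part that needs an LLM; this helper is a hypothetical starting point, not an existing tool), a unified diff can first be split into per-file chunks that a tool could then regroup into focused commits:

```python
def split_diff_by_file(diff_text):
    """Split a unified diff into per-file chunks.

    A crude first pass: a real tool would go further and group hunks
    semantically (e.g. separating a rename from a behavior change).
    """
    chunks = {}
    current = None
    for line in diff_text.splitlines(keepends=True):
        if line.startswith("diff --git"):
            # e.g. "diff --git a/src/Main.hs b/src/Main.hs"
            current = line.split()[2][2:]  # strip the "a/" prefix
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {path: "".join(lines) for path, lines in chunks.items()}
```

From there, each regrouped chunk could be applied with `git apply` onto a fresh branch to produce the focused commit series.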

File lens

You could add two new tools to the user's toolbar or context menu: “Focus on…” and “Edit as…”.

“Focus on…” would allow the user to specify what they're interested in changing and present only files and lines of code related to their specified interest. For example, if they want to focus on “command line options” then only related files and lines of code would be shown in the editor and other lines of code would be hidden/collapsed/folded. This would basically be like “Zen mode” but for editing a feature domain of interest.
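The folding half of “Focus on…” is easy to picture even without the LLM. In this toy sketch (the set of relevant line indices is what the LLM would supply; nothing here is a real editor API), everything outside the relevant regions collapses into a placeholder:

```python
def focus_view(lines, relevant, context=1):
    """Render only lines near the `relevant` indices, folding the rest.

    `relevant` is the set of line indices the LLM judged related to the
    user's stated interest (e.g. "command line options").
    """
    keep = set()
    for i in relevant:
        keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    out, folded = [], False
    for i, line in enumerate(lines):
        if i in keep:
            out.append(line)
            folded = False
        elif not folded:
            out.append("…")  # one placeholder per collapsed region
            folded = True
    return out
```

An editor would render the `…` rows as fold markers the user can expand, so the hidden code stays one click away rather than gone.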

“Edit as…” would allow the user to edit the file or selected code as if it were a different programming language or file format. For example, someone who was new to Haskell could edit a Haskell file “as Python” and then after finishing their edits the AI attempts to back-propagate their changes to Haskell. Or someone modifying a command-line parser could edit the file “as YAML” and be presented with a simplified YAML representation of the command line options which they could modify to add new options.
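The back-propagation step is the hard, LLM-shaped part of “Edit as…”, but the simplified view itself is mechanical. As a hypothetical sketch (the option-spec format and helper are invented for illustration), here is what regenerating argparse calls from an edited YAML-style option spec might look like:

```python
def options_to_argparse(options):
    """Render a simplified option spec (the "Edit as YAML" view) back
    into argparse code.

    A toy back-propagation step: a real tool would use an LLM to merge
    the edits into the original parser source instead of regenerating it.
    """
    lines = ["parser = argparse.ArgumentParser()"]
    for opt in options:
        args = [f'"--{opt["name"]}"']
        if "default" in opt:
            args.append(f'default={opt["default"]!r}')
        if "help" in opt:
            args.append(f'help={opt["help"]!r}')
        lines.append(f"parser.add_argument({', '.join(args)})")
    return "\n".join(lines)
```

Adding a new option is then just adding a new entry to the spec, and the round-trip keeps the user in contact with a representation they actually understand.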

Conclusion

This is obviously not a comprehensive list of ideas, but I wrote this to encourage people to think of more innovative ways to incorporate AI into people's workflows besides just building yet another chatbot. I strongly believe that chat is the least interesting interface to LLMs and AI-assisted software development is no exception to this.



Comments

  • By WilcoKruijer 2026-02-08 11:44 (9 replies)

    > You could take an editor session, a diff, or a pull request and automatically split it into a series of more focused commits that are easier for people to review. This is one of the cases where the AI can reduce human review labor

    I feel this should be a bigger focus than it is. All the AI code review startups are mostly doing “hands off” code review. It's just an agent reviewing everything.

    Why not have an agent create a perfect “review plan” for human consumption? Split the review up in parts that can be individually (or independently) reviewed and then fixed by the coding agent. Have a proper ordering in files (GitHub shows files in a commit alphabetically, which is suboptimal), and hide boring details like function implementations that can be easily unit tested.
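    One piece of such a review plan — ordering files by dependency rather than alphabetically — doesn't even need an LLM. A minimal sketch (the dependency map would be extracted from imports; the file names are illustrative):

```python
def review_order(deps):
    """Order files so dependencies are reviewed before their dependents.

    `deps` maps each changed file to the changed files it imports —
    one concrete improvement over an alphabetical listing.
    """
    order, seen = [], set()
    def visit(f):
        if f in seen:
            return
        seen.add(f)
        for d in sorted(deps.get(f, [])):
            visit(d)  # review what `f` depends on first
        order.append(f)
    for f in sorted(deps):
        visit(f)
    return order
```

A review-plan agent could layer summaries and grouping on top of an ordering like this.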

    • By wazHFsRy 2026-02-08 14:06

      > Why not have an agent create a perfect “review plan” for human consumption? Split the review up in parts that can be individually (or independently) reviewed and then fixed by the coding agent. Have a proper ordering in files (GitHub shows files in a commit alphabetically, which is suboptimal), and hide boring details like function implementations that can be easily unit tested.

      Yes exactly! I have been using this to create a comment on the PR, showing suggested review order and a diagram of how changes relate to each other. And even this super simple addition has been very helpful for code review so far!

      (more on this: https://www.dev-log.me/pr_review_navigator_for_claude/)

    • By kloud 2026-02-08 15:24 (1 reply)

      Exactly this. Existing code review tools have become insufficient with the increased volume of code; I would like to see more innovation here.

      One idea that comes to mind to make review easier would be to re-create commits following Kent Beck's SB Changes concept - splitting structure changes (tidying/refactoring) and behavior changes (features). The structure changes could then be quickly skimmed (especially with good coverage) and it should save focus for review of the behavior changes.

      The challenge is that it is not the same as just committing the hunks in a different order. But maybe a skill with a basic agent loop could work with the capabilities of today's models.

      • By WilcoKruijer 2026-02-08 16:21 (1 reply)

        I experimented with a command for atomic commits a while ago. It explicitly instructed the agent to review the diff and group related changes to produce a commit history where every HEAD state would work correctly. I tried to get it to use `git add -p`, but it never seemed to follow those instructions. Might be time for another go at this with a skill.

        • By jmalicki 2026-02-09 12:12

          I have had success with having the skill create a new branch and moving pieces of code there, testing them after the move, then adding it.

          So commit locally and have it recreate the commit as a sequence on another branch.

    • By telotortium 2026-02-08 13:43

      Unfortunately GitHub doesn’t let you easily review commits in a PR. You can easily selectively review files, but comments are assumed to apply to the most recent HEAD of the PR branch. This is probably why review agents don’t natively use that workflow. It would probably not be hard to instruct the released versions of Opus or Codex to do this, however, particularly if you can generate a PR plan, either via human or model.

    • By SatvikBeri 2026-02-08 13:44

      I do this. For example, the other day I made a commit where I renamed some fields of a struct and removed others, then I realized it would be easier to review if those were two separate commits. But it was hard to split them out mechanically, so I asked Claude to do it, creating two new commits whose end result must match the old one and must both pass tests. It works quite well.

    • By CuriouslyC 2026-02-08 15:48

      I've been talking about having AI add comments to PRs to draw reviewers' attention to things that deserve special scrutiny since last May. I think most code review tools don't do this because A/B testing has shown people engage less/churn more with noisier review output.

    • By jmalicki 2026-02-09 12:10

      That sounds like stacked changes (if you're not familiar, think of how lkml patches are numbered 0/8, 1/8, etc., where each is a standalone change that only depends on the ones before it), and I have been using agents to create sets of stacked PRs when I have a big diff.

      Instead of ordering of files, it creates an ordering of PRs where each has descriptions, independent CI, etc. and can be merged one at a time (perhaps at the small cost of the main branch having unused library functions until the final PR is merged)

    • By zmj 2026-02-08 14:55

      I like this thought. Scaling review is definitely a bottleneck (for those of us who are still reading the code), and spending some tokens to make it easier seems worthwhile.

    • By jasonjmcghee 2026-02-08 13:42

      Yes please. There are many use cases where failure modes are similar to not using AI at all, which is useful.

      Many very low risk applications of AI can add up to high payoff without high risk.

    • By jonfw 2026-02-08 14:41

      “I have a PR from <feature-branch> into main. Please break it into chunks and dispatch a background agent to review each chunk for <review-criteria>, and then go through the chunks one at a time with me, pausing between each for my feedback”

  • By andai 2026-02-08 6:54 (3 replies)

    I wonder if the problem of idle time / waiting / breaking flow is a function of the slowness. That would be simple to test, because there are super fast 1000 tok/s providers now.

    (Waiting for Cerebras coding plan to stop being sold out ;)

    I've used them for smaller tasks (making small edits), and the "realtime" aspect of it does provide a qualitative difference. It stops being async and becomes interactive.

    A sufficient shift in quantity produces a phase shift in quality.

    --

    That said, the main issue I find with agentic is my mental model getting desynchronized. No matter how fast the models get, it takes a fixed amount of time for me to catch up and understand what they've done.

    The most enjoyable way I've found of staying synced is to stay in the driver's seat, and to command many small rapid edits manually. (i.e. I have my own homebrew "agent" that's just a loop of, I prompt it, it proposes edits, I accept or edit, repeat.)

    So then the "synchronization" of the mental state is happening continuously, because there is no opportunity for desynchronization. Because you are the one driving. I call that approach semi-auto, or Power Coding (akin to Power Armor, which is wielded manually but greatly enhances speed and strength).

    • By rubenflamshep 2026-02-08 13:59 (1 reply)

      > That said, the main issue I find with agentic is my mental model getting desynchronized. No matter how fast the models get, it takes a fixed amount of time for me to catch up and understand what they've done.

      This is why I'm so skeptical of anyone running 6+ Claude sessions at a time. I've gotten to 5 but really that was across 3 sessions with 2 standing by just to commit stuff. And even with just 3 sessions I constantly lost where I was and wasted time re-orienting myself, doing work in the wrong session, etc.

      >The most enjoyable way I've found of staying synced is to stay in the driver's seat, and to command many small rapid edits manually.

      Same, there's a fantastic flow state/momentum I can get in a single session just knocking off features. I don't mind switching between two sessions in this state, but the experience is better when it's two different projects vs two different features on the same project. The complete context switch lets me re-orient more easily.

      • By resize2996 2026-02-08 14:34 (1 reply)

        Warning: I was in two different projects experimenting with similar forms of db access at the same time. Don't do that.

    • By dybber 2026-02-08 7:09 (1 reply)

      You still have to synchronize with your code reviewers and teammates, so how well you work together in a team becomes a limiting factor at some point then I guess.

      • By tuhgdetzhh 2026-02-08 10:43 (4 replies)

        Yes, and that constraint shows up surprisingly early.

        Even if you eliminate model latency and keep yourself fully in sync via a tight human-in-the-loop workflow, the shared mental model of the team still advances at human speed. Code review, design discussion, and trust-building are all bandwidth-limited in ways that do not benefit much from faster generation.

        There is also an asymmetry: local flow can be optimized aggressively, but collaboration introduces checkpoints. Reviewers need time to reconstruct intent, not just verify correctness. If the rate of change exceeds the team’s ability to form that understanding, friction increases: longer reviews, more rework, or a tendency to rubber-stamp changes.

        This suggests a practical ceiling where individual "power coding" outpaces team coherence. Past that point, gains need to come from improving shared artifacts rather than raw output: clearer commit structure, smaller diffs, stronger invariants, better automated tests, and more explicit design notes. In other words, the limiting factor shifts from generation speed to synchronization quality across humans.

        • By hibikir 2026-02-08 14:39

          I've seen this happen over and over again well before LLMs, when teams are sufficiently "code focused" that they don't care much at all about their teammates. The kind that would throw in a giant architectural change over a weekend. You then get to either freeze a person for days, or end up with codebases nobody remembers, because the bigger architectural changes are secret.

          With a good modern setup, everyone can be that "productive", and the only thing that keeps a project coherent is if the original design holds, therefore making rearchitecture a very rare event. It will also push us to have smaller teams in general, just because the idea of anyone managing a project with, say, 8 developers writing a codebase at full speed seems impossible, just like it was when we added enough high performance, talented people to a project. It's just harder to keep coherence.

          You can see this risk mentioned in The Mythical Man Month already. The idea of "The Surgery Team", where in practice you only have a couple of people truly owning a codebase, and most of the work we used to hand juniors just being done via AI. It'd be quite funny if the way we have to change our team organization moves towards old recommendations.

        • By EdNutting 2026-02-08 11:03 (2 replies)

          This thread seems to have re-identified Amdahl’s law in the context of software development workflow.

          Agentic coding is only speeding up or parallelising a small part of the workflow - the rest is still sequential and human-driven.

          • By james_marks 2026-02-08 14:56

            This is 100% the new bottleneck. We're going to see a lot of agentic QA, E2E testing, etc. soon for this reason.

          • By cyanydeez 2026-02-08 14:05

            And it's abstracted as:

            Mythical Man Month -> Mythical Agent Swarm

        • By andai 2026-02-08 14:17

          I've mostly done solo work, or very small teams with clear separation of concerns. But this reads as less of a case against power coding, and more of a case against teams!

        • By zozbot234 2026-02-08 11:42

          You can ask the agent to reverse engineer its own design and provide a design document that can inform the code review discussion. Plus, hopefully human code review would only occur after several rounds of the agent refactoring its own one-shot slop into something that's up to near-human standards of surveyability and maintainability.

    • By port11 2026-02-08 19:45

      Waiting on AI is its own category, so I’m not entirely sure what ‘idle time’ means. Of course we could just go and read that study…

  • By Insanity 2026-02-08 3:34 (6 replies)

    The post had nothing to do with Haskell, so the title is a bit misleading. But the rest of the article is good, and I actually think that agentic/AI coding will probably evolve in this way.

    The current tools are the infancy of AI-assisted coding. It's like the MS-DOS era. Over time, maybe back-propagating from “your comfort language” to “target language” could become commonplace.

    • By josephcsible 2026-02-08 4:04 (1 reply)

      > Post had nothing to do with Haskell so the title is a bit misleading.

      To be fair, that's not part of the article's title, but rather the title of the website that the article was posted to.

      • By Insanity 2026-02-08 4:28

        I know, but that's not typically how you see titles posted here. I'm just disappointed as I enjoy writing Haskell. :)

    • By ipnon 2026-02-08 3:43

      Programming languages are the most interesting area in CS for the next 10 years. AIs need criteria for correctness that can't be faked, so the boundary between proof verification and programs will become fuzzier and fuzzier. The runtimes also need support for massively parallel development in a way that is totally unnecessary for humans.

    • By yoyohello13 2026-02-08 5:02

      I was excited to see a non-AI article on this site for once. Oh well.

      It was a good article, though.

    • By nakedneuron 2026-02-08 13:25

      Agree. The gist of the FA is "calm technology"; the title should reflect that better.

      Also agree on everything author mentions. I can't attest to all examples but I know what a UI is.

      The author mentions the center of our focus of attention. We should hear more often about the periphery of our attention field. Its bandwidth, so to speak, is an order of magnitude lower than the center's, but it's still there and can guide some decisions quite unobtrusively while preserving flow.

      (Major) eye movements are a detriment to attention, which itself should be treated like a commodity (in the case of a UI thousands of people use, more so like a borrowed commodity).

    • By lordgrenville 2026-02-08 8:37

      Agreed. This website seems to prepend the blog name to each page's document.title

      Would suggest that one of the mods remove it

    • By shevy-java 2026-02-08 9:52

      Is the article good? I found it of surprisingly poor quality. Is my assessment incorrect? Basically it is an article that tries to convince people of how relevant AI is nowadays. I don't really see it that way at all, and I found none of the "arguments" convincing.
