
Notes on rolling out Cursor and Claude Code
There is plenty of commentary online about how AI will replace coding as we know it, and no shortage of predictions about what the future holds. But I haven’t come across many case studies of rolling out the tools that exist today. So here are some observations.
For context: SaaS product. Ruby on Rails. Mature (12 years) codebase, in good condition. About 40 developers.
We’ve given every developer the option to buy Cursor or Claude Code. (I’ll refer to these as Agents for the rest of this article.) If someone wanted to buy another agentic tool that isn’t one of those we’d probably say yes if the price was similar. In a recent informal survey I did, 8 people answered yes to “I use Cursor/Claude Code for most (or all) of my coding”, and 11 to “I alternate between using them and not using them based on the task, it averages out to about 50% of the time”.
I use Claude Code and don’t really like Cursor. The main reason is that I’m obsessed with Sublime Text as my editor, so having to change to another editor is basically a deal breaker for me. Most other people are more sane, and thus Cursor adoption has been slightly higher than Claude Code’s.
They also do different things. Claude (particularly with the newest models) likes to write entire features for you, and can sometimes get a bit carried away. Anthropic’s best practices guide is a goldmine, and one great suggestion from it that I’ve successfully adopted is asking Claude to make a plan before writing any code, and only then telling it to write the code.
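In practice that looks something like the exchange below. The wording is just an illustration, not a magic incantation; the point is the explicit plan-then-code split.
“Read the relevant code and give me a step-by-step plan for this change. Don’t write any code yet.”
[read the plan, push back on anything that looks wrong]
“The plan looks good. Implement it, and run the tests when you’re done.”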
By contrast, Cursor (from what I’ve seen) seems content making smaller and more self-contained changes. It seems less likely to try to write the entire thing and commit it without your consent. Conversely, it seems less capable of ambitious one-shot prompts.
In the future, a “senior engineer” might be someone who knows the right agent to use for the given task.
So, are we more productive? It’s the question everyone is asking, and to be honest it’s hard to give a totally objective answer. How did you measure developer productivity before agents?
That said, if forced to pick a number, I would say I’m about 20% more productive thanks to agents. But take that with a massive grain of salt. Agents are really helpful for some things and barely helpful at all for others, so right now it is all about incorporating them into your workflow where it makes sense.
Some people online who are against agentic coding argue that a junior developer can be taught and so gradually improves over time, whereas the same can’t be said for agents: they’re only ever going to be as smart as they are today, so if they’re as good as a junior dev now, that’s all they’ll ever be. Presumably this applies to domain-specific knowledge, since LLMs already know all the generic stuff in the world. Regardless, it’s nonsense, and is easily disproven by two observations:
Using agents well changes how you structure your codebase in a way that makes them more able to work with it over time, i.e. they get smarter.
The quality of LLMs keeps improving. The other week ChatGPT added a feature where it remembers everything you’ve ever said to it and incorporates that into context, which sounds a lot to me like a junior picking up domain knowledge and building on it.
So far the biggest limiting factor is remembering to use it. Even people I consider power users (based on their Claude token usage) agree with the sentiment that sometimes you just forget to ask Claude to do a task for you, and end up doing it manually. Sometimes you only notice that Claude could have done it, once you are finished. This happens to me an embarrassing amount.
I suspect this is less of an issue for Cursor users because it’s always there in your editor, while Claude Code lives in your terminal.
We have found agents work really well for increasing the ambition of code that gets written. This can mean a few different things:
Our head of product is a reformed lawyer who taught himself to code while working here. He’s shipped 150 PRs in the last 12 months.
The product manager he sits next to has shipped 130 PRs in the last 12 months. When we look for easy wins and small tasks for new starters, it’s harder now, because he’s always got an agent chewing through those in the background.
We built a new product for a new region with a single developer working solo (with the help of Claude). Previously we did this slower, with teams of people. Those teams are adopting Claude now too, but using it from the start and getting good at using it really helps.
I’m sure everyone comes across feature ideas that sound cool but where it’s not obvious where to start. Like a coding version of writer’s block. It turns out typing a prompt is a lot easier than typing the code, and it’s a great way to get unblocked and start on ideas you wouldn’t otherwise try.
Agents are extremely good at helping you understand parts of the codebase you haven’t seen before - just ask them - and I suspect that’s helping a lot here.
A few times now we’ve tried to connect agents to Linear or Sentry to try and get automatic draft PRs for bug fixing. So far the results have been mixed.
All you are really doing here is hoping that the Linear ticket (or the context in the Sentry issue) is enough for the agent to work out the bug and fix it. In other words, you’re hoping the ticket works as well as a one-shot prompt that a developer would write. Sometimes it does, and when it does it’s very impressive.
More often it doesn’t. The issue I’ve found is that in many cases the proposed fix is incorrect, but in a subtle way, where if you don’t know the codebase you’re looking at (or don’t think about it critically enough) you’ll be led astray. It’s easy to see a draft PR with all tests passing, including new tests full of comments, and not realise that the tests are garbage and the fix is wrong. This is less likely to happen if you have taken time to think about the feature & scope beforehand, but much more likely if you see a bug report and a green draft PR ready to go alongside it.
For this reason we have avoided pushing automatic draft PRs.
Claude Code is incredibly good at refactors. Particularly if the refactors involve frontend code.
A recent example was converting a set of screens built in React to our new design system, which is built on Hotwire and rendered entirely server-side. Using this prompt, Claude Code came up with an almost-right plan. I made a few tweaks and then asked it to write the code, which it did correctly.
It missed a few client-side validations that we only found while testing, which is to say that it’s still important to test. I think if I had explicitly asked it in my prompt to include all these sorts of validations, it would have.
Doing the refactor by hand would have taken me a few hours, so realistically using the agent probably saved me an hour. But it would have been boooooring. Knowing how boring it would be, I’d been putting off doing it forever, even though it was something I really wanted to see done. Agents are underrated for quickly getting through chores.
Agents work great at doing straightforward tasks using straightforward parts of well documented frameworks.
You have to be careful with more complex tasks that go below the surface level, where a wide variety of architectures could come back and you need to be critical about getting the best one.
For example, I asked an agent to help me ensure that a specific operation could only happen once at any time. I suggested it use locking. It came back with a custom-built Redis locking mechanism. After a nudge, it suggested using the Rails cache. After another nudge, a manually executed Postgres advisory lock. After another nudge, it finally settled on Rails’ with_lock method. Had I not been thinking critically, it would have had me introduce new dependencies to solve a problem the framework already had tools for.
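For anyone unfamiliar with with_lock, here is a minimal sketch of the shape of the final approach (the Invoice model and finalise! method are made up for illustration): with_lock opens a transaction and takes a row-level lock on the record, so a concurrent caller blocks until the first one finishes and then sees the updated row.
class Invoice < ApplicationRecord
  def finalise!
    with_lock do              # transaction + row-level lock (SELECT ... FOR UPDATE)
      next if finalised?      # a concurrent caller already did the work
      # ... the operation that must only happen once ...
      update!(finalised: true)
    end
  end
end
No new dependencies and no hand-rolled Redis locks: the tools were already in the framework.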
Cursor has a fixed price. I suspect they think about pricing the way gyms do: if everyone used Cursor as much as they’re allowed to, they’d go out of business. But in practice most people use far fewer tokens than they’re paying for.
You can see this in practice when you use Claude Code, which is pay-per-token. Our heaviest users are using $50/month of tokens. That’s a lot of tokens.
I asked our CFO and he said he’d be happy to spend $100/dev/month on agents. For a 20% productivity gain, that’s a bargain.
This critique may only make sense to sufficiently DHH-pilled people, but so be it. I haven’t yet come across an agent that can write beautiful code.
That doesn’t mean it can’t write correct code, or (particularly if you prompt it right) succinct code. And it isn’t to say that all code needs to be beautiful. But in some cases the elegance of the code does matter because it says a lot about the elegance - and the quality - of the architecture. Agents still aren’t there in those cases.
The most common thing that makes agentic code ugly is the overuse of comments. I have tried everything I can think of and apart from prefacing every prompt with “Don’t write comments” I cannot find a way to get agents to not comment every line they write.
In the worst case this means rewriting all the code the agent wrote. I still think it’s worthwhile. Often writing the prompt and going back and forth with the agent helps you understand the problem better, which leads to a better architecture when you write it yourself. And agents write code so quickly that you’re not wasting days before throwing it away and doing it yourself.
A related issue is that if everyone uses agents, individual coding styles are lost. When you’ve worked with someone for a while you can tell they wrote some code just by reading it. That is lost when everything goes through an agent. I’m sure in 10 years’ time this will seem quaint, like mourning the assembly that powers the C code that sits under Ruby. But for now it makes me a little bit sad sometimes.
We’ve done a few things to make the codebase easier for agents to reason about.
Setting up Cursor rules and a CLAUDE.md (a rough sketch of what goes in one follows below). These also end up holding great context for new teammates.
Making it easy to run tests with a single command. We used to do development and run tests via Docker over SSH. It was a good idea at the time. But fixing a few things so that we could run tests locally meant we could ask the agent to run (and fix!) tests after writing code.
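To give a feel for it, here is the sort of thing that ends up in those files. This is an illustrative sketch, not our actual rules; the specific commands and wording will differ per project.
# CLAUDE.md (illustrative sketch)
- Run the full test suite with a single command: bin/rails test (or a single file: bin/rails test path/to/some_test.rb).
- New UI goes through the design system (Hotwire, server-rendered views), not React.
- Prefer self-documenting code; only write comments that explain why, not what.
- For non-trivial changes, propose a plan and wait for approval before writing code.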
Many companies have issued mandates for using AI, some of which have leaked publicly.
We haven’t. That doesn’t mean I don’t expect people to try this stuff out, but I think forcing them to is silly. If the productivity gains are as significant as I suspect they are, then anyone who has their self-interest (or the company’s interests, if they are so inclined) at heart will quickly pick up agentic coding.
It is a very exciting time to be in software development. I’ve written that a few times now, but it’s really true, and every time I write it I end up even more impressed with the quality of agentic coding.
Similar to being good at writing code, being good at using agents is not a binary thing. There’s a sliding scale of quality. In the future there will be 10x prompters just like today there’s 10x developers.
I think it’s worth trying to get really good at using agents to code, no matter where in your career you are. Even with agents, the hardest thing in programming remains working out what the software should do, and articulating it well. Bashing out the syntax continues to get easier.
> So far the biggest limiting factor is remembering to use it. Even people I consider power users (based on their Claude token usage) agree with the sentiment that sometimes you just forget to ask Claude to do a task for you, and end up doing it manually. Sometimes you only notice that Claude could have done it, once you are finished. This happens to me an embarrassing amount.
Yea, this happens to me too. Does it say something about the tool?
It's not like we are talking about luddites who refuse to adopt the technology, but rather a group that is very open to using it. And yet sometimes, we "forget".
I very rarely regret forgetting. I feel a combination of (a) it's good practice, I don't want my skills to wither and (b) I don't think the AI would've been that much faster, considering the cost of thinking the prompt and that I was probably in flow.
If you're forgetting to use the tool, is the tool really providing benefit in that case? I mean, if a tool truly made something easier or faster that was onerous to accomplish, you should be much less likely to forget there's a better way ...
Yep! Most tools are there to handle the painful aspects of your tasks. It's not like you are consciously thinking about them, but just the thought of doing those tasks without the tool will get a groan out of you.
A lot of current AI tools are toys. Fun to play around with, but as soon as you have some real-world tasks, you just do them your usual way that gets the job done.
There's a balance to be weighed each time you're presented with the option. It's difficult to predict how much iteration the agent is going to require and how frustrating it might end up being, all while you lose your grip on the code being your own and your mental model of it, versus just going in and doing it, knowing exactly what's going on, and simply asking questions if any unknowns arise. Sometimes it's easier to not even make the decision, so you disregard firing up the agent in a blink.
> is the tool really providing benefit in that case?
Yes, much of the time and esp. for tests. I've been writing code for 35 years. It takes a while to break old habits!
Our meat blobs forget things all the time. It's why todo apps and reminders even exist. Not using something every time doesn't mean it's not beneficial.
You never forgot your reusable grocery bag, umbrella, or sun glasses? You've never reassembled something and found a few "extra" screws?
Yes, but once I'm at the checkout or it starts raining, I reach for it...
I really really hate this idea that you should have AI do anything it can do, and that there's no value in doing it manually.
The value is in doing the thing, how it's done is just a matter of preference and efficiency.
For some tasks, just doing them is faster than taking on the cognitive load of writing a prompt and then waiting for the agent to execute.
Also if you like doing certain tasks, then it is like eating an ice cream vs telling someone to eat an ice cream.
And the waiting is somewhat frustrating: what am I supposed to do while I wait? I could just sit and watch, or context-switch to another task and then forget the details of what I was originally doing.
I think you’re supposed to spin up another to do a different task. Then you’ll be occupied checking up on all of them, checking their output and prodding them along. At least that’s what Anthropic said you should do with Claude Code.
If I wanted to be an EM, I'd apply for that job.
I typically just think of more ideas, prompts, etc. while I wait.
The thing is others will eat ice cream faster so very soon there'll be no ice cream for me.
Many CLI tools that I love using now took some deliberate practice to establish a habit of using them.
I resonate with your case (b). One reason why I intentionally don't use it is cases where I know exactly what code I want to write, and can thus write it quicker than I can explain it to Claude + have Claude write it.
What makes this tough is there are so many variables.
What an AI could not do last week might be possible this week.
Or maybe my prompt didn't have enough guidance.
For the time being we have to keep testing the waters.
> The most common thing that makes agentic code ugly is the overuse of comments.
I've seen this complaint a lot, and I honestly don't get it. I have a feeling it helps LLMs write better code. And removing comments can be done in the reading pass, somewhat forcing you to go through the code line by line and "accept" the code that way. In the grand scheme of things, if this were the only downside to using LLM-based coding agents, I think we've come a long way.
I work with python codebases and consider comments that answer "what?" instead of "why?" bad.
LLMs tend to write comments answering "what?", sometimes to a silly extent. What I found helpful when using Claude 3.7 was to add this rule in Cursor. The fake XML tag helped decrease the number of times it strays from my instructions.
<mandatory_code_instruction>
YOU ARE FORBIDDEN FROM ADDING ANY COMMENTS OR DOCSTRINGS. The only code accepted will be self-documenting code.
</mandatory_code_instruction>
If there's a section of code where a comment answering "why?" is needed this rule doesn't seem to interfere when I explicitly ask it to add it there.
I've noticed Gemini 2.5 Pro does this a lot in Cursor. I'm not sure if it's because it doesn't work well with the system prompt or tools, but it's very annoying. There are comments for nearly every line, and it's like it's thinking out loud in comments, with lots of TODOs and placeholders.
That is because it is thinking out loud. Producing tokens is how it thinks.
Eh, you really have to take the "thinking" in quotes, in the LLM context "thinking" out loud is what makes the results better. I think (hah) their point stands.
They tend to add really bad comments though. I was looking at an LLM generated codebase recently and the comments are along the lines of “use our newly created Foo model”, which is pretty useless.
You can literally just ask it to not write too many comments, describe the kind of comments you want, and give a couple of examples. Save that in rules or whatever. And it's solved for the future :)
I tell them to write self-documenting code and to only leave comments when it's essential for understanding, and that's worked out pretty well.
I agree. I don’t always keep the comments, but I’m 100% ok with them.
If it helps... shouldn't you be "not deleting them" for future feature additions?
Yeah that's what I do, remove the comments as I read through.
I'm still having a hard time with coding agents. They are useful but also somehow immature, and hence dangerous. The other day I asked Copilot with GPT-4o to add docstrings to my functions in a long Python file. It did a good job on the first half. But when I looked carefully, I realized the second half of my file was gone. Just like that. Half of my file had been silently deleted, replaced by a single terrifying comment along the lines of "continue similarly with the rest of the file". I use Git of course, so I could recover my deleted code. But I feel I still can't fully trust an AI assistant that will silently delete hundreds of lines of my codebase just because it is too lazy or something.
I’ve also seen catastrophic failures where the code returned completely fails for numerous “obvious” problems, including but not limited to missing code.
I tend to have to limit the code I share and ask more pointed / targeted questions in order to lead the AI to a non catastrophic result.
These models have a hard time modifying LARGE files and then returning them back to you. That's inefficient too.
What you want is to ask for a list of changes and then apply them. That's what aider, codex, etc. all do.
I made a tool to apply human-readable changes back to files, which you might find useful: https://github.com/asadm/vibemode
aider has this feature too.
Your first mistake was using Copilot. Your second mistake was using GPT-4o.