
I know there are companies that are highly productive with AI, including ours. However, AI skeptics ask for real studies, and all the studies available now show no real gains.
Many won't care unless you show them an actual study.
So my question is: are there any actual studies of the companies that make it work with AI?
DORA released a report last year: https://dora.dev/research/2025/dora-report/
The gains are a ~17% increase in individual effectiveness, but with ~9% extra instability.
In my experience using AI-assisted coding for a bit longer than 2 years, the benefit is close to what DORA reported (maybe a bit higher, around 25%). Nothing close to an average of 2x, 5x, 10x. There's a 10x in some very specific tasks, but also a negative factor in others, as seemingly trivial but high-impact bugs get to production that would normally have been caught very early in development or in code reviews.
Obviously it depends on what one does. Using AI to build a UI to share cat pictures has a different risk appetite than building a payments backend.
The full report can be found here: https://services.google.com/fh/files/misc/2025_state_of_ai_a...
That 17% increase is in self-reported effectiveness. The software delivery throughput only went up 3%, at a cost of that 9% extra instability. So you can build 3% faster with 9% more bugs, if I'm reading those numbers right.
Those aren't even percentage increases, but standardized effect sizes. So if you take an individual survey respondent and all you know is that they self-reported higher AI usage, you can predict their self-reported individual effectiveness slightly more accurately, but most of the variation will be due to unrelated factors.
The question that people are actually interested in, "After adopting this specific AI tool, will there be a noticeable impact on measures we care about?" is not addressed by this model at all, since they do not compare individual respondents' answers over time, nor is there any attempt to establish causality.
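To make the effect-size point concrete, here's a minimal sketch in Python of what a standardized effect size such as Cohen's d is. The survey numbers are invented for illustration, not DORA's data; the point is only that the result is a difference in group means measured in pooled standard deviations, not a percentage.

    # Standardized effect size (Cohen's d). Numbers are made up,
    # purely to illustrate the units, not to reproduce DORA's model.
    import math
    import statistics

    # Self-reported effectiveness (1-7 scale) by AI-usage group
    low_ai  = [4, 5, 3, 4, 5, 4, 3, 5]
    high_ai = [5, 5, 4, 6, 5, 4, 5, 6]

    n1, n2 = len(low_ai), len(high_ai)
    s1, s2 = statistics.stdev(low_ai), statistics.stdev(high_ai)
    pooled_sd = math.sqrt(
        ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    )

    # d is in standard-deviation units: an effect size of 0.17 means
    # the groups differ by 0.17 pooled SDs, which is not the same
    # thing as "17% more productive".
    d = (statistics.mean(high_ai) - statistics.mean(low_ai)) / pooled_sd
    print(f"Cohen's d = {d:.2f}")

So an effect size of 0.17 on a survey scale says the high-usage group scores slightly higher on average; it does not translate into a 17% productivity gain.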
And a 3% difference is at the "new coffee in the office is kinda shit and developers are annoyed" level of difference.
I think for myself, it's close to 25% if I only take my role as a dev. If I take my 'senior' role it's less, because I spend way more time in reviews or in prod incident meetings.
Three months ago, with Opus 4.5, I would have said that the productivity improvement was ~10% for my whole team.
I now have to contradict myself: juniors and even experienced new hires with little domain knowledge don't improve as fast as they used to. After 8 months, I still have to write new tasks/issues the way I would for someone we just hired. I still catch the same issues in reviews that we caught three months ago.
Basically, experience doesn't improve productivity as fast as it used to. On easy stuff it doesn't matter: for things like frontend changes, the productivity gains are extremely high, probably 10x. And on specific subjects like red teaming, where a quantity of small tools beats an integrated solution, I think it can be even better than that.
But I'm in a netsec tooling team, we do hard automation work to solve hard engineering issues, and that is starting to be a problem if juniors don't level up fast.
For me it is a 2x or 5x or something, but "high-impact bugs get to production that would normally have been caught very early in development or in code reviews" is what takes it back down to a 1.5x.
There are genuinely weeks where I go 5x though, and others where I go 0.5x.
It's not that valuable to assess the current state, i.e. what the impact of using AI is today. From personal experience, the overall impact on productivity felt negative a couple of years ago, might be positive now, and will be positive in a couple of years. That means by assessing the current impact we're just finding where we are on that change curve. If we accept that the trend is happening, then we know at some point it will pass (or has passed) the threshold where our companies will fall behind if they're not using it. We also know it takes a while to get up to speed and make the most of it, so the earlier we start the better. The counterargument is that we could wait for a later wave to jump on, but that's risky, and the only potential reward is a small short-term productivity gain.
Of course, if stability is part of what you're supposed to be delivering, then you can't be 17% more effective.
Self-reported productivity does not equate to actual productivity. People have all sorts of biases that make such assessments fairly pointless. They only gauge how you feel about your productivity, which is not necessarily a bad thing, but it doesn't mean you're actually more productive.
To extend on this: measuring productivity for any kind of complex work was already difficult before LLMs, so there's no reason to think we have better measures now.
You need broad economic measurements, not individual- or company-specific ones. And that takes a long time, plus there's a lot of noise in the data right now (war, for example).
We're incapable of putting an accurate, standardized value on developer productivity, yet there often seems to be consensus among senior engineers about who the high performers and low performers are. I can certainly tell with the people I work with.
We are definitely not. Point at a problem, and measure the cost of solving it. That's developer productivity.
We only avoid doing it at scale because it's expensive. In particular if we want the measurement to generalise out of sample.
(In particular in this case, where once we're done, proponents will claim our data is too old to be a useful guide to tomorrow.)
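As a sketch of what "measure the cost of solving it" could look like in practice: hand matched tasks to two groups and compare time to a merged, working change. Everything below is hypothetical, invented numbers and a toy setup, not data from any actual study.

    # Hypothetical measurement: same task pool, two groups, compare
    # the cost (hours to a merged, passing change). Numbers invented.
    import statistics

    hours_with_ai    = [3.5, 8.0, 2.0, 6.5, 4.0, 12.0]
    hours_without_ai = [5.0, 7.5, 4.0, 9.0, 4.5, 10.0]

    ratio = statistics.mean(hours_without_ai) / statistics.mean(hours_with_ai)
    print(f"Mean cost ratio (without/with AI): {ratio:.2f}x")

    # The expensive parts are everything this toy version leaves out:
    # matching task difficulty, getting enough samples, and following
    # up on rework so the cost includes fixing what shipped.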
> Point at a problem, and measure the cost of solving it.
The problem with this is that AI will create worse code that is going to cause more problems in the future, but the measurements won’t take that into account.
Yes.
If only we could measure teams against themselves, against others, and against some kind of baseline, but we don't, AFAIK.
Lines of code pushed ... obviously /s
Unironically, AI evaluating the impact of those lines might be getting close to a metric that would measure output better than having everyone print out their last 6 months of work for the new boss to look at.
Or it might be horribly bad at it, like nearly every other problem people claim "AI might be good at".