LLMs work best when the user defines their acceptance criteria first

Comments

By pornel 2026-03-072:2521 reply

Their default solution is to keep digging. It has a compounding effect of generating more and more code.

If they implement something with a not-so-great approach, they'll keep adding workarounds or redundant code every time they run into limitations later.

If you tell them the code is slow, they'll try to add optimized fast paths (more code), specialized routines (more code), custom data structures (even more code). And then add fractally more code to patch up all the problems that code has created.

If you complain it's buggy, you can have 10 bespoke tests for every bug. Plus a new mocking framework created every time the last one turns out to be unfit for purpose.

If you ask to unify the duplication, it'll say "No problem, here's a brand new metamock abstract adapter framework that has a superset of all feature sets, plus two new metamock drivers for the older and the newer code! Let me know if you want me to write tests for the new adapters."

By treetalker 2026-03-0715:306 reply

This is my experience with how LLMs "draft" legal arguments: at first glance, it's plausible — but may be, and often is, invalid, unsound, and/or ill-advised.

The catch is that many judges lack the time, energy, or willingness to not only read the documents in detail, but also roll up their sleeves and dig into the arguments and cited authorities. (Some lack the skills, but those are extreme cases.) So the plausible argument (improperly and unfortunately) carries the day.

LLM use in litigation drafting is thus akin to insurgent/guerilla warfare: it take little time, energy, or thinking to create, yet orders of magnitude more to analyze and refute. (It's a species of Brandolini's Law / The Bullshit Asymmetry Principle.) Thus justice suffers.

I imagine that this is analogous to the cognitive, technical, and "sub-optimal code" debt that LLM-produced code is generating and foisting upon future developers who will have to unravel it.

By grey-area 2026-03-078:192 reply

This is a fascinating look into code generated by an LLM that is correct in one sense (passes tests) but doesn't meet requirements (painfully slow). Doesn't use is_ipk to identify primary keys, uses fsync on every statement. The problem with larger projects like this even if you are competent is that there are just too many lines of code to read it properly and understand it all. Bravo to the author for taking the time to read this project, most people never will (clearly including the author of it).

I find LLMs at present work best as autocomplete -

The chunks of code are small and can be carefully reviewed at the point of writing

Claude normally gets it right (though sometimes horribly wrong) - this is easier to catch in autocomplete

That way they mostly work as designed and the burden on humans is completely manageable, plus you end up with a good understanding of the code generated. They make mistakes I'd say 30% of the time or so when autocompleting, which is significant (mistakes not necessarily being bugs but ugly code, slow code, duplicate code or incorrect code.

Having the AI produce the majority of the code (in chats or with agents) takes lots of time to plan and babysit, and is harder to review, maintain and diagnose; it doesn't seem like much of a performance boost, unless you're producing code that is already in the training data and just want to ignore the licensing of the original code.

Hacker News

LLMs work best when the user defines their acceptance criteria first

Show article

dnw

Comments

By pornel 2026-03-072:2521 reply

By treetalker 2026-03-0715:306 reply

By grey-area 2026-03-078:192 reply

HackerNews