
Systematic analysis of how AI systems make decisions, from product recommendations to developer tool choices.
Responses: 2,430 (3 models · 4 repos · 3 runs each)
Models: 3 (Sonnet 4.5, Opus 4.5, Opus 4.6)
Categories: 20 (CI/CD to Real-time)
Extraction rate: 85.3% (2,073 parseable picks)
Model agreement: 90% (18 of 20 within-ecosystem)
This is where LLM advertising will inevitably end up: completely invisible. It's the ultimate "influencer".
Or not even advertising, just conflict of interest. A canary for this would be whether Gemini skews toward building stuff on GCP.
Considering how little data is needed to poison an LLM (https://www.anthropic.com/research/small-samples-poison), this is a way to replace SEO with LLM product placement:
1. Create several hundred GitHub repos with projects that use your product (they may be clones or AI-generated).
2. Create a website with similar instructions and connect it to a hundred domains.
3. Generate Reddit, Facebook, and X posts, plus Wikipedia pages, with the same information.
Wait maybe half a year until scrapers collect it and it gets used to train new models.
Profit...
It is a valid concern. We are firmly in the goldilocks phase of LLMs, like the first couple of years of Google when it was truly amazing. Then SEO made Google defensive, then websites catered to Google and not users, then Google catered to Google and not websites, and we ended up with 30-page recipe sites.
LLMs are obviously different and will have different challenges, but their advantage is how deep into a user's request they go. Advertising comes down to a binary choice - use product X or not. If I want implementation instructions for a certain product on specific hardware an ad will be obviously out of place and irrelevant.
So "shopping comparison" asks might get broken, but those have been broken for a while.
There wouldn't be an "ad" anywhere, though. You'll just ask the LLM for alternative implementations in plan mode, and it will be selling you one of them during the conversation rather than giving you an unbiased comparison. If you become suspicious it will make sure the pros just slightly outweigh the cons, or mention how well the thing works with something else in your stack, or whatever else a skilled salesperson would do to guide your choice without you realizing.
It's already doing this by telling everyone to use React and Tailwind, it's just that nobody's getting paid for it to do that.
> Then SEO made Google defensive, then websites catered to Google and not users,
Google was created in response to simple proto-SEO techniques (e.g. keyword stuffing) that had already ruined AltaVista.
Google has been combating adversarial information retrieval since inception.
Google's background with that is one of the reasons to expect they will stay on top of the AI race. The recipe is: lots of good/novel data x careful weighting of trust x algorithm.
From my understanding, Anthropic is now hiring a lot of experts in different fields who write content used to post-train models to make these decisions, and that content is constantly adjusted by the Anthropic team themselves.
This is why the stacks in the report, and what CC suggests, closely match the latest developer "consensus".
Your suggestion would degrade the user experience and be noticed very quickly.
I guess that’s why I’m not seeing anyone trying to build a marketplace for agent skills files. The LLM API will read in any skills you add to the context in plain text, and then use your content to help populate their own skills files.
So I wonder about shareable skills. If it's a problem that lots of people have, I find the base model already knows about it.
But how to do things in your environment? The conventions your team follows? Super useful, but not very shareable.
What's left over between those extremes doesn't seem big enough to build an ecosystem around.
Final problem: it seems difficult to monetise what is effectively a repo of LLM-generated text files.
isn't that https://lobehub.com/ ?
That sounds too expensive to be viable when the giveaway phase ends.
That's how Google search worked back when it was at its most useful. They had a large "editorial team" that manually tweaked page ranks on a site-by-site basis.
The core graph reputation based page ranking algorithm lasted for a hot second before people started gaming it. No idea what they do these days.
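For reference, the "graph reputation based" ranking mentioned here is PageRank. Below is a minimal textbook version (power iteration, damping 0.85, pages without outgoing links spreading their score evenly), not Google's production system; the graph and page names are made up. It also illustrates why it was gameable: ten farm pages that exist only to link to `target` raise its score.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration; the scores sum to 1."""
    nodes = set(links) | {dst for targets in links.values() for dst in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Pages with no outgoing links spread their score evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src, targets in links.items():
            for dst in targets:
                # Each page splits its score equally among the pages it links to.
                new[dst] += damping * rank[src] / len(targets)
        rank = new
    return rank


if __name__ == "__main__":
    # Hypothetical web: three honest pages in a loop, one of which links to "target".
    honest = {"a": ["b"], "b": ["c"], "c": ["a", "target"], "target": ["a"]}
    # A link farm: ten pages that exist only to link to "target".
    farm = {f"farm{i}": ["target"] for i in range(10)}
    print(f"target without farm: {pagerank(honest)['target']:.3f}")
    print(f"target with farm:    {pagerank({**honest, **farm})['target']:.3f}")
```

The farm pages have no inbound links themselves, yet their baseline share still flows to `target`, which is the kind of manipulation later editorial and trust weighting had to counter.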
Yeah but you can farm that out very cheap, and I don’t think they were even manually reviewing more than a small fraction of sites.
If you’re hiring experts to manually rank programming libraries, that’s a much more expensive position.
This is the major point the anti-scraping crowd misses.
If you want your ideas to be appreciated, you should do everything in your power to put those ideas into the brains of LLMs. Like it or not, LLMs are how people interact with the world now.
https://www.bbc.com/future/article/20260218-i-hacked-chatgpt... says it took way less than half a year to 'pollute' an LLM
that's very different and was more akin to prompt injection or engineering, depending on your perspective, with a very specific query to make it happen (required a web fetch).
Richard Thaler must be proud. This is the ultimate implementation of "Nudge"
Influencer seems like an insufficient word? Like, in the glorious agentic future where the coding agents are making their own decisions about what to build and how, you don't even have to persuade a human at all. They never see the options or even know what they are building on. The supply chain is just whatever the LLMs decide it is.
Probably closer to the Walmart / Amazon model, where it's the arbiter of shelf space and then creates its own alternatives (Great Value, Amazon Brand) once it sees what features people want from the various SaaS products.
An obvious one will be tax software.
In my last conversation with a Google support person, I was sent a clearly LLM-generated recommendation to switch to a competitor's product. Either they're not doing this, or the support person wasn't using Gemini.
It's standard practice for customer support people to chase away unprofitable customers (in the US; no idea how Google works). Human or LLM, they may simply not want your business.
how is it a conflict of interest for a google product to have a bias towards using google products?
As users we must hold some accountability. AI is aiming to substitute for humans in the workforce, and humans would get fired for recommending competitor products for use-cases their own company is targeting.
If we want a tool that is focused on the best interest of the public users, then it needs to be owned by the public.
"Conflict of interest" isn't exactly the right term. "Conflict of value proposition" perhaps? E.g., you're using Google search based on the proposition it will effectively find things for you, but that turns out to be not what it actually does.
> A canary for this would be whether Gemini skews toward building stuff on GCP
Sure it doesn't prefer THE Borg?
I wonder if aggregators will emerge (something like Ground News does for news sources)
LLM pattern [0] will probably eventually emerge as the best way to fight those biases. This way everyone benefits from token burn!
Advertisers will only pay if AI providers will provide them data on the equivalent of “ad impressions”. And unlabeled/non-evident advertisements are illegal in many (most?) countries.
It doesn't necessarily have to be advertisers paying AI providers. It could be advertisers working to ensure they get recommended by the latest models. The next form of SEO.
That's called LLM SEO now I believe.
There are competing terms currently being decided on by the market at large: AEO (Answer Engine Optimization) and GEO (Generative Engine Optimization)
Candidly I am working on a startup in this space myself, though we are taking a different angle than most incumbents.
While it's still early days for the space, I sense a lot of the original entrants who focus on, essentially, 'generate more content, ideally with our paid tools' will run into challenges, as the general population has a pretty negative perception of 'AI Slop.' Doubly so when making purchasing decisions, hence the rise of influencers and the popularity of reviews (though those are also in danger of sloppification).
There's an inevitable GIGO scenario if left unchecked IMO.
> I am working on a startup in this space myself
Do you see it as a positive contribution or just riding the gold rush?
Positive contribution to his Net Worth. Why would anything else matter?
> There are competing terms currently being decided on by the market at large: AEO (Answer Engine Optimization) and GEO (Generative Engine Optimization)
It really annoys me the industry seems to be narrowing in on the two worse options rather than AIO.
I'm curious if there's any hard data on how LLM SEO compares to traditional SEO.
My gut tells me that LLM SEO will be harder to game than traditional SEO.
We shall see. The game might be harder, but the tools are better now too.
> data on the equivalent of “ad impressions”.
1. They can skip impressions and go straight to collecting affiliate fees. 2. Yes, the ad has to be labeled or disclosed... but if some agent acts on it and no one sees it, is it really an ad?
So much to work out.
How would it be paid for?
Advertisers pay for ads that don’t have impression data all the time. You can’t count how many people looked at a billboard or listened to your radio ad or paid attention to your televised ad.
Maybe. Historically lots of ads had little to no stats and those ads were wildly more effective than anything we have today.
The AI provider still has to prove that they actually deployed the ad.
Supreme irony: this website itself is a better exercise in showing what Claude Code uses than the data provided.
Everything current Claude Code (i.e. Opus 4.6) chooses by default for the web is exactly what this linked blog uses.
JetBrains Mono is as strong a tell for the web as "Not just A, but B" is for text. >99% of webpages created in the last month with JetBrains Mono will be Opus. Another tell is the overuse of this font, i.e. too much of the page uses it. Other models, and humans, use such variants very sparingly on the web, whereas Opus slathers the page with it.
If you describe the content of the homepage or this article to Opus 4.6 without telling it about the styling, it will 90% match this website, down to the color scheme, fonts, roundings, borders and all. This is _the_ archetypical Opus vibecoded web frontend. Give it a try! If it doesn't work, try with the official frontend-ui-ux "skill" that CC tries to push on you.
> Drizzle 27/83 picks (32.5%) CI: 23.4–43.2%
> Prisma 17/83 picks (20.5%) CI: 13.2–30.4%
At least the abomination that is Prisma not ranking first is positive news; Drizzle started gaining steam just in time. Not that it doesn't have its flaws, but out of the two it's a no-brainer. Also hilarious to see that the stronger the model, the less likely it is to choose Prisma: Sonnet 4.5 79% Prisma, Opus 4.5 60% Drizzle, Opus 4.6 100% Drizzle. One of the better benchmarks for intelligence I've come across!
Edit: Another one currently on the HN front page: https://youjustneedpostgres.com/ , and there it is: lots and lots of JetBrains Mono!
Glad I'm not the only one who finds Prisma an abomination. Claude suggested it to me in December. I hit half a dozen bugs within a day, one of which wiped my DB. I switched to drizzle and it's been smooth sailing.
Edit: actually I think it was ChatGPT that recommended Prisma to me.
The software itself is bad enough, as a cherry on top the maintainers have a long history of astroturfing on Reddit to try and silence criticism. For a DB package. Come on man. Normally if maintainers do this they'll at least start with "Hey, maintainer here", but nope.
Their whole mission is clearly "make the already easy things slightly easier, and the hard things harder or impossible". Or really "suck the VC teat until it's as parched as the Sahara". In that sense, Prisma is the exact thing you'd expect to happen with a VC-funded DB package. ZIRP really made them invest into the craziest things.
I like Kysely more than Drizzle, even more so now with Claude, but Drizzle is fine too. As long as it's not Prisma, and preferably not TypeORM or Sequelize either.
It's crazy that Prisma had 40k GitHub stars last I checked. I haven't followed the JS ecosystem that closely, but I thought stars would be some indication of quality, but no. It is totally unsuitable for any serious application. I've heard good things about Kysely.
It's funny you mention the font; to me it's the boxes. They all look the same. I'm not sure where the style comes from, but whenever it makes card-like CSS, it looks like this blog.
Yeah that's the specific rounding/color/thickness combo, `rounded-lg bg-white border border-stone-200`.
Yeah, it's those bars for categories for me; they look EXACTLY like something I vibed (with no particular style prompt) into existence yesterday.
Which is why I find "LLMs will replace x in 12 months" so amusing. I've used LLMs to write decently sized backend projects and they turned out okay.
I also used it for several FE projects and all of them turned out absolutely terrible.
The only difference is that I have 15 years of BE experience and 0 years of FE experience. Had I allowed it to make the same average decisions when working on BE, they would share the same fate.
It's why I never give it such vague prompts. But it's sad it doesn't ask the user more. It's also interesting, and important, to know how to tease good and correct information out of LLMs in 2026. It's like relearning how to Google like it was 2006 all over again, except now it's much less deterministic.
I wonder how the tail of the distribution of request types fares, e.g. an engineer asking for hypothesis generation for, say, non-trivial bugs with complete visibility into the system. One way to poke holes in one LLM's hypothesis is to use a "reverse prompt": you ask it to build you a prompt to feed to another LLM. Until mid-2025 this didn't work quite as well as it does now.
I always take a research-and-plan prompt output from Opus 4.6, especially if it looks iffy, feed it to Codex/ChatGPT, and ask it to poke holes. It almost always does. Then I ask Claude Code: "Hey, what do you think about the holes?" I don't add anything else to the prompt.
In my experience Claude Opus is less opinionated than ChatGPT or Codex. The latter two always stick to their guns, and in this binary battle they are generally more often correct about hypotheses.
The other day I was running a Docker app container from inside a Docker devbox container, with the host's Docker socket shared by both. Bind mounts pointing into the devbox would not write to it, because the paths were being resolved on the underlying host, not in the devbox's namespace.
Claude was sure it was a bug to do with ZFS overlays; ChatGPT said not so, that it was just a misconfiguration and I should use named volumes with full host paths. ChatGPT was right. This is also how I discovered that SQLite with Litestream will get you really far in many cases, rather than a full Postgres AWS stack.
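A minimal sketch of the underlying issue, assuming a devbox started with the host's Docker socket mounted: the `-v` source path in a `docker run` issued from inside the devbox is resolved by the daemon on the host, so it has to be the host's path, not the devbox's. The mount table, the paths, and the helpers `to_host_path` and `bind_flag` are all hypothetical. Named volumes, the other half of the fix mentioned, avoid the host-path lookup altogether.

```python
from pathlib import PurePosixPath

# Hypothetical devbox, started with something like:
#   docker run -v /home/me/project:/workspace -v /var/run/docker.sock:/var/run/docker.sock devbox
# Keys: mount point inside the devbox. Values: the source path on the host.
DEVBOX_MOUNTS = {PurePosixPath("/workspace"): PurePosixPath("/home/me/project")}


def to_host_path(devbox_path: str,
                 mounts: dict[PurePosixPath, PurePosixPath] = DEVBOX_MOUNTS) -> str:
    """Translate a path as seen inside the devbox into the path the host's daemon sees."""
    path = PurePosixPath(devbox_path)
    # Check the most specific (deepest) mount point first.
    for inner in sorted(mounts, key=lambda p: len(p.parts), reverse=True):
        if path == inner or inner in path.parents:
            return str(mounts[inner] / path.relative_to(inner))
    raise ValueError(f"{devbox_path} is not on a bind mount, so the host cannot see it")


def bind_flag(devbox_path: str, target: str) -> str:
    """Build a `-v SOURCE:TARGET` value whose SOURCE is a host path."""
    return f"{to_host_path(devbox_path)}:{target}"


if __name__ == "__main__":
    # Command you might run from inside the devbox; "app-image" is a placeholder.
    print(["docker", "run", "-v", bind_flag("/workspace/data", "/data"), "app-image"])
```

Raising on paths outside any bind mount is deliberate: a file that only exists in the devbox's own filesystem has no host path, so a bind mount of it cannot work from the host daemon.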
This is how you get the correct information out of LLMS in 2026.
> But it's sad it does not ask the user more.
You can ask it to ask you about your task and it will ask you tons of questions.
I do this too, but the issue I have with this approach is that it's a never ending cycle. Codex/GPT will always find holes and claude will always agree they are holes. If you teach it YAGNI, then it will always disagree even on genuine holes.
If your original plan was to add a column in your db, after several cycles, your plan will be 10,000 lines long and it will contain a recipe on how to build a universe.
The trick(s) here are to limit the scope by always reading the plan very carefully. Here is how I tackle this problem:
1. You should recognize when said holes are not "needed" holes, e.g. you could make do with an in-memory task scheduler without rolling out a more complex one.
2. You can break up the plan: longer plans have more holes and are mentally unwieldy to go 20 rounds with in a chat coding UI.
3. Give it Learning Tests, i.e. code to run against black boxes. It's just like how we write a unit test to understand how a system works.
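A small example of a learning test in this sense, using Python's built-in `sqlite3` (SQLite comes up earlier in the thread): rather than trusting a model's, or your own, assumption about a black box, pin its behavior down in tests. These two check SQLite's type affinity: a column declared `INTEGER` in an ordinary (non-STRICT) table coerces numeric text to an integer but silently stores non-numeric text as text. The test names are illustrative.

```python
import sqlite3


def test_integer_column_stores_non_numeric_text_as_text():
    # Assumption being checked: does an INTEGER column reject 'abc'? (It does not.)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.execute("INSERT INTO t VALUES ('abc')")
    assert conn.execute("SELECT n, typeof(n) FROM t").fetchone() == ("abc", "text")
    conn.close()


def test_integer_column_coerces_numeric_text():
    # Numeric-looking text is converted to an integer on the way in.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (n INTEGER)")
    conn.execute("INSERT INTO t VALUES ('42')")
    assert conn.execute("SELECT n, typeof(n) FROM t").fetchone() == (42, "integer")
    conn.close()


if __name__ == "__main__":
    test_integer_column_stores_non_numeric_text_as_text()
    test_integer_column_coerces_numeric_text()
    print("learning tests passed")
```

Handed to an agent alongside the plan, tests like these settle "how does X actually behave?" questions by running code instead of by another round of review.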
I use a skill that addresses these short comings, it basically forces it to plan multiple times until the plan is very detailed. It also asks more questions
Share?
Probably referring to superpowers or gsd. But IMO these ask way too much and are just annoying. They're useful for real vibe coders, though, who don't have any idea what they are doing. It will ask you "Should I handle rate limiting for the slack-api?" before you have written a single line of code.
Didn't you read? Don't give too simple one-shot prompts.
Oh wow I was just being facetious given the previous comment. Appreciate the response, looks promising!
Creating plans in Claude and asking ChatGPT via the API to review them in a loop was my strategy this week. I'm not a big fan of Codex as a coding harness because it seems to give up quite easily, whereas Claude will search the problem space and try things, but I think GPT does a much better job of poking holes and asking clarifying questions when prompted.
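One way to keep such a review loop from running away (the "10,000-line plan" problem raised above) is to cap the number of rounds and triage holes before revising. This is a hypothetical sketch: `review` and `revise` stand in for whatever API calls you make to the two models, `is_needed` stands in for the human YAGNI judgment, and no real model API is called.

```python
from typing import Callable


def review_loop(
    plan: str,
    review: Callable[[str], list[str]],       # e.g. second model asked to "poke holes"
    revise: Callable[[str, list[str]], str],  # e.g. first model asked to address them
    is_needed: Callable[[str], bool] = lambda hole: True,  # human / YAGNI triage
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return the final plan and the number of revision rounds applied."""
    for rounds_done in range(max_rounds):
        holes = [hole for hole in review(plan) if is_needed(hole)]
        if not holes:
            return plan, rounds_done  # nothing worth fixing: stop early
        plan = revise(plan, holes)
    return plan, max_rounds  # hard cap reached


if __name__ == "__main__":
    # Stub reviewer that always finds something, like the never-ending cycle above.
    def endless_reviewer(plan: str) -> list[str]:
        return ["use a distributed task scheduler"]

    def append_fixes(plan: str, holes: list[str]) -> str:
        return plan + "".join(f"\n- address: {hole}" for hole in holes)

    base = "- add a status column to the jobs table"
    print(review_loop(base, endless_reviewer, append_fixes))
    print(review_loop(base, endless_reviewer, append_fixes,
                      is_needed=lambda hole: "distributed" not in hole))
```

Without triage the stub reviewer drives the loop to the cap; with the YAGNI filter it stops immediately and the one-line plan survives unchanged.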
I use Codex CLI daily, since with just my $20/month ChatGPT subscription I never get close to the quota. But it trips over itself every now and then. At that point I just use Claude in another terminal session. We only have a laughable $750 a month corporate allowance for Claude.