My email is ROT13 encoded: shaalni@tznvy.pbz
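(For anyone unfamiliar: ROT13 just shifts each letter 13 places, so applying it twice round-trips. A minimal sketch in Python, using a throwaway string rather than the actual address:)

```python
import codecs

# ROT13 shifts each letter 13 places; since the alphabet has 26 letters,
# decoding and encoding are the same operation.
# "uryyb" is an illustrative example, not the address above.
decoded = codecs.decode("uryyb", "rot13")
print(decoded)  # hello
```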
Yeah, that's a great point.
ArtificialAnalysis has an "intelligence per token" metric on which all of Anthropic's models are outliers.
For some reason, they need far fewer output tokens than everyone else's models to pass the benchmarks.
(There are of course many issues with benchmarks, but I thought that was really interesting.)
Why do they always cut off 70% of the y-axis? Sure, it exaggerates the differences, but... it exaggerates the differences.
And they left Haiku out of most of the comparisons! That's the most interesting model for me, because for some tasks it's fine, and it's still not clear to me which ones those are.
Because in my experience, Haiku sits at a weird middle point: if you have a well-defined task, you can use a smaller/faster/cheaper model than Haiku, and if you don't, you need to reach for a bigger/slower/costlier one.
This project is an enhanced reader for Y Combinator's Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post, and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests, you can write to me here: gabrielepicco.github.io