My email is ROT13 encoded: shaalni@tznvy.pbz
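(For anyone unfamiliar: ROT13 just shifts each letter 13 places, so applying it twice round-trips. A minimal sketch in Python, using a throwaway string rather than the actual address:)

```python
import codecs

# ROT13 shifts each letter 13 places; since the alphabet has 26 letters,
# decoding and encoding are the same operation.
# "uryyb" is an illustrative example, not the address above.
decoded = codecs.decode("uryyb", "rot13")
print(decoded)  # hello
```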
Yeah, that's a great point.
ArtificialAnalysis has an "intelligence per token" metric on which all of Anthropic's models are outliers.
For some reason, they need far fewer output tokens than everyone else's models to pass the benchmarks.
(There are of course many issues with benchmarks, but I thought that was really interesting.)
Why do they always cut off 70% of the y-axis? Sure, it exaggerates the differences, but... it exaggerates the differences.
And they left Haiku out of most of the comparisons! That's the most interesting model for me, because for some tasks it's fine, and it's still not clear to me which ones those are.
Because in my experience, Haiku sits at a weird middle point: if you have a well-defined task, you can use a smaller/faster/cheaper model than Haiku, and if you don't, you need to reach for a bigger/slower/costlier one.
This project is an enhanced reader for Y Combinator's Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post, and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests, you can write to me here: gabrielepicco.github.io