Cofounder of ApolloAgriculture (https://apolloagriculture.com). We help smallholder farmers in sub-Saharan Africa maximize their profits, and we're good at it.
Oregonian lost in Amsterdam.
earl at apolloagriculture dot com
[ my public key: https://keybase.io/estsauver; my proof: https://keybase.io/estsauver/sigs/2MlvHN12O0vg54og9F5h0xbq6eNMlONSt2O3AdFfV94 ]
They're certainly welcome to do whatever they like, and for a microkernel-based OS it might make sense. I think the output from a lot of LLMs is probably pretty "meh."
I think part of the battle is actually just getting people to identify which LLM made it, so you can tell whether someone's contribution is good or not. A JavaScript project with contributions from Opus 4.6 will probably be pretty good, but if someone is using Mistral Small via the chat app, it's probably just a waste of time.
I have a recurring problem where I can't even read one of my favorite recipe websites (seriouseats.com) from my phone, because the series of popups completely blocks the page and can't be dismissed.
But if I ask Claude or Gemini for a nice version of the recipe, it works perfectly. I think there are a lot of own goals out there.
For what it's worth, most people are already doing this! Some of the subagents in Claude Code (Explore, and I think even compaction) default to Haiku, and then you have to manually override it with an env variable if you want to change it.
Imagine the quality-of-life upgrade of getting compaction down to a few-second blip, or the "Explore" pass running 20 times faster! As these models get better, it will be super exciting!
I think the fast inference options have historically been only marginally more expensive than their slow cousins. There's a whole body of research on the Pareto curves between efficiency, speed, and intelligence. If you can deliver even an outdated, lower-intelligence model at high efficiency, everyone will be interested. If you can deliver a model very fast, everyone will be interested. (If you can deliver a very smart model, everyone is obviously the most interested, but that's the free space.)
But to be clear, 1000 tokens/second is WAY better. Anthropic's Haiku serves at ~50 tokens per second.
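Back of the envelope, just to make the difference concrete (the 20,000-token workload below is an assumed number for illustration, not a measurement; the two speeds are the figures above):

    # Rough math: time to generate a fixed number of output tokens at different speeds.
    # The 20,000-token workload is an illustrative assumption, not a measurement.
    tokens = 20_000
    for tok_per_s in (50, 1_000):
        print(f"{tok_per_s:>5} tok/s -> {tokens / tok_per_s:.0f} seconds")

At ~50 tok/s that's nearly 7 minutes of waiting; at 1000 tok/s it's about 20 seconds.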
Do you all think you'll be able to convert open-source models to diffusion models relatively cheaply, à la the d1 // LLaDA series of papers? If so, that seems like an extremely powerful story, where you get to retool the much, much larger capex of open models into high-performance diffusion models.
(I can also see a world where it just doesn't make sense to share most of the layers/infra and you diverge, but I'm curious how you all see the approach.)
This project is an enhanced reader for Y Combinator's Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post, and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests, you can write to me here: gabrielepicco.github.io