Email: btown.hn@gmail.com
In a forum/community context, speed is vital! If it takes an order of magnitude more time to generate responses like yours and mine, one must choose which conversations one participates in much more carefully, and every such investment risks having the context of the conversation shift dramatically while drafting a response - to the point that one might be considered rude or disconnected. That makes participation essentially impossible.
Someone with a slower rate of both reading and creating text would benefit less from LLM assistance, to be sure. But someone who can read quickly, but may only be able to generate/select a few bits of entropy per second due to physical limitations? (Human speech is widely cited at a median of 39 bits per second.) They’d benefit massively from a system that could generate proposed responses that could be chosen from and refined.
In other words, if you’re the oracle, and the machine asks multiple choice questions until it is certain it speaks with your voice - is there a better set of such questions than just letter-by-letter a-z, a-z, a-z? Does that imply the content is AI-edited? Or is it an accessibility tool?
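To make the selection-vs-typing arithmetic concrete, here's a toy sketch (the numbers are my own illustration, apart from the ~39 bits/s speech figure cited above): each choice among k equally likely options conveys log2(k) bits, so someone limited to a few bits per second of physical output still commits whole messages quickly when choosing among proposed responses.

```python
import math

def bits_per_choice(num_options: int) -> float:
    """Information conveyed by one selection among equally likely options."""
    return math.log2(num_options)

# Letter-by-letter a-z: one of 26 options per "question".
letter_bits = bits_per_choice(26)      # ~4.7 bits per keystroke

# Choosing one of 8 proposed full responses: only 3 bits,
# but each selection commits an entire message.
response_bits = bits_per_choice(8)     # 3 bits per selection

# At a hypothetical 2 bits/second of physical output capacity:
rate = 2.0
letters_per_minute = rate / letter_bits * 60
responses_per_minute = rate / response_bits * 60

print(f"{letters_per_minute:.1f} letters/min vs "
      f"{responses_per_minute:.1f} whole-response selections/min")
```

The asymmetry is the point: the same channel capacity yields a handful of letters or dozens of complete, refined messages per minute.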
This seems way cooler than just computation (which is easy to hand off to a tool, and arguably more predictable that way). The broader point here is that you can have your model switch dynamically to/from a kind of attention that scales with the log of the token count, by only exploring the convex hull in a 2D space. A less capable version of attention, to be sure, but one capable of tracing a program’s execution with text representations of registers and stack - which is a meaningful level of flexibility, and one many humans would find difficult to do reliably!
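A toy sketch of the convex-hull idea (entirely my own illustration, not the paper's method): project keys to 2D, compute the hull, and run softmax attention over only the hull points. For some point distributions the hull is far smaller than n (O(log n) in expectation for uniform points in a convex polygon), which is where the claimed scaling would come from.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_attention(query, keys2d, values):
    """Dot-product softmax attention restricted to hull points only."""
    hull = set(convex_hull(keys2d))
    idx = [i for i, k in enumerate(keys2d) if k in hull]
    scores = [query[0]*keys2d[i][0] + query[1]*keys2d[i][1] for i in idx]
    m = max(scores)                      # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(w / z * values[i] for w, i in zip(weights, idx))
```

Here the interior point never receives attention weight; only the four corners of the square do. Whether real key geometry cooperates like this is exactly the open question.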
What could you do with an LLM that can go into “focus mode” and generate tokens extremely rapidly? How much more powerful would a reasoning-token-generation phase be that can explore and cull large numbers of paths/hypotheses, so long as they are well defined? Does this have implications for multi-modal models and spatial reasoning?
As the paper suggests:
> These models could be useful in several modes: as a dedicated fast path paired with a slower, more general model; as part of a fast/slow hybrid architecture inside a single system; or as a speculative execution model that proposes tokens quickly while a regular-attention model verifies and accepts them. Regardless of their eventual capability ceiling, they already suggest a powerful systems primitive for speeding up larger models.
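The speculative-execution mode the quote describes can be sketched in a few lines (a toy, with simple callables standing in for the two models; real implementations batch the verifier's checks into a single forward pass rather than calling it per token):

```python
def speculative_step(prefix, draft_model, verify_model, k=4):
    """Draft k tokens cheaply, then keep the longest verified run.

    On the first disagreement, take the verifier's token instead and stop,
    so the output always matches what the verifier alone would produce.
    """
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)

    accepted, ctx = [], list(prefix)
    for t in draft:
        expected = verify_model(ctx)
        if expected == t:            # verifier agrees: token accepted "for free"
            accepted.append(t)
            ctx.append(t)
        else:                        # first disagreement ends the step
            accepted.append(expected)
            break
    return accepted

# Stand-in models: the draft echoes a canned continuation; the verifier
# agrees on the first two tokens only.
CANNED = ["the", "quick", "brown", "fox"]
draft = lambda ctx: CANNED[len(ctx)] if len(ctx) < len(CANNED) else "<eos>"
verify = lambda ctx: ["the", "quick", "slow"][len(ctx)] if len(ctx) < 3 else "<eos>"

print(speculative_step([], draft, verify))  # ['the', 'quick', 'slow']
```

The payoff: when the fast model is usually right, most tokens cost only a cheap draft plus an amortized verification, while quality stays pinned to the slow model.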
For all the challenges that AI poses to online communities, it also allows people for whom typing and dictation are painful, difficult, or impossible to participate in those communities in ways they never could before.
I think HN is broadly supportive of these voices, and I think that an "unwritten exception" to this rule is implicit here. But I'm in the camp that making an explicit exception for special circumstances would be a meaningful statement that all voices are welcome.
Great to see innovation in this space!
If I could make one giant request, it would be to give (properly authorized) humans the ability to override the system when needed. When an API is simple, it's all too common for a company integrating the solution to rely entirely on the identity service's yes/no outcome - and then there's no way to override a decision, or to bypass the need for identification at all.
In the travel space, I've seen situations, especially with luxury and celebrity clients, where there's a human level of trust across the board and all parties agree at senior levels that they'd like to proceed with a one-off exception to identity verification... but the technology refuses to let them continue without the full verification flow, and if they're integrated in the simplest way, there's no "escape hatch" on the integration's side.
And similarly, if a person happens to trigger false negatives on video matches (say, for medical reasons), giving support teams the ability to build exceptions is key. Having a way to tell the system "for this transaction/account ID, when they reach this node in the flow, let them through as if the checks had passed, or treat them as pre-authorized" would set you apart.
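As a sketch of what that escape hatch could look like on the integration side (the names and flow here are hypothetical, not any vendor's API): consult an audited, time-boxed override table before invoking the provider's flow.

```python
import time
from dataclasses import dataclass

@dataclass
class Override:
    granted_by: str    # who authorized the exception (for the audit log)
    reason: str
    expires_at: float  # overrides should always expire

# Keyed by (account_id, flow_node); in practice this would live in a
# database with its own permissioning, not a module-level dict.
OVERRIDES: dict = {}

def grant_override(account_id, node, granted_by, reason, ttl_seconds=3600):
    """A senior support agent inserts a scoped, expiring exception."""
    OVERRIDES[(account_id, node)] = Override(
        granted_by, reason, time.time() + ttl_seconds)

def check_identity(account_id, node, run_vendor_flow):
    """Treat an active override as pre-authorized; otherwise fall through
    to the vendor's normal verification flow (a callable here)."""
    ov = OVERRIDES.get((account_id, node))
    if ov and ov.expires_at > time.time():
        return {"status": "pre_authorized", "via": ov.granted_by}
    return run_vendor_flow(account_id)

# Usage: a one-off exception for a video-match false negative.
grant_override("acct-123", "video_match", "support-lead",
               "medical exemption approved by compliance")
result = check_identity("acct-123", "video_match",
                        run_vendor_flow=lambda a: {"status": "denied"})
print(result["status"])  # pre_authorized
```

The key properties are that every exception names who granted it and why, and that it expires on its own - the opposite of a global "skip verification" flag.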
(Obviously, for things involving KYC, there's a lot of considerations around permissioning - but for many use cases, you want to empower senior support teams.)
This project is an enhanced reader for Y Combinator's Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post, and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests, you can reach me here: gabrielepicco.github.io