At no point have I argued that LLMs aren’t autoregressive; I am merely talking about LLMs’ ability to reason across time steps, so it seems we are talking past each other, which won’t lead anywhere.
And yes, LLMs can be studied through the lens of Markov processes: https://arxiv.org/pdf/2410.02724
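To sketch what I mean (my own toy illustration, not code from the linked paper): if you take the current context window as the state, the next-token distribution is a function of that state alone, which is exactly the Markov property.

    # Toy sketch: a Markov chain over truncated contexts.
    # The "forward pass" here is a stand-in; any deterministic
    # function of (state, weights) makes the same point.
    import numpy as np

    VOCAB, WINDOW = 16, 8

    def next_token_distribution(state, table):
        logits = table[hash(state) % len(table)]  # fake forward pass
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def step(state, table, rng):
        p = next_token_distribution(state, table)
        tok = int(rng.choice(VOCAB, p=p))
        # the new state depends only on the old state and the sampled
        # token -> Markov chain over contexts
        return (state + (tok,))[-WINDOW:]

    rng = np.random.default_rng(0)
    table = rng.normal(size=(64, VOCAB))  # frozen "weights"
    state = (1, 2, 3)
    for _ in range(5):
        state = step(state, table, rng)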
Have a good day
> If it conveys the intended information then what's wrong with that?
Well, the issue is precisely that it doesn’t convey any information.
What is conveyed by that sentence, exactly? What does reframing data curation as “cognitive hygiene for AI” entail, and what information is in there?
There are precisely zero bits of information in that paragraph. We all know that training on bad data leads to a bad model; thinking about it as “cognitive hygiene for AI” does not lead to any insight.
LLMs aren’t going to discover interesting new information for you; they are just going to write empty, plausible-sounding words. Maybe it will be different in a few years. They can be useful to help you polish what you want to say or otherwise format interesting information (provided you ask them not to be ultra verbose), but they’re just not going to create information out of thin air if you don’t provide it to them.
At least, if you do it yourself, you are forced to realize that you in fact have no new information to share, and you don’t waste your time and your audience’s time by publishing a paper like this.
> entirely embedded in this sequence.
Obviously wrong: otherwise every model would predict exactly the same thing, and it would not even be predicting anymore, simply decoding.
The sequence is not enough to reproduce the exact output; you also need the weights.
And the way the model works is by attending to its own internal state (weights * input) and refining it, both across the depth (layer) dimension and across the time (token) dimension.
The fact that you can get the model to give you the exact same output by fixing a few seeds is only a consequence of the process being Markovian, and is orthogonal to the fact that at each token position the model is “thinking” about a longer horizon than the present token and is able to reuse that representation at later time steps.
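To make that concrete, here is a minimal toy sketch (my own illustration, not any particular model’s code) of one causal-attention step with a KV cache: the key/value computed at position t stays in the cache and is attended to again at every later position, so the state built there is reused downstream; fixing the seed only makes the run reproducible, it doesn’t change that mechanism.

    import numpy as np

    def attend_step(x_t, cache, Wq, Wk, Wv):
        q = x_t @ Wq
        cache["k"].append(x_t @ Wk)   # this position's key...
        cache["v"].append(x_t @ Wv)   # ...and value join the cache
        K, V = np.stack(cache["k"]), np.stack(cache["v"])
        scores = K @ q / np.sqrt(len(q))
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V                  # a mix of all cached representations

    rng = np.random.default_rng(0)    # fixed seed -> reproducible run
    d = 4
    Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
    cache = {"k": [], "v": []}
    for t in range(3):
        h_t = attend_step(rng.normal(size=d), cache, Wq, Wk, Wv)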