At no point have I argued that LLMs aren’t autoregressive; I am merely talking about LLMs’ ability to reason across time steps, so it seems we are talking past each other, which won’t lead anywhere.
And yes, LLMs can be studied through the lens of Markov processes: https://arxiv.org/pdf/2410.02724
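To sketch what I mean (my own toy illustration, not code from the linked paper): if you take the current context window as the state, the next-token distribution is a function of that state alone, which is exactly the Markov property.

    # Toy sketch: a Markov chain over truncated contexts.
    # The "forward pass" here is a stand-in; any deterministic
    # function of (state, weights) makes the same point.
    import numpy as np

    VOCAB, WINDOW = 16, 8

    def next_token_distribution(state, table):
        logits = table[hash(state) % len(table)]  # fake forward pass
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def step(state, table, rng):
        p = next_token_distribution(state, table)
        tok = int(rng.choice(VOCAB, p=p))
        # the new state depends only on the old state and the sampled
        # token -> Markov chain over contexts
        return (state + (tok,))[-WINDOW:]

    rng = np.random.default_rng(0)
    table = rng.normal(size=(64, VOCAB))  # frozen "weights"
    state = (1, 2, 3)
    for _ in range(5):
        state = step(state, table, rng)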
Have a good day
> If it conveys the intended information then what's wrong with that?
Well, the issue is precisely that it doesn’t convey any information.
What is conveyed by that sentence, exactly? What does reframing data curation as “cognitive hygiene for AI” entail, and what information is in there?
There are precisely zero bits of information in that paragraph. We all know that training on bad data leads to a bad model; thinking about it as “cognitive hygiene for AI” does not lead to any insight.
LLMs aren’t going to discover interesting new information for you; they are just going to write empty, plausible-sounding words. Maybe it will be different in a few years. They can be useful to help you polish what you want to say or otherwise format interesting information (provided you ask them not to be ultra verbose), but they’re just not going to create information out of thin air if you don’t provide it to them.
At least, if you do it yourself, you are forced to realize that you in fact have no new information to share, and you don’t waste your time and your audience’s time by publishing a paper like this.
> entirely embedded in this sequence.
Obviously wrong: otherwise every model would predict exactly the same thing, and it would not even be predicting anymore, simply decoding.
The sequence is not enough to reproduce the exact output; you also need the weights.
And the way the model works is by attending to its own internal state (weights * input) and refining it, both across the depth (layer) dimension and across the time (token) dimension.
The fact that you can get the model to give you the exact same output by fixing a few seeds is only a consequence of the process being Markovian, and is orthogonal to the fact that at each token position the model is “thinking” about a longer horizon than the present token and is able to reuse that representation at later time steps.
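To make that concrete, here is a minimal toy sketch (my own illustration, not any particular model’s code) of one causal-attention step with a KV cache: the key/value computed at position t stays in the cache and is attended to again at every later position, so the state built there is reused downstream; fixing the seed only makes the run reproducible, it doesn’t change that mechanism.

    import numpy as np

    def attend_step(x_t, cache, Wq, Wk, Wv):
        q = x_t @ Wq
        cache["k"].append(x_t @ Wk)   # this position's key...
        cache["v"].append(x_t @ Wv)   # ...and value join the cache
        K, V = np.stack(cache["k"]), np.stack(cache["v"])
        scores = K @ q / np.sqrt(len(q))
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ V                  # a mix of all cached representations

    rng = np.random.default_rng(0)    # fixed seed -> reproducible run
    d = 4
    Wq, Wk, Wv = [rng.normal(size=(d, d)) for _ in range(3)]
    cache = {"k": [], "v": []}
    for t in range(3):
        h_t = attend_step(rng.normal(size=d), cache, Wq, Wk, Wv)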