Since I love collecting questionable analogies for LLMs, here's a new one I just came up with: an LLM is a lossy encyclopedia. They have a huge array of facts …
Since I love collecting questionable analogies for LLMs, here's a new one I just came up with: an LLM is a lossy encyclopedia. They have a huge array of facts compressed into them but that compression is lossy (see also Ted Chiang).
The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters.
This thought sparked by a comment on Hacker News asking why an LLM couldn't "Create a boilerplate Zephyr project skeleton, for Pi Pico with st7789 spi display drivers configured". That's more of a lossless encyclopedia question!
My answer:
The way to solve this particular problem is to make a correct example available to it. Don't expect it to just know extremely specific facts like that - instead, treat it as a tool that can act on facts presented to it.
I totally agree with the author. Sadly, I feel like that's not what the majority of LLM users tend to view LLMs. And it's definitely not what AI companies marketing.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding of the user input is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random facts or lossy facts which can almost guarantee to impress the other side.
> The way to solve this particular problem is to make a correct example available to it.
My question is how much effort would it take to make a correct example available for the LLM before it can output quality and useful data? If the effort I put in is more than what I would get in return, then I feel like it's best to write and reason it myself.
> the user will at least need to know something about the topic beforehand.
I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication. "Provide dosage guidelines for medication [insert here]"
It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.
These are the kind of innocent errors that can be dangerous if users trust it blindly.
The main challenge is LLMs aren't able to gauge confidence in its answers, so it can't adjust how confidently it communicates information back to you. It's like compressing a photo, and a photographer wrongly saying "here's the best quality image I have!" - do you trust the photographer at their word, or do you challenge him to find a better quality image?
What if you had told it again that you don't think that's right? Would it have stuck to it's guns and went "oh, no, I am right here" or would it have backed down and said "Oh, silly me, you're right, here's the real dosage!" and give you again something wrong?
I do agree that to get the full usage out of an LLM you should have some familiarity with what you're asking about. If you didn't already have a sense of what a dosage is already, why wouldn't 100mcg be the right one?
I replied in the same thread "Are you sure that sounds like a low dose". It stuck to the (correct) recommendation in the 2nd response, but added in a few use cases for higher doses. So seems like it stuck to its guns for the most part.
For things like this, it would definitely be better for it to act more like a search engine and direct me to trustworthy sources for the information rather than try to provide the information directly.
I noticed this recently when I saw someone post with an AI generated map of Europe which was all wrong. I tried the same and asked ChatGPT to generate a map of Ireland and it was wrong too. So then I asked to find me some accurate maps of Ireland and instead of generating it gave me images and links to proper websites.
Will definitely be remembering to put "generate" vs "find" in my prompts depending on what I'm looking for. Not quite sure how you would train the model to know which answer is more suitable.
My mom was looking up church times in the Philippines. Google AI was wrong pretty much every time.
Why is an LLM unable to read a table of church times across a sampling of ~5 Filipino churches?
Google LLM (Gemini??) was clearly finding the correct page. I just grabbed my mom's phone after another bad mass time and clicked on the hyperlink. The LLM was seemingly unable to parse the table at all.
Because google search and llm teams are different, with different incentives. Search is the cash cow they keep squeezing for more cash at the expense of good quality since at least 2018, as revealed in court documents showing they did that on purpose to keep people searching more to have more ads and more revenue. Google AI embedded in search has the same goals, keep you clicking on ads… my guess would be Gemini doesn’t have any of the bad part of enshitification yet… but it will come. If you think hallucinations are bad now, just you wait until tech companies start tuning them up on purpose to get you to make more prompts so they can inject more ads!
And one that likely happens often.
> I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication.
This use case is bad by several degrees.
Consider an alternative: Using Google to search for it and relying on its AI generated answer. This usage would be bad by one degree less, but still bad.
What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.
I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct use case is going to the manufacturer's web site, ensuring you're looking at the exact same medication (not some newer version or a variant), and looking at the dosage guidelines.
No, not Mayo clinic or any other site (unless it's a pretty generic medicine).
This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.
The compound I was researching was [edit: removed].
Problem is it's not FDA approved, only prescribed by compounding pharmacies off label. Experimental compound with no official guidelines.
The first result on Google for "[edit: removed] dosing guidelines" is a random word document hosted by a Telehealth clinic. Not exactly the most reliable source.
Edit: Jeesh, what’s with the downvotes?
> Experimental compound with no official guidelines.
> The first result on Google for "GHK-Cu dosing guidelines" is a random word document hosted by a Telehealth clinic. Not exactly the most reliable source.
You're making my point even more. When doing off label for an unapproved drug, you probably should not trust anything on the Internet. And if there is a reliable source out there on the Internet, it's very much on you to be able to discern what is and what is not reliable. Who cares that the LLM is wrong, when likely much of the Internet is wrong?
BTW, I'm not advocating that LLMs are good for stuff like this. But a better example would be asking the LLM "In my state, is X taxable?"
The Google AI summary was completely wrong (and the helpful link it used as a reference was correct, and in complete disagreement with the summary). But other than the AI summary being wrong, pretty much every link in the Google search results was correct. This is a good use case for not relying on an LLM: Information that is widely and easily available is wrong in the LLM.
> You're making my point even more
What exactly is your point?
Is your point that I should be smarter and shouldn’t have asked ChatGPT the question?
If that’s your point, understood, but I don’t think you can assume the average ChatGPT user will have such a discerning ability to determine when and when not using a LLM is appropriate.
FWIW I agree with you. But the “you shouldn’t ask ChatGPT that question” is a weak argument if you care about contextualizing and broadening your point beyond me and my specific anecdote.
My point is that if you're trying to demonstrate how unreliable LLMs are, this is a poor example, because the alternatives are almost equally poor.
> If that’s your point, understood, but I don’t think you can assume the average ChatGPT user will have such a discerning ability to determine when and when not using a LLM is appropriate.
I agree that the average user will not, but they also will not have the ability to determine that the answer from the top (few) Google links is invalid as well. All you've shown is the LLM is as bad as Google search results.
Put another way, if you invoke this as a reason one should not rely on LLMs (in general), then it follows one should not rely on Google either (in general).
I think this actually points at a different problem, a problem with LLM users, but only to the extent that it's a problem with people with respect to any questions they have to ask any source they consider an authority at all. No LLM, nor any other source on the Internet, nor any other source off the Internet, can give you reliable dosage guidelines for copper peptides because this is information that is not known to humans. There is some answer to the question of what response you might expect and how that varies by dose, but without the clinical trials ever having been conducted, it's not an answer anyone actually has. Marketing and popular misconceptions about AI lead to people expecting it to be able to conjure facts out of thin air, perhaps reasoning from first principles using its highly honed model of human physiology.
It's an uncomfortable position to be in trying to biohack your way to a more youthful appearance using treatments that have never been studied in human trials, but that's the reality you're facing. Whatever guidelines you manage to find, whether from the telehealth clinic directly, or from a language model that read the Internet and ingested that along with maybe a few other sources, are generally extrapolated from early rodent studies and all that's being extrapolated is an allometric scaling from rat body to human body of the dosage the researchers actually gave to the rats. What effect that actually had, and how that may or may not translate to humans, is not usually a part of the consideration. To at least some extent, it can't be if the compound was never trialed on humans.
You're basically just going with scale up a dosage to human sized that at least didn't kill the rats. Take that and it probably won't kill you. What it might actually do can't be answered, not by doctors, not by an LLM, not by Wikipedia, not by anecdotes from past biohackers who tried it on themselves. This is not a failure of information retrieval or compression. You're just asking for information that is not known to anyone, so no one can give it to you.
If there's a problem here specific to LLMs, it's that they'll generally give you an answer anyway and will not in any way quantify the extent to which it is probably bullshit and why.
> a problem with LLM users
I think the flaw here is placing blame on users rather than the service provider.
HN is cutting LLM companies slack because we understand the technical limitations making it hard for the LLM to just say “I don’t know”.
In any other universe, we would be blaming the service rather than the user.
Why don’t we fix LLMs so they don’t spit out garbage when it doesn’t know the answer. Have we given up on that thought?
> In any other universe, we would be blaming the service rather than the user.
I think the key question is "How is this service being advertised?"
Perhaps the HN crowd gives it a lot of slack because they ignore the advertising. Or if you're like me, aren't even aware of how this is being marketed. We know the limitations, and adapt appropriately.
I guess where we differ is on whether the tool is broken or not (hence your use of the word "fix"). For me, it's not at all broken. What may be broken is the messaging. I don't want them to modify the tool to say "I don't know", because I'm fairly sure if they do that, it will break a number of people's use cases. If they want to put a post-processor that filters stuff before it gets to the user, and give me an option to disable the post-processor, then I'm fine with it. But don't handicap the tool in the name of accuracy!
The point you were making elsewhere in the thread was that "this is a bad use case for LLMs" ... "Don't use LLMs for dosing guidelines." ... "Using dosing guidelines is a bad example for demonstrating how reliable or unreliable LLMs are", etc etc etc.
You're blaming the user for having a bad experience as a result of not using the service "correctly".
I think the tool is absolutely broken, considering all of the people saying dosing guidelines is an "incorrect" use of LLM models. (While I agree it's not a good use, I strongly dislike how you're blaming the user for using it incorrectly - completely out of touch with reality).
We can't just cover up the shortfalls of LLMs by saying things like "Oh sorry, that's not a good use case, you're stupid if you use the tool for that purpose".
I really hope the HN crowd stops making excuses for why it's okay that LLMs don't perform well on tasks it's commonly asked to do.
> But don't handicap the tool in the name of accuracy!
If you're taking the position that it's the user's fault for asking LLMs a question it won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool (at a minimum) should help guide you away from using it in ways it's not intended for.
If LLMs were only used by smarter-than-average HN-crowd techies, I'd agree. But we're talking about a technology used by middle school kids. I don't think it's reasonable to expect middleschoolers to know what they should and shouldn't ask LLMs for help with.
> You're blaming the user for having a bad experience as a result of not using the service "correctly".
Definitely. Just as I used to blame people for misusing search engines in the pre-LLM era. Or for using Wikipedia to get non-factual information. Or for using a library as a place to meet with friends and have lunch (in a non-private area).
If you're going to try to use a knife as a hammer, yes, I will fault you.
I do expect that if someone plans to use a tool, they do own the responsibility of learning how to use it.
> If you're taking the position that it's the user's fault for asking LLMs a question it won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool (at a minimum) should help guide you away from using it in ways it's not intended for.
Documentation, manuals, training videos, etc.
Yes, I am perhaps a greybeard. And while I do like that many modern parts of computing are designed to be easy to use without any training, I am against stating that this is a minimum standard that all tools have to meet.
Software is the only part of engineering where "self-explanatory" seems to be common. You don't buy a board game hoping it will just be self-evident how to play. You don't buy a pressure cooker hoping it will just be safe to use without learning how to use it.
So yes, I do expect users should learn how to use the tools they use.
Current frontier LLMs - Claude 4, GPT-5, Gemini 2.5 - are massively more likely to say "I don't know" than last year's models.
I don’t think I’ve ever seen ChatGPT 5 refuse to answer any prompt I’ve ever given it. I’m doing 20+ chats a day.
What’s an example prompt where it will say “idk”?
Edit: Just tried a silly one, asking it to tell me about the 8th continent on earth, which doesn’t exist. How difficult is it for the model to just say “sorry, there are only 7 continents”. I think we should expect more from LLMs and stop blaming things on technical limitations. “It’s hard” is getting to be an old excuse considering the amount of money flowing into building these systems.
https://chatgpt.com/share/68b85035-62ec-8006-ab20-af5931808b... - "There are only seven recognized continents on Earth: Africa, Antarctica, Asia, Australia, Europe, North America, and South America."
Here's a recent example of it saying "I don't know" - I asked it to figure out why there was an octopus in a mural about mushrooms: https://chatgpt.com/share/68b8507f-cc90-8006-b9d1-c06a227850... - "I wasn’t able to locate a publicly documented explanation of why Jo Brown (Bernoid) chose to include an octopus amid a mushroom-themed mural."
Not sure what your system prompt is, but asking the exact same prompt word for word for me results in a response talking about "Zealandia, a continent that is 93% submerged underwater."
The 2nd example isn't all that impressive since you're asking it to provide you something very specific. It succeeded in not hallucinating. It didn't succeed at saying "I'm not sure" in the face of ambiguity.
I want the LLM to respond more like a librarian: When they know something for sure, they tell you definitively, otherwise they say "I'm not entirely sure, but I can point you to where you need to look to get the information you need."
I'm using regular GPT-5, no custom instructions and memory turned off.
Can you link to your shared Zealandia result?
I think that mural result was spectacularly impressive, given that it started with a photo I took of the mural with almost no additional context.
I can't link since it's in an enterprise account.
Interestingly I tried the same question in a separate ChatGPT account and it gave a similar response you got. Maybe it was pulling context from the (separate) chat thread where it was talking about Zealandia. Which raises another question: once it gets something wrong once, will it just keep reenforcing the inaccuracy in future chats? That could lead to some very suboptimal behavior.
Getting back on topic, I strongly dislike the argument that this is all "user error". These models are on track to be worth a trillion dollars at some point in the future. Let's raise our expectations of them. Fix the models, not the users.
I wonder if you're stuck on an older model like GPT-4o?
EDIT: I think that's likely what is happening here: I tried the prompt against GPT-4o and got this https://chatgpt.com/share/68b8683b-09b0-8006-8f66-a316bfebda...
My consistent position on this stuff is that it's actually way harder to use than most people (and the companies marketing it) let on.
I'm not sure if it's getting easier to use over time either. The models are getting "better" but that partly means their error cases are harder to reason about, especially as they become less common.
LANGUAGE model, not FACT model.
I gave LLM a list of python packages and asked it to give me their respective licenses. Obviously it got some of them wrong. I had to manually check with the package's pypi page.
Ok?
Yes. That is the problem; that sometimes it works. See the topic. Adding RAG or web search capability limits the loss and hallucinations.
Yes. You always need to check the results. Your task by the way is better for an agentic AI system that can web search, get, and double check results.
his task is probably best done with a script, heck you could thell chatgpt to download all packages with a script and check LICENSE files and report back with a csv/table
[dead]
"The main challenge is LLMs aren't able to gauge confidence in its answers"
This seems like a very tractable problem. And I think in many cases they can do that. For example, I tried your example with Losartan and it gave the right dosage. Then I said, "I think you're wrong", and it insisted it was right. Then I said, "No, it should be 50g." And it replied, "I need to stop you there". Then went on to correct me again.
I've also seen cases where it has confidence where it shouldn't, but there does seem to be some notion of confidence that does exist.
> but there does seem to be
I need to stop you right there! These machinations are very good at seeming to be! The behavior is random, sometimes it will be in a high dimensional subspace of refusing to change its mind, others it is a complete sycophant with no integrity. To test your hypothesis that it is more confident about some medicines than others (maybe there is more consistent material in the training data...) one might run the same prompt 20 times each with various drugs, and measure how strongly the llm insists it is correct when confronted.
Unrelated, I recently learned the state motto of North Carolina is "To be, rather than to seem"
I tried for a handful of drugs and unfortunately(?) it gave accurate dosages to start with and it wouldn't budge. Going too low and it told me that the impact wouldn't be sufficient. Going too high and it told me how dangerous it was and that I had maybe misunderstood the units of measure.
With search and references, and without search and references are two different tools. They're supposed to be closer to the same thing, but are not. That isn't to say there's a guarantee of correctness with references, but in my experience, accuracy is better, and seeing unexpected references is helpful when confirming.
That is exactly the kind of question that I would never trust to chatgpt.
Modern Russian Roulette, using LLMs for dose calculations.
I feel like asking an LLM for medicine dosage guidelines is exactly what you should never use it for…
Using a LLM for medical research is just as dangerous as Googling it. Always ask your doctors!
I don’t disagree that you should use your doctor as your primary source for medical decision making, but I also think this is kind of an unrealistic take. I should also say that I’m not an AI hype bro. I think we’re a long ways off from true functional AGI and robot doctors.
I have good insurance and have a primary care doctor with whom I have good rapport. But I can’t talk to her every time I have a medical question—it can take weeks to just get a phone call! If I manage to get an appointment, it’s a 15 minute slot, and I have to try to remember all of the relevant info as we speed through possible diagnoses.
Using an llm not for diagnosis but to shape my knowledge means that my questions are better and more pointed, and I have a baseline understanding of the terminology. They’ll steer you wrong on the fine points, but they’ll also steer you _right_ on the general stuff in a way that Dr. Google doesn’t.
One other anecdote. My daughter went to the ER earlier this year with some concerning symptoms. The first panel of doctors dismissed it as normal childhood stuff and sent her home. It took 24 hours, a second visit, and an ambulance ride to a children’s hospital to get to the real cause. Meanwhile, I gave a comprehensive description of her symptoms and history to an llm to try to get a handle on what I should be asking the doctors, and it gave me some possible diagnoses—including a very rare one that turned out to be the cause. (Kid is doing great now). I’m still gonna take my kids to the doctor when they’re sick, of course, but I’m also going to use whatever tools I can to get a better sense of how to manage our health and how to interact with the medical system.
I always thought “ask your doctor” was included for liability reasons and not a thing that people actually could do.
I also have good insurance and a PCP. The idea that I could call them up just to ask “should I start doing this new exercise” or “how much aspirin for this sprained ankle?” is completely divorced from reality.
Yes, exactly this. I am an anxious, detail-focused person. I could call or message for every health-related question that comes to mind, but that would not be a good use of anyone’s time. My doctor is great, but she does not care about the minutiae of my health like I do, nor do I expect her to.
I think "ask your doctor" is for prescription meds since only said doctor can write prescriptions.
And "your doctor" is actually "any doctor that is willing to write you a prescription for our medicine".
"ask your doctor" is more widespread than tthat. if you look up any diet or exercise advice, there's always an "ask your doctor before starting any new exercise program".
i'm not going to call my doctor to ask "is it okay if I try doing kettlebell squats?"
Yes, I totally got out of context and said something a bit senseless.
But also, maybe calling your doctor would be wise (eg if you have back problems) before you start doing kettlebell squats.
I'd say that the audience for a lot of health related content skews towards people who should probably be seeing a doctor anyway.
The cynic in me also thinks some of the "ask your doctor" statements are just slapped on to artificially give credence to whatever the article is talking about (eg "this is serious exercise/diet/etc).
Edit: I guess what I meant is: I don't think it's just "liability", but genuine advice/best practice/wisdom for a sizable chunk of audiences.
I am constantly terrified by the American healthcare system.
That's exactly what I (and most people I know) routinely do both in Italy and France. Like, "when in doubt, call the doc". I wouldn't know where to start if I had to handle this kind of stuff exclusively by myself.
I can e-mail my doctor and have a response within 2 days. He is not working alone, but has multiple assistants working. This is a normal doctors office that everyone is required to have in the Netherlands.
E-mails and communication is completely free of charge.
We all know that Google and LLM's are not the answer for your medical questions but that they cause fear and stress instead.
I live in the U.S. and my doctor is very responsive on MyChart. A few times a year i’ll send a message and I almost always get a reply within a day! From my PCP directly, or from her assistant.
I’d encourage you to find another doctor.
My doctor is usually pretty good at responding to messages too, but there’s still a difference between a high-certainty/high-latency reply and a medium-certainty/low-latency reply. With the llm I can ask quick follow ups or provide clarification in a way that allows me to narrow in on a solution without feeling like I’m wasting someone else’s time. But yes, if it’s bleeding, hurting, or growing, I’m definitely going to the real person.
You are NOT wasting someone else's time, they get paid to do just that, answer questions.. Plus it's your fucking health dude
> it can take weeks to just get a phone call
> If I manage to get an appointment, it’s a 15 minute slot
I'm sorry that this is what "good insurance" gets you.
no, that’s what happens when you pick a busy doctor or a practice that’s overbooked in general. All too common these days! :(
This probably varies by locale. For example my doctor responds within 1 day on MyChart for quick questions. I can set up an in person or video appointment with her within a week, easily booked on MyChart as well.
This is the terrifying part: doctors do this too! I have an MD friend that told me she uses ChatGPT to retrieve dosing info. I asked her to please, please not do that.
Find good doctors. A solution doesn’t have to be perfect. A doctor doing better than regular joe with a computer is much higher as you can see in research around this topic
I have noticed that my doctor is getting busier and busier lately. I worry that cost cutting will have doctors so frantic that they are forced to rely on things like ChatGPT, and “find good doctors” will be an option only for an elite few.
I have a hunch that the whole "chat" interface is a brilliant but somewhat unintentional product design choice that has created this faux trust in LLM's to give back accurate information that others can get from drugs.com or Medline with a text search. This is a terrifying example, and please get her to test it out by second guessing the LLM and watching it flip flop.
your doctor can have a bad day. and or be an asshole.
In 40 years, only one of my doctors had the decency to correct his mistake after I pointed it out.
He prescribed the wrong Antibiotics, which I only knew because I did something dumb and wondered if the prescribed antibiotics cover a specific strain, which they didn't, which I knew because I asked an LLM and then superficially double-checked via trustworthy official, government sources.
He then prescribed the correct antibiotics. In all other cases where I pointed out a mistake, back in the day researched without LLMs, doctors justified their logic, sometimes siding with a colleague or "the team" before evaluating the facts themselves, instead of having an independent opinion, which, AFAIK, especially in a field like medicine, is _absolutely_ imperative.
I disagree. I'd wager that state of the art LLMs can beat out of the average doctor at diagnosis given a detailed list of symptoms, especially for conditions the doctor doesn't see on a regular basis.
"Given a detailed list of symptoms" is sure holding a lot of weight in that statement. There's way too much information that doctors tacitly understand from interactions with patients that you really cannot rely on those patients supplying in a "detailed list". Could it diagnose correctly, some of the time? Sure. But the false positive rate would be huge given LLMs suggestible nature. See the half dozen news stories covering AI induced psychosis for reference.
Regardless, it's diagnostic capability is distinct from the dangers it presents, which is what the parent comment was mentioning.
What you're describing, especially with the amount of water "given a detailed list of symptoms" is carrying, is essentially a compute-intensive flowchart with no concept of diagnostic parsimony.
Plot twist, your doctor is looking it up on WebMD themselves
Almost certainly more I would think, precisely because of magnitude errors.
The ol' "What weighs more, a pound of feathers or two pounds of bricks" trick explains this perfectly to me.
Not really: it's arguably quite a lot worse. Because you can judge the trustworthiness of the source when you follow a link from Google (e.g. I will place quite a lot of faith in pages at an .nhs.uk URL), but nobody knows exactly how that specific LLM response got generated.
Many of the big LLMs do RAG and will provide links to sources, eg. Bing/ChatGPT, Gemini Pro 2.5, etc.
I find if I force thinking mode and then force it to search the web it’s much better.
But at that point wouldn't it be easier to just search the web yourself? Obviously that has its pitfalls too, but I don't see how adding an LLM middleman adds any benefit.
For medication guidelines I'd just do a Google search. But sometimes I want 20 sources and a quick summary of them. Agent mode or deep research is so useful. Saves me so much time every day.
If only we could get people to use the brains in their head.
Not always, it can find stuff that is otherwise difficult for me and search engines have become much worse than 15-20 years ago.
Agree, I usually force thinking mode too. I actually like the "Thinking mini" option that was just released recently, good middle ground between getting an instant answer and waiting 1-2 minutes.
Maybe don't use an LLM for dosing guidelines.
> the user will at least need to know something about the topic beforehand.
This is why I've said a few times here on HN and elsewhere, if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer. Juniors can do amazing things, they can also goof up hard. What's really funny is you can make them audit their own code in a new context window, and give you a detailed answer as to why that code is awful.
I use it mostly on personal projects especially since I can prototype quickly as needed.
> if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer.
The thing is coding can (and should) be part of the design process. Many times, I though I have a good idea of what the solution should look like, then while coding, I got exposed more to the libraries and other parts of the code, which led me to a more refined approach. This exposure is what you will miss and it will quickly result in unfamiliar code.
I agree. I mostly use it for scaffolding, I don't like letting it do all the work for me.
No friction, no improvements; that only guarantees you'll never find a better way to solve the problem.
I was thinking the exact same thing.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
It's also useful to have an intuition for what things an LLM is liable to get wrong/hallucinate, one of which is questions where the question itself suggests one or more obvious answers (which may or may not be correct), which the LLM may well then hallucinate, and sound reasonable, if it doesn't "know".
LLMs are very sensitive to leading questions. A small hint of that the expected answer looks like will tend to produce exactly that answer.
You don't even need a leading direct question. You can easily lead an LLM just by having some statements (even at times single words) in the context window.
As a consequence LLMs are extremely unlikely to recognize an X-Y problem.
>the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding of the user input
I think there's a parallel here for the internet as an i formation source. It delivered on "unlimited knowledge at the tip of everyone's fingertips" but lowering the bar also lowered the bar.
That access "works" only when the user is capable of doing their part too. Evaluating sources, integrating knowledge. Validating. Cross examining.
Now we are just more used to recognizing that accessibility comes with its own problem.
Some of this is down to general education. Some to domain expertize. Personality plays a big part.
The biggest factor is, i think, intelligence. There's a lot of 2nd and 3rd order thinking required to simultaneously entertain a curiosity, consider of how the LLM works, and exercise different levels of skepticism depending on the types of errors LLMs are likely to make.
Using LLMs correctly and incorrectly is.. subtle.
> the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand
This is why simonw (The author) has his "pelican on a bike" -test, it's not 100% accurate but it is a good indicator.
I have a set of my own standard queries and problems (no counting characters or algebra crap) I feed to new LLMs I'm testing
None of the questions exist outside of my own Obsidian note so they can't be gamed by LLM authors. And I've tested multiple different LLMs using them so I have a "feeling" on what the answer should look like. And I personally know the correct answer so I can immediately validate them.
They are training on your queries. So they may have some exposure to them going forward.
Even if your queries are hidden via a local running model you must have some humility that your queries are not actually unique. For this reason I have a very difficult time believing that a basic LLM will be able to properly reason about complex topics, it can regurgitate to whatever level its been trained. That doesn't make it less useful though. But on the edge case how do we know the query its ingesting gets trained with a suitable answer? Wouldn't this constitute an over-fitting in these cases and be terribly self-reinforcing?
Not if one ollama pull to ur machine.
It's really strange to me that the only way to effectively use LLMs is if you already have all the knowledge and skill to do the task yourself.
I can't think of any other tools like this. An LLM can multiply your efforts, but only if you were capable of doing it yourself. Wild.
A lossy encyclopaedia should be missing information and be obvious about it, not making it up without your knowledge and changing the answer every time.
When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.
Furthermore, an encyclopaedia is something you can reference and learn from without a goal, it allows you to peruse information you have no concept of. Not so with LLMs, which you have to query to get an answer.
Lossy compression does make things up. We call them compression artefacts.
In compressed audio these can be things like clicks and boings and echoes and pre-echoes. In compressed images they can be ripply effects near edges, banding in smoothly varying regions, but there are also things like https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres... where one digit is replaced with a nice clean version of a different digit, which is pretty on-the-nose for the LLM failure mode you're talking about.
Compression artefacts generally affect small parts of the image or audio or video rather than replacing the whole thing -- but in the analogy, "the whole thing" is an encyclopaedia and the artefacts are affecting little bits of that.
Of course the analogy isn't exact. That would be why S.W. opens his post by saying "Since I love collecting questionable analogies for LLMs,".
> Lossy compression does make things up. We call them compression artefacts.
I don’t think this is a great analogy.
Lossy compression of images or signals tends to throw out information based on how humans perceive it, focusing on the most important perceptual parts and discarding the less important parts. For example, JPEG essentially removes high frequency components from an image because more information is present with the low frequency parts. Similarly, POTS phone encoding and mp3 both compress audio signals based on how humans perceive audio frequency.
The perceived degradation of most lossy compression is gradual with the amount of compression and not typically what someone means when they say “make things up.”
LLM hallucinations aren’t gradual and the compression doesn’t seem to follow human perception.
You are right and the idea of LLMs as lossy compression has lots of problems in general (LLMs are a statistical model, a function approximating the data generating process).
Compression artifacts (which are deterministic distortions in reconstruction) are not the same as hallucinations (plausible samples from a generative model; even when greedy, this is still sampling from the conditional distribution). A better identification is with super-resolution. If we use a generative model, the result will be clearer than a normal blotchy resize but a lot of details about the image will have changed as the model provides its best guesses at what the missing information could have been. LLMs aren't meant to reconstruct a source even though we can attempt to sample their distribution for snippets that are reasonable facsimiles from the original data.
An LLM provides a way to compute the probability of given strings. Once paired with entropy coding, on-line learning on the target data allows us to arrive at the correct MDL based lossless compression view of LLMs.
LLM confabulations might as well be gradual in the latent space. I don’t think lossy is synonymous to perceptual and the high frequency components rather easily translate to less popular data.
I feel like my comment is pretty clear that a compression artefact is not the same thing as making the whole thing up.
> Of course the analogy isn't exact.
And I don’t expect it to be, which is something I’ve made clear several times before, including on this very thread.
More disagreeing with no meaningful value to the conversation. This is you. Constantly.
Interesting, in the LLM case these compression artefacts then get fed into the generating process of the next token, hence the errors compound.
Not really. The whole "inference errors will always compound" idea was popular in GPT-3.5 days, and it seems like a lot of people just never updated their knowledge since.
It was quickly discovered that LLMs are capable of re-checking their own solutions if prompted - and, with the right prompts, are capable of spotting and correcting their own errors at a significantly-greater-than-chance rate. They just don't do it unprompted.
Eventually, it was found that reasoning RLVR consistently gets LLMs to check themselves and backtrack. It was also confirmed that this latent "error detection and correction" capability is present even at base model level, but is almost never exposed - not in base models and not in non-reasoning instruct-tuned LLMs.
The hypothesis I subscribe to is that any LLM has a strong "character self-consistency drive". This makes it reluctant to say "wait, no, maybe I was wrong just now", even if latent awareness of "past reasoning look sketchy as fuck" is already present within the LLM. Reasoning RLVR encourages going against that drive and utilizing those latent error-correction capabilities.
You seem to be responding to a strawman, and assuming I think something I don't think.
As of today, 'bad' generations early in the sequence still do tend towards responses that are distant to the ideal response. This is testable/verifiable by pre-filling responses, which I'd advise you to experiment with for yourself.
'Bad' generations early in the output sequence are somewhat mitigatable by injecting self-reflection tokens like 'wait', or with more sophisticated test-time compute techniques. However, those remedies can simultaneously turn 'good' generations into bad, they are post-hoc heuristics which treat symptoms not causes.
In general, as the models become larger they are able to compress more of their training data. So yes, using the terminology of the commenter I was responding to, larger models should tend to have fewer 'compression artefacts' than smaller models.
With better reasoning training, the models mitigate more and more of that entirely by themselves. They "diverge into a ditch" less, and "converge towards the right answer" more. They are able to use more and more test-time compute effectively. They bring their own supply of "wait".
OpenAI's in-house reasoning training is probably best in class, but even lesser naive implementations go a long way.
Assuming you've read OpenAI's paper released this week?
https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...
They attribute these 'compression artefacts' to pre-training, they also reference the original snowballing paper: How Language Model Hallucinations Can Snowball: https://arxiv.org/pdf/2305.13534
They further state that reasoning is no panacea. W hilst you did say: "the models mitigate more and more"
You were replying to my comment which said:
"'Bad' generations early in the output sequence are somewhat mitigatable by injecting self-reflection tokens like 'wait', or with more sophisticated test-time compute techniques."
So our statements there are logically compatible, i.e. you didn't make a statement that contradicts what I said.
"Our error analysis is general yet has specific implications for hallucination. It applies broadly, including to reasoning and search-and-retrieval language models, and the analysis does not rely on properties of next-word prediction or Transformer-based neural networks."
"Search (and reasoning) are not panaceas. A number of studies have shown how language models augmented with search or Retrieval-Augmented Generation (RAG) reduce hallucinations (Lewis et al., 2020; Shuster et al., 2021; Nakano et al., 2021; Zhang and Zhang, 2025). However, Observation 1 holds for arbitrary language models, including those with RAG. In particular, the binary grading system itself still rewards guessing whenever search fails to yield a confident answer. Moreover, search may not help with miscalculations such as in the letter-counting example, or other intrinsic hallucinations"
The problem is that language doesn't produce itself. Re-checking, correcting error is not relevant. Error minimization is not the fount of survival, remaining variable for tasks is. The lossy encyclopedia is neither here nor there, it's a mistaken path:
"Language, Halliday argues, "cannot be equated with 'the set of all grammatical sentences', whether that set is conceived of as finite or infinite". He rejects the use of formal logic in linguistic theories as "irrelevant to the understanding of language" and the use of such approaches as "disastrous for linguistics"."
The units themselves are meaningless without context. The point of existence, action, tasks is to solve the arbitrariness in language. Tasks refute language, not the other way around. This may be incoherent as the explanation is scientific, based in the latest conceptualization of linguistics.
CS never solved the incoherence of language, conduit metaphor paradox. It's stuck behind language's bottleneck, and it do so willingly blind-eyed.
What? This is even less coherent.
You weren't talking to GPT-4o about philosophy recently, were you?
I'd know cutting-edge linguistics and signaling theory well beyond Shannon to parse this, not NLP or engineering reduction. What I've stated is extremely coherent to Systemic Functional Linguists.
Beyond this point engineers actually have to know what signaling is, rather than 'information.'
https://www.sciencedirect.com/science/article/abs/pii/S00033...
Ultimately, engineering chose the wrong approach to automating language, and it sinks the field. It's irreversible.
If not language what training substrate do you suggest? Also not strong ideas are expressible coherently. You have an ironic pattern in your comments of getting lost in the very language morass you propose to deprecate. If we don't train models on language what do we train them on? I have some ideas of my own but I am interested if you can clearly express yours.
Neural/spatial syntax. Analoga of differentials. The code to operate this gets built before the component.
If language doesn't really mean anything, then automating it in geometry is worse than problematic.
The solution is starting over at 1947: measurement not counting.
The semantic meaning of your words here is non-existent. It is unclear to me how else you can communicate in a text based forum if not by using words. Since you can't despite your best effort I am left to conclude you are psychotic and should probably be banned and seek medical help.
Engineers are so close-minded, you can't see the freight train bearing down on the industry. All to science's advantage replacing engineers. Interestingly, if you dissect that last entry, I've just made the case measurement (analog computation) is superior to counting (binary computation) and laid out the strategy how. All it takes is brains, or an LLM to decipher what it states.
https://pmc.ncbi.nlm.nih.gov/articles/PMC3005627/
"First, cell assemblies are best understood in light of their output product, as detected by ‘reader-actuator’ mechanisms. Second, I suggest that the hierarchical organization of cell assemblies may be regarded as a neural syntax. Third, constituents of the neural syntax are linked together by dynamically changing constellations of synaptic weights (‘synapsembles’). Existing support for this tripartite framework is reviewed and strategies for experimental testing of its predictions are discussed."
I 100% agree analog computing would be better at simulating intelligence than binary. Why don't you state that rather than burying it under a mountain of psychobabble?
Listing the conditions, dichotomizing the frameworks counting/measurement is the farthest from psycho-babble. Anyone with knowledge of analog knows these terms. And enough to know analog doesn't simulate anything. And intelligence isn't what's being targeted.
One of the main takeaways from The Bitter Lesson was that you should fire your linguists. GPT-2 knows more about human language than any linguist could ever hope to be able to convey.
If you're hitching your wagon to human linguists, you'll always find yourself in a ditch in the end.
Sorry, 2 billion years of neurobiology beats 60 years of NLP/LLMs which knows less to nothing about language since "arbitrary points can never be refined or defined to specifics" check your corners and know your inputs.
The bill is due on NLP.
I'd rather say LLMs are a lossy encyclopedia + other things. The other things part obviously does a lot of work here, but if we strip it away, we can claim that the remaining subset of the underlying network encodes true information about the world.
Purely based on language use, you could expect "dog bit the man" more often than "man bit the dog", which is a lossy way to represent "dogs are more likely to bite people than vice versa." And there's also the second lossy part where information not occurring frequently enough in the training data will not survive training.
Of course, other things also include inaccurate information, frequent but otherwise useless sentences (any sentence with "Alice" and "Bob"), and the heavily pruned results of the post-training RL stage. So, you can't really separate the "encyclopedia" from the rest.
Also, not sure if lossy always means that loss is distributed (i.e., lower resolution). Loss can also be localized / biased (i.e., lose only black pixels), it's just that useful lossy compression algorithms tend to minimize the noticeable loss. Tho I could be wrong.
I don't think there is a singular "should" that fits every use case.
E.g. a Bloom filter also doesn't "know" what it knows.
I don’t understand the point you’re trying to make. The given example confused me further, since nothing in my argument is concerned with the tool “knowing” anything, that has no relation to the idea I’m expressing.
I do understand and agree with a different point you’re making somewhere else in this thread, but it doesn’t seem related to what you’re saying here.
Yeah an LLM is an unreliable librarian, if anything.
That’s a much better analogy. You have to specifically ask them for information and they will happily retrieve it for you, but because they are unreliable they may get you the wrong thing. If you push back they’ll apologise and try again (librarians try to be helpful) but might again give you the wrong thing (you never know, because they are unreliable).
There's a big difference between giving you correct information about the wrong thing, vs giving you incorrect information about the right thing.
A librarian might bring you the wrong book, that's the former. An LLM does the latter. They are not the same.
Fair. With the unreliable librarian you’d be at an advantage because you’d immediately see “this is not what I asked for”, which is not the case with LLMs (and hence what makes them so problematic).
You are absolutely right, and exactly the same thing came into my head while reading this. Some of the replies to you here are very irritating and seem not to grasp the point you're making, so I thought I'd chime in for moral support.
The argument is that a banana is a squishy hammer.
You're saying hammers shouldn't be squishy.
Simon is saying don't use a banana as a hammer.
> You're saying hammers shouldn't be squishy.
No, that is not what I’m saying. My point is closer to “the words chosen to describe the made up concept do not translate to the idea being conveyed”. I tried to make that fit into your idea of the banana and squishy hammer, but now we’re several levels of abstraction deep using analogies to discuss analogies so it’s getting complicated to communicate clearly.
> Simon is saying don't use a banana as a hammer.
Which I agree with.
This is the type of comment that has been killing HN lately. “I agree with you but I want to disagree because I’m generally just that type of person. Also I am unable to tell my disagreeing point adds nothing.”
Except that’s not what I’m saying at all. If anything, the “type of comment that has been killing HN” (and any community) are those who misunderstand and criticise what someone else says without providing any insight while engaging in ad hominem attacks (which are explicitly against the HN guidelines). It is profoundly ironic you are actively attacking others for the exact behaviour you are engaging in. I will kindly ask you do not do that. You are the first person in this immediate thread being rude and not adding to the collective understanding of the argument.
We are all free to agree with one part of an argument while disagreeing with another. That’s what healthy discourse is, life is not black and white. As way of example, if one says “apples are tasty because they are red”, it is perfectly congruent to agree apples are tasty but disagree that their colour is the reason. And by doing so we engage in a conversation to correct a misconception.
More of the same
I actually disagree. Modern encoding formats can, and do, hallucinate blocks.
It’s a lot less visible and I guess dramatic than LLMs but it happens frequently enough that I feel like at every major event there are false conspiracies based on video « proofs » that are just encoding artifacts
I think you are missing the point of the analogy: a lossy encyclopedia is obviously a bad idea, because encyclopedias are meant to be reliable places to look up facts.
And my point is that “lossy” does not mean “unreliable”. LLMs aren’t reliable sources of facts, no argument there, but a true lossy encyclopaedia might be. Lossy algorithms don’t just make up and change information, they remove it from places where they might not make a difference to the whole. A lossy encyclopaedia might be one where, for example, you remove the images plus gramatical and phonetic information. Eventually you might compress the information where the entry for “dog” only reads “four legged creature”—which is correct but not terribly helpful—but you wouldn’t get “space mollusk”.
I don't think a "true lossy encylopedia" is a thing that has ever existed.
One could argue that’s what a pocket encyclopaedia (those exist) is. But even if we say they don’t, when you make up a term by mushing two existing words together it helps if the term makes sense. Otherwise, why even use the existing words? You called it a “lossy enyclopedia” and not a “spaghetti ice cream” for a reason, presumably so the term evokes an image or concept in the mind of the reader. If it’s bringing up a different image than what you intended, perhaps it’s not a good term.
I remember you being surprised when the term “vibe coding” deviated from its original intention (I know you didn’t come up with it). But frankly I was surprised at your surprise—it was entirely predictable and obvious how the term was going to be used. The concept I’m attempting to communicate to you is that when you make up a term you have to think not only of the thing in your head but also of the image it conjures up in other people’s minds. Communication is a two-way street.
I think you're saying that "pocket encyclopedia" is one definition of "lossy encyclopedia" that may occur to people (or that may get marketed on purpose). But that's a very poor definition of LLMs. And so the danger is that people may lock onto a wildly misleading definition. Am I getting the point?
All encyclopedias are lossy. They curate the info they include, only choosing important topics. Wikipedia is lossy. They delete whole articles for irrelevance. They edit changes to make them more concise. They require sources for facts. All good things, but Wikipedia is a subset of human knowledge.
Since sibling comments all seem to have concentrated on idealistic good intent, I would also like to point out a different side of things.
I grew up in socialism. Since we've transitioned to democracy, I learned that I have to unlearn some things. Our encyclopedias were not inaccurate but were not complete. It's like lying through omission. And as the old saying goes, half-truths are worse than lies.
Whether this would be deemed as a lossy encyclopedia, I don't know. What I am certain of, however, is that it was accurate but omitted important additional facts.
And that is what I see in LLMs as well. Overall, it's accurate, except in cases where an additional fact would alter the conclusion. So, it either could not find arguments with that fact, or it chose to ignore them to give an answer and could be prompted into taking them into account or whatever.
What I do know is that LLMs of today give me the same hibbie-jibbies that rereading those encyclopedias of my youth give me.
A lossy encyclopedia which you can talk to and it can look up facts in the lossless version while having a conversation OTOH is... not a bad idea at all, and hundreds of millions of people agree if traffic numbers are to be believed.
(but it isn't and won't ever be an oracle and apparently that's a challenge for human psychology.)
Completely agree with you - LLMs with access to search tools that know how to use them (o3, GPT-5, Claude 4 are particularly good at this) mostly paper over the problems caused by a lossy set of knowledge in the model weights themselves.
But... end users need to understand this in order to use it effectively. They need to know if the LLM system they are talking to has access to a credible search engine and is good at distinguishing reliable sources from junk.
That's advanced knowledge at the moment!
From earlier today:
Me: How do I change the language settings on YouTube?
Claude: Scroll to the bottom of the page and click the language button on the footer.
Me: YouTube pages scroll infinitely.
Claude: Sorry! Just click on the footer without scrolling, or navigate to a page where you can scroll to the bottom like a video.
(Videos pages also scroll indefinitely through comments)
Me: There is no footer, you're just making shit up
Claude: [finally uses a search engine to find the right answer]
IME, eventually, after a long time, the scrolling stops and you can get to the footer. YMMV!
Slightly off topic, but my experience is that they are pretty terrible at using search tools..
They can often reason themselves into some very stupid direction, burning all the tokens for no reason and failing to reply in the end.
I am sympathetic to your analogy. I think it works well enough.
But it falls a bit short in that encyclopedias, lossy or not, shouldn't affirmatively contain false information. The way I would picture a lossy encyclopedia is that it can misdirect by omission, but it would not change A to ¬A.
Maybe a truthy-roulette enclyclopedia?
I remember a study where they checked if wikipedia had more errors than paper encyclopedias, and they found there were about as many errors in both.
That study ended the "you can't trust wikipedia" argument, you can't trust anything but wikipedia is an as good as it gets second hand reference.
I don't like the confident hallucinations of LLMs either, but don't they rewrite and add entries in the encyclopedia every few years? Implicitly that makes your old copy "lossy"
Again, never really want a confidently-wrong encyclopedia, though
Aren't all encyclopedias 'lossy'? They are all partial collections of information; none have all of the facts.
There's an important difference as to what is omitted.
An encyclopedia could say "general relativity is how the universe works" or it could say "general relativity and quantum mechanics describe how we understand the universe today and scientists are still searching for universal theory".
Both are short but the first statement is omitting important facts. Lossy in the sense of not explaining details is ok, but omitting swathes of information would be wrong.
> You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.
Oh but it's much worse than that: because most LLMs aren't deterministic in the way they operate [1], you can get a pristine image of a different pile of dirt every single time you ask.
[1] there are models where if you have the "model + prompt + seed" you're at least guaranteed to get the same output every single time. FWIW I use LLMs but I cannot integrate them in anything I produce when what they output ain't deterministic.
"Deterministic" is overrated.
Computers are deterministic. Most of the time. If you really don't think about all the times they aren't. But if you leave the CPU-land and go out into the real world, you don't have the privilege of working with deterministic systems at all.
Engineering with LLMs is closer to "designing a robust industrial process that's going to be performed by unskilled minimum wage workers" than it is to "writing a software algorithm". It's still an engineering problem - but of the kind that requires an entirely different frame of mind to tackle.
And one major issue is that LLMs are largely being sold and understood more like reliable algorithms than what they really are.
If everyone understood the distinction and their limitations, they wouldn’t be enjoying this level of hype, or leading to teen suicides and people giving themselves centuries-old psychiatric illnesses. If you “go out into the real world” you learn people do not understand LLMs aren’t deterministic and that they shouldn’t blindly accept their outputs.
https://archive.ph/20241023235325/https://www.nytimes.com/20...
https://archive.ph/20250808145022/https://www.404media.co/gu...
It's nothing new. LLMs are unreliable, but in the same ways humans are.
But LLMs output is not being treated the same as human output, and that comparison is both tired and harmful. People are routinely acting like “this is true because ChatGPT said so” while they wouldn’t do the same for any random human.
LLMs aren’t being sold as unreliable. On the contrary, they are being sold as the tool which will replace everyone and do a better job at a fraction of the piece.
That comparison is more useful than the alternatives. Anthropomorphic framing is one of the best framings we have for understanding what properties LLMs have.
"LLM is like an overconfident human" certainly beats both "LLM is like a computer program" and "LLM is like a machine god". It's not perfect, but it's the best fit at 2 words or less.
Um, no. They are unreliable at a much faster pace and larger scale than any human. They are more confident while being unreliable than most humans (politicians and other bullshitters aside, most humans admit when they aren't sure about something).
> you can get a pristine image of a different pile of dirt every single time you ask.
That’s what I was trying to convey with the “then reopen the image” bit. But I chose a different image of a different thing rather than a different image of a similar thing.
An encyclopaedia also can't win a gold medals at the IMO and IOI. So yeah, they're not the same thing, even though the analogy is pretty good.
Of course they’re not the same thing, the goal of an analogy is not to be perfect but to provide a point of comparison to explain an idea.
My point is that I find the chosen term inadequate. The author made it up from combining two existing words, where one of them is a poor fit for what they’re aiming to convey.
Please, everybody, preserve your records. Preserve your books, preserve your downloaded files (that can't be tampered with), keep everything. AI is going to make it harder and harder to find out the truth about anything over the next few years.
You have a moral duty to keep your books, and keep your locally-stored information.
I get very annoyed when llms respond with quotes around certain things I ask for, then when I say what is the source of that quote? they say oh I was paraphrasing and that isnt a real quote.
At least wikipedia has sources that probably support what it says and normally the quotes are real quotes. LLMs just seem to add quotation marks as, "proof" that its confident something is correct.
do you have examples of these ?
To that end, it seems as though archive.org will important for an entirely new reason. Not for the loss of information, but the degradation of it.
[flagged]