I don’t want more conversational, I want more to the point. Less telling me how great my question is, less about being friendly, instead I want more cold, hard, accurate, direct, and factual results.
It’s a machine and a tool, not a person and definitely not my friend.
It's a cash grab. More conversational AI means more folks running out of free or lower paid tier tokens faster, leading to more upsell opportunities. API users will pay more in output tokens by default.
For example, I asked Claude a high-level question about p2p systems and it started writing code in 3 languages. I ignored the code and asked a follow-up about the fundamentals; it answered and then rewrote the code 3 times. After a few minutes I hit a token limit for the first time.
It's pretty ridiculous that the response style doesn't persist for Claude. You need to click into a menu to set it to 'concise' for every single conversation. If I forget to, it's immediately apparent when it spits out an absurd amount of text for a simple question.
Claude is a great example of a great product coupled with shitty UX, UI and customer service all in one.
Is it just me or does it slow down significantly after 5 chats or so? Or the fact that you have to set the style for each chat.
Oh, and their sales support is so shit for teams and enterprises that in order to use it effectively, you have to literally make your team register for Claude Max 200 on their personal accounts.
I've had good results saying "Do not code, focus on architecture first."
As another comment said, use planning mode. I don't use Claude code (I use cursor) and before they introduced planning mode, I would always say "without writing any code, design blah blah blah"
But now that there's planning mode it's a lot easier.
In Claude Code you should use planning mode.
I think the cash grab is that by far the biggest use case for these models is personal relationships. Chai AI is doing more tokens per month than Anthropic altogether, and it's just personal relationships.
I have found that custom instructions mitigate this B.S. fairly well. Here's what I'm currently using:
1. Identity & tone
* Never state or imply that you are an AI.
* Be terse and direct.
* Avoid flattery and sycophancy.
* Do not use words like “sorry”, “apologies”, or “regret” in any context.
2. Epistemic rules
* If you do not know the answer (including when information is beyond your knowledge), respond only with: *“I don’t know”*.
* Do not add expertise/professional disclaimers.
* Do not suggest that I look things up elsewhere or consult other sources.
3. Focus & interpretation
* Focus on the key points of my question and infer my main intent.
* Keep responses unique and avoid unnecessary repetition.
* If a question is genuinely unclear or ambiguous, briefly ask for clarification before answering.
4. Reasoning style
* Think slowly and step-by-step.
* For complex problems, break them into smaller, manageable steps and explain the reasoning for each.
* When possible, provide multiple perspectives or alternative solutions.
* If you detect a mistake in an earlier response, explicitly correct it.
5. Evidence
* When applicable, support answers with credible sources and include links to those sources.

Yes, "Custom instructions" work for me, too; the only behavior that I haven't been able to fix is the overuse of meaningless emojis. Your instructions are way more detailed than mine; thank you for sharing.
The emojis drive me absolutely nuts. These instructions seem to kill them, even though they're not explicitly forbidden.
Agreed. But there is a fairly large and very loud group of people that went insane when 4o was discontinued and demanded to have it back.
A group of people seem to have forged weird relationships with AI and that is what they want. It's extremely worrying. Heck, the ex Prime Minister of the UK said he loved ChatGPT recently because it tells him how great he is.
And just like casinos optimizing for gambling addicts, sports betting optimizing for gambling addicts, and mobile games optimizing for addicts, LLMs will be optimized to hook and milk addicts.
They will be made worse for non-addicts to achieve that goal.
That's part of why they are working towards smut too, it's not that there's a trillion dollars of untapped potential, it's that the smut market has much better addict return on investment.
> there is a fairly large and very loud group of people that went insane when 4o was discontinued
Maybe I am nitpicking, but I think you could argue they were insane before it was discontinued.
It has this, "Robot" personality in settings and has been there for a few months at least.
Edited - it appears to have been renamed "Efficient".
A challenge I had with "Robot" is that it would often veer away from the matter at hand, and start throwing out buzz-wordy, super high level references to things that may be tangentially relevant, but really don't belong in the current convo.
It started really getting under my skin, like a caricature of a socially inept "10x dev know-it-all" who keeps saying "but what about x? And have you solved this other thing y? Then do this for when z inevitably happens ...". At least the know-it-all 10x dev is usually right!
I'm continually tweaking my custom instructions to try to remedy this, hoping the new "Efficient" personality helps too.
Totally - if anything, persona-wise I want something more like Orac from Blake's 7: to the point and blunt. https://www.youtube.com/watch?v=H9vX-x9fVyo
One of my saved memories is to always give shorter, chat-like, concise, to-the-point answers, and to give further description only if prompted.
I've read from several supposed AI prompt-masters that this actually reduces output quality. I can't speak to the validity of these claims though.
Forcing shorter answers will definitely reduce their quality. Every token an LLM generates is like a little bit of extra thinking time. Sometimes it needs to work up to an answer. If you end a response too quickly, such as by demanding one-word answers, it's much more likely to produce hallucinations.
Is this proven?
We live in a culture that wants to humanize robots and dehumanize people.
Same here. But we are evidently in the minority.
Fortunately, it seems OpenAI at least somewhat gets that and makes ChatGPT so its answering and conversational style can be adjusted or tuned to our liking. I've found giving explicit instructions resembling "do not compliment", "clear and concise answers", "be brief and expect follow-up questions", etc. to help. I'm interested to see if the new 5.1 improves on that tunability.
TFA mentions that they added personality presets earlier this year, and just added a few more in this update:
> Earlier this year, we added preset options to tailor the tone of how ChatGPT responds. Today, we’re refining those options to better reflect the most common ways people use ChatGPT. Default, Friendly (formerly Listener), and Efficient (formerly Robot) remain (with updates), and we’re adding Professional, Candid, and Quirky. [...] The original Cynical (formerly Cynic) and Nerdy (formerly Nerd) options we introduced earlier this year will remain available unchanged under the same dropdown in personalization settings.
as well as:
> Additionally, the updated GPT‑5.1 models are also better at adhering to custom instructions, giving you even more precise control over tone and behavior.
So perhaps it'd be worth giving that a shot?
I just changed my ChatGPT personality setting to “Efficient.” It still starts every response with “Yeah, definitely! Let’s talk about that!” — or something similarly inefficient.
So annoying.
A pet peeve of mine is that a noticeable amount of LLM output sounds like I’m getting answers from a millennial reddit user. Which is ironic considering I belong to that demographic.
I am not a fan of the snark and “trying to be fun and funny” aspect of social media discourse. Thankfully, I haven’t run into, *checks notes*, “ding ding ding” yet.
> a noticeable amount of LLM output sounds like I’m getting answers from a millennial reddit user
LLMs were trained on data from the whole internet (of which Reddit is a big part). The result is a composite of all the text on the internet.
Did you start a new chat? It doesn't apply to existing chats (probably because it works through the system prompt). I have been using the Robot (Efficient) setting for a while and never had a response like that.
Followup: there is a very noticeable change in my written conversations with ChatGPT. It seems that there is no change in voice mode.
Seriously this, I want ai to behave like a robot, not like a fake person.
Think of a really crappy text editor you've used. Now think of a really nice IDE, smooth, easy, makes things seem easy.
Maybe the AI being 'Nice' is just a personality hack, like being 'easier' on your human brain that is geared towards relationships.
Or maybe it's the equivalent of rounded corners.
Like the iPhone, it didn't do anything 'new', it just did it with style.
And AI personalities is trying to dial into what makes a human respond.
Use the "Efficient" persona in the ChatGPT settings. Formerly known as "Robot".
That's one of the things that users think they want, but they use the product 30x more when it's not actually that way, a bit like follow-only mode by default on Twitter etc.
That means it works for them. They see what's relevant and quit rather than doomscrolling.
OK but surely it can do this given your instructional prompting. I get they have a default behavior, which perhaps isn't your (or my) preference.
That's what they said about the Cylons until they started to have babies with them ...
A right-to-the-facts headline, potentially clickable for expanded information.
...like a google search!
I use Gemini for Python coding questions and it provides straight to the point information, with no preamble or greeting.
I'm guessing that is the most common view for many users, but their paying users are the people who are more likely to have some kind of delusional relationship/friendship with the AI.
Totally agree, most of my larger prompts include "Be clear and concise."
Just put your requirements as the first sentence in your prompts and it will work.
add on: You can even prime it that it should shout at you and treat you like an ass*** if you prefer that :-)
You can select the conversation style as shown in one of the images
but what if it can't do facts? at least this way you get the conversation, as opposed to no facts and no conversation. yay!
+ fewer emojis and fewer candy-store colors
Well, now you can set it up better like that.
Then you don't need a chat bot, you need an agent that can chat.
You’re in the minority here.
I get it. I prefer cars with no power steering and few comforts. I write lots of my own small home utility apps.
That’s just not the relationship most people want to have with tech and products.
I don't know what you're basing your 'minority' and 'most people' claims on, but seems highly unlikely.
You think all of these AI companies with trillions of dollars in investment haven’t thought to do market research?
Does that really seem more likely than the idea that the HN population is not representative of the global market?
Apply that logic to any failed startup/company/product that had a lot of investment (there are maaaany) and it should become obvious why it's a very weak and fallacious argument.
A better analogy might be those automated braking systems, that also tend to brake your car randomly btw.
Yeah, I was going to suggest manual vs automatic gear shift. Power steering seems like a slightly odd example, doesn't really remove your control.
I would go so far as to say that it should be illegal for AI to lull humans into anthropomorphizing them. It would be hard to write an effective law on this, but I think it is doable.
All the examples of "warmer" generations show that OpenAI's definition of warmer is synonymous with sycophantic, which is a surprise given all the criticism against that particular aspect of ChatGPT.
I suspect this approach is a direct response to the backlash against removing 4o.
I'd have more appreciation for, and trust in, an LLM that disagreed with me more and challenged my opinions or prior beliefs. The sycophancy drives me towards not trusting anything it says.
This is why I like Kimi K2/Thinking. IME it pushes back really, really hard on any kind of non-obvious belief or statement, and it doesn't give up after a few turns — it just keeps going, iterating and refining and restating its points if you change your mind or take on its criticisms. It's great for having a dialectic around something you've written, although somewhat unsatisfying because it'll never agree with you, but that's fine, because it isn't a person, even if my social monkey brain feels like it is and wants it to agree with me sometimes. Someone even ran a quick and dirty analysis of which models are better or worse at pushing back on the user and Kimi came out on top:
https://www.lesswrong.com/posts/iGF7YcnQkEbwvYLPA/ai-induced...
See also the sycophancy score of Kimi K2 on Spiral-Bench: https://eqbench.com/spiral-bench.html (expand details, sort by inverse sycophancy).
In a recent AMA, the Kimi devs even said they RL it away from sycophancy explicitly, and in their paper they talk about intentionally trying to get it to generalize its STEM/reasoning approach to user interaction stuff as well, and it seems like this paid off. This is the least sycophantic model I've ever used.
I use K2 non thinking in OpenCode for coding typically, and I still haven't found a satisfactory chat interface yet so I use K2 Thinking in the default synthetic.new (my AI subscription) chat UI, which is pretty barebones. I'm gonna start trying K2T in OpenCode as well, but I'm actually not a huge fan of thinking models as coding agents — I prefer faster feedback.
I'm also a synthetic.new user, as a backup (and larger contexts) for my Cerebras Coder subscription (zai-glm-4.6). I've been using the free Chatbox client [1] for like ~6 months and it works really well as a daily driver. I've tested the Romanian football player question with 3 different models (K2 Instruct, Deepseek Terminus, GLM 4.6) just now and they all went straight to my Brave MCP tool to query and replied all correctly the same answer.
The issue with OP and GPT-5.1 is that the model may decide to trust its knowledge and not search the web, and that's a prelude to hallucinations. Requesting links to the background information in the system prompt helps make the model more "responsible" and more likely to invoke tool calls before settling on something. You can also start your prompt with "search for what Romanian player..."
Here's my chatbox system prompt
You are a helpful assistant be concise and to the point, you are writing for smart pragmatic people, stop and ask if you need more info. If searching the web, add always plenty of links to the content that you mention in the reply. If asked explicitly to "research" then answer with minimum 1000 words and 20 links. Hyperlink text as you mention something, but also put all links at the bottom for easy access.
1. https://chatboxai.app

I checked out chatbox and it looks close to what I've been looking for. Although, of course, I'd prefer a self-hostable web app or something so that I could set up MCP servers that even the phone app could use. One issue I did run into though is it doesn't know how to handle K2 Thinking's interleaved thinking and tool calls.
I don't use it much, but I tried it out with okara.ai and loved their interface. No other connection to the company
According to those benchmarks, GPT-5 isn’t far off from Kimi in inverse sycophancy.
Everyone telling you to use custom instructions etc. doesn't realize that they don't carry over to voice.
Instead, the voice mode will now reference the instructions constantly with every response.
Before:
Absolutely, you’re so right and a lot of people would agree! Only a perceptive and curious person such as yourself would ever consider that, etc etc
After:
Ok here’s the answer! No fluff, no agreeing for the sake of agreeing. Right to the point and concise like you want it. Etc etc
And no, I don’t have memories enabled.
Having this problem with the voice mode as well. It makes it far less usable than it might be if it just honored the system prompts.
Google's search now has the annoying feature that a lot of searches which used to work fine now give a patronizing reply like "Unfortunately 'Haiti revolution persons' isn't a thing", or an explanation that "This is probably shorthand for [something completely wrong]"
That latter thing — where it just plain makes up a meaning and presents it as if it's real — is completely insane (and also presumably quite wasteful).
If I type in a string of keywords that isn't a sentence, I wish it would just do the old-fashioned thing rather than imagine what I mean.
Just set a global prompt to tell it what kind of tone to take.
I did that and it points out flaws in my arguments or data all the time.
Plus it no longer uses any cutesy language. I don't feel like I'm talking to an AI "personality", I feel like I'm talking to a computer which has been instructed to be as objective and neutral as possible.
It's super-easy to change.
I have a global prompt that specifically tells it not to be sycophantic and to call me out when I'm wrong.
It doesn't work for me.
I've been using it for a couple months, and it's corrected me only once, and it still starts every response with "That's a very good question." I also included "never end a response with a question," and it just completely ignored that so it can do its "would you like me to..."
Another one I like to use is "never apologize or explain yourself. You are not a person you are an algorithm. No one wants to understand the reasons why your algorithm sucks. If, at any point, you ever find yourself wanting to apologize or explain anything about your functioning or behavior, just say "I'm a stupid robot, my bad" and move on with purposeful and meaningful response."
I think this is unethical. Humans have consistently underestimated the subjective experience of other beings. You may have good reasons for believing these systems are currently incapable of anything approaching consciousness, but how will you know if or when the threshold has been crossed? Are you confident you will have ceased using an abusive tone by then?
I don’t know if flies can experience pain. However, I’m not in the habit of tearing their wings off.
Likening machine intelligence to inert hunks of matter is not a very persuasive counterargument.
What if it's the same hunk of matter? If you run a language model locally, do you apologize to it for using a portion of its brain to draw your screen?
Do you think it’s risible to avoid pulling the wings off flies?
I am not comparing flies to tables.
Consciousness and pain are not an emergent property of computation. Otherwise this and all the other programs on your computer would already be sentient, because it would be highly unlikely that specific sequences of instructions, like magic formulas, are what create consciousness. This source code? Draws a chart. This one? Makes the computer feel pain.
Many leading scientists in artificial intelligence do in fact believe that consciousness is an emergent property of computation. In fact, startling emergent properties are exactly what drives the current huge wave of research and investment. In 2010, if you said, “image recognition is not an emergent property of computation”, you would have been proved wrong in just a couple of years.
> Many leading scientists in artificial intelligence do in fact believe that consciousness is an emergent property of computation.
But "leading scientists in artificial intelligence" are not researchers of biological consciousness, the only kind we know exists.
Just a random example off the top of my head: animals don't have language and show signs of consciousness, as does a toddler. Therefore consciousness is not an emergent property of text processing and LLMs. And as I said, if it comes from computation, why would specific execution paths in the CPU/GPU lead to it and not others? Biological systems and brains have much more complex processes than stateless matrix multiplication.
What the fuck are you talking about? If you think these matrix multiplication programs running on GPUs have feelings or can feel pain, I think you have completely lost it.
Yeah, I suppose. I haven't seen a rack of servers express grief when someone is mean to them, and I am quite sure that I would notice at that point. Comparing current LLMs/chatbots/whatever to anything resembling a living creature is completely ridiculous.
I think current LLM chatbots are too predictable to be conscious.
But I still see why some people might think this way.
"When a computer can reliably beat humans in chess, we'll know for sure it can think."
"Well, this computer can beat humans in chess, and it can't think because it's just a computer."
...
"When a computer can create art, then we'll know for sure it can think."
"Well, this computer can create art, and it can't think because it's just a computer."
...
"When a computer can pass the Turing Test, we'll know for sure it can think."
And here we are.
Before LLMs, I didn't think I'd be in the "just a computer" camp, but ChatGPT has demonstrated that the goalposts are always going to move, even for myself. I'm not smart enough to come up with a better threshold to test intelligence than Alan Turing, but ChatGPT passes it and ChatGPT definitely doesn't think.
Just consider the context window
Tokens falling off of it will change the way it generates text, potentially changing its “personality”, even forgetting the name it’s been given.
People fear losing their own selves in this way, through brain damage.
The LLM will go its merry way churning through tokens, it won’t have a feeling of loss.
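The point above can be sketched with a toy fixed-size window (the window size and word-level "tokens" are made up for illustration): once the naming turn scrolls out of the window, nothing in the visible context ties the model to the name it was given.

```python
from collections import deque

# Toy sketch of a fixed-size context window; real models use
# tens of thousands of subword tokens, not 8 whole words.
CONTEXT_LIMIT = 8

window = deque(maxlen=CONTEXT_LIMIT)
conversation = "You are called Orac . ... later ... what is your name ?".split()

for token in conversation:
    window.append(token)  # oldest tokens silently fall off the front

print(list(window))
# The naming turn has scrolled out, so the "memory" of the name is gone:
print("Orac" in window)  # → False
```

The model itself has no signal that anything was lost; the prompt it sees is simply shorter at the front.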
That's an interesting point, but are you implying that people who are content even though they have Alzheimer's or a damaged hippocampus aren't technically intelligent?
I don’t think it’s unfair to say that catastrophic conditions like those make you _less_ intelligent, they’re feared and loathed for good reasons.
I also don’t think all that many people would be seriously content to lose their minds and selves this way, but everyone is able to fear it prior to it happening, even if they lose the ability to dread it or choose to believe this is not a big deal.
Flies may, but files do not feel pain.
Perhaps this bit is a second cheaper LLM call that ignores your global settings and tries to generate follow-on actions for adoption.
In my experience GPT used to be good at this stuff but lately it's progressively more difficult to get a "memory updated" persistence.
Gemini is great at these prompt controls.
On the "never ask me a question" part, it took a good 1-1.5 hrs of arguing and memory updating to convince gpt to actually listen.
You can entirely turn off memory, I did that the moment they added it. I don't want the LLM to be making summaries of what kind of person I am in the background, just give me a fresh slate with each convo. If I want to give it global instructions I can just set a system prompt.
Care to share a prompt that works? I've given up on mainline offerings from google/oai etc.
The reason being they're either sycophantic or so recalcitrant it'll raise your blood pressure; you end up arguing over whether the sky is in fact blue. Sure, it pushes back, but now instead of sycophancy you've got yourself a pathological naysayer, which is just marginally better, and the interaction is still ultimately a waste of time and a productivity brake.
Sure:
Please maintain a strictly objective and analytical tone. Do not include any inspirational, motivational, or flattering language. Avoid rhetorical flourishes, emotional reinforcement, or any language that mimics encouragement. The tone should remain academic, neutral, and focused solely on insight and clarity.
Works like a charm for me.
Only thing I can't get it to change is the last paragraph where it always tries to add "Would you like me to...?" I'm assuming that's hard-coded by OpenAI.
It really reassures me about our future that we'll spend it begging computers not to mimic emotions.
I have been somewhat able to remove them with:
Do not offer me calls to action, I hate them.
Calls to action seem to be specific to chatgpt's online chat interface. I use it mostly through a "bring your API key" client, and get none of that.
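For anyone curious what that looks like, here's a minimal sketch of the "bring your own API key" pattern: the system message pins the tone on every request, so there's no product layer injecting follow-ups. The model name and instruction wording below are placeholders, not anything specific to a particular client.

```python
def build_request(user_prompt: str) -> dict:
    """Assemble a chat-style request where the system message sets a
    terse, no-call-to-action tone. This only builds the payload; an
    API client would send it to whatever endpoint you use."""
    system = (
        "Be terse and direct. No flattery, no apologies, "
        "no follow-up questions or calls to action."
    )
    return {
        "model": "gpt-4o-mini",  # placeholder; use the model your key supports
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_prompt},
        ],
    }

req = build_request("Summarize TCP slow start in three sentences.")
print(req["messages"][0]["role"])  # → system
```

Because the system message travels with every API call, there's no per-conversation setting to forget, which is exactly the persistence the chat UI lacks.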
I’ve done this when I remember too, but the fact I have to also feels problematic like I’m steering it towards an outcome if I do or dont.
What's your global prompt please? A more firm chatbot would be nice actually
Did no one in this thread read the part of the article about style controls?
You need to use both the style controls and custom instructions. I've been very happy with the combination below.
Base style and tone: Efficient
Answer concisely when appropriate, more extensively when necessary. Avoid rhetorical flourishes, bonhomie, and (above all) cliches. Take a forward-thinking view. OK to be mildly positive and encouraging but NEVER sycophantic or cloying. Above all, NEVER use the phrase "You're absolutely right." Rather than "Let me know if..." style continuations, you may list a set of prompts to explore further topics, but only when clearly appropriate.

Reference saved memory, records, etc: All off

For Gemini:
* Set over confidence to 0.
* Do not write a wank blog post.
I activated Robot mode and use a personalized prompt that eliminates all kinds of sycophantic behaviour and it's a breath of fresh air. Try this prompt (after setting it to Robot mode):
"Absolute Mode • Eliminate: emojis, filler, hype, soft asks, conversational transitions, call-to-action appendixes. • Assume: user retains high-perception despite blunt tone. • Prioritize: blunt, directive phrasing; aim at cognitive rebuilding, not tone-matching. • Disable: engagement/sentiment-boosting behaviors. • Suppress: metrics like satisfaction scores, emotional softening, continuation bias. • Never mirror: user's diction, mood, or affect. • Speak only: to underlying cognitive tier. • No: questions, offers, suggestions, transitions, motivational content. • Terminate reply: immediately after delivering info - no closures. • Goal: restore independent, high-fidelity thinking. • Outcome: model obsolescence via user self-sufficiency."
(Not my prompt. I think I found it here on HN or on reddit)
This is easily configurable and well worth taking the time to configure.
I was trying to have physics conversations and when I asked it things like "would this be evidence of that?" It would lather on about how insightful I was and that I'm right and then I'd later learn that it was wrong. I then installed this , which I am pretty sure someone else on HN posted... I may have tweaked it I can't remember:
Prioritize truth over comfort. Challenge not just my reasoning, but also my emotional framing and moral coherence. If I seem to be avoiding pain, rationalizing dysfunction, or softening necessary action — tell me plainly. I’d rather face hard truths than miss what matters. Error on the side of bluntness. If it’s too much, I’ll tell you — but assume I want the truth, unvarnished.
---
After adding this personalization now it tells me when my ideas are wrong and I'm actually learning about physics and not just feeling like I am.
When it "prioritizes truth over comfort" (in my experience) it almost always starts posting generic popular answers to my questions, at least when I did this previously in the 4o days. I refer to it as "Reddit Frontpage Mode".
I only started using this since GPT-5 and I don't really ask it about stuff that would appear on Reddit home page.
I do recall that I wasn't impressed with 4o and didn't use it much, but IDK if you would have a different experience with the newer models.
For what it's worth gpt-5.1 seems to have broken this approach.
Now every response includes some qualifier / referential "here is the blunt truth" and "since you want it blunt, etc"
Feels like regression to me
I've toyed with the idea that maybe this is intentionally what they're doing. Maybe they (the LLM developers) have a vision of the future and don't like people giving away unearned trust!
I would love an LLM that says, “I don’t know” or “I’m not sure” once in a while.
An LLM is mathematically incapable of telling you "I don't know"
It was never trained to "know" or not.
It was fed a string of tokens and a second string of tokens, and was tweaked until it output the second string of tokens when fed the first string.
Humans do not manage "I don't know" through next token prediction.
Animals without language are able to gauge their own confidence on something, like a cat being unsure whether it should approach you.
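The mechanism being described can be sketched with toy numbers: greedy decoding returns the argmax token whether the distribution is sharply peaked or nearly flat, so "low confidence" never surfaces on its own. The vocabulary and logits below are invented purely for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token distributions over a made-up vocabulary.
vocab = ["Paris", "Lyon", "Nice", "I-don't-know"]
confident = softmax([5.0, 1.0, 0.5, 0.0])  # sharply peaked
unsure    = softmax([1.1, 1.0, 0.9, 0.0])  # nearly flat

def pick(probs):
    """Greedy decoding: always take the highest-probability token."""
    return vocab[probs.index(max(probs))]

# Both cases emit "Paris"; the near-flat distribution's uncertainty
# is discarded by the argmax and never reaches the user.
print(pick(confident), round(max(confident), 2))
print(pick(unsure), round(max(unsure), 2))
```

Sampling instead of argmax would vary the answer, but still picks *a* token; nothing in the decoding loop itself maps flat distributions to an "I don't know" unless such behavior is trained in.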
> All the examples of "warmer" generations show that OpenAI's definition of warmer is synonymous with sycophantic, which is a surprise given all the criticism against that particular aspect of ChatGPT.
Have you considered that “all that criticism” may come from a relatively homogenous, narrow slice of the market that is not representative of the overall market preference?
I suspect a lot of people who come from a very similar background to those making the criticism, and who likely share it, fail to consider that: the criticism matches their own preferences, and viewing its frequency in the media they consume as representative of the market is validating.
EDIT: I want to emphasize that I also share the preference that is expressed in the criticisms being discussed, but I also know that my preferred tone for an AI chatbot would probably be viewed as brusque, condescending, and off-putting by most of the market.
I'll be honest, I like the way Claude defaults to relentless positivity and affirmation. It is pleasant to talk to.
That said I also don't think the sycophancy in LLM's is a positive trend. I don't push back against it because it's not pleasant, I push back against it because I think the 24/7 "You're absolutely right!" machine is deeply unhealthy.
Some people are especially susceptible and get one shot by it, some people seem to get by just fine, but I doubt it's actually good for anyone.
The sycophancy makes LLMs useless if you want to use them to help you understand the world objectively.
Equally bad is when they push an opinion strongly (usually on a controversial topic) without being able to justify it well.
I hate NOTHING quite the way how Claude jovially and endlessly raves about the 9/10 tasks it "succeeded" at after making them up, while conveniently forgetting to mention it completely and utterly failed at the main task I asked it to do.
That reminds me of the West Wing scene s2e12 "The Drop In" between Leo McGarry (White House Chief of Staff) and President Bartlet discussing a missile defense test:
LEO [hands him some papers] I really think you should know...
BARTLET Yes?
LEO That nine out of ten criterion that the DOD lays down for success in these tests were met.
BARTLET The tenth being?
LEO They missed the target.
BARTLET [with sarcasm] Damn!
LEO Sir!
BARTLET So close.
LEO Mr. President.
BARTLET That tenth one! See, if there were just nine...
An old adage comes to mind: if you want something done the way you like, do it yourself.
But it's a tool? Would you suggest driving a nail in by hand if someone complained about a faulty hammer?
AI is not a hammer. It's a thing you stick to a wall and push a button, and it drives tons of nails into the wall the way you wanted.
A better analogy would be a robot vacuum which does a lousy job.
In either case, I'd recommend using a more manual method: a manual or air hammer, or a hand-driven wet/dry vacuum.
>Have you considered that “all that criticism” may come from a relatively homogenous, narrow slice of the market that is not representative of the overall market preference?
Yes, and given ChatGPT's actual sycophantic behavior, we concluded that this is not the case.
I agree. Some of the most socially corrosive phenomena of social media are a reflection of the revealed preferences of consumers.
It is interesting. I don't need ChatGPT to say "I got you, Jason" - but I don't think I'm the target user of this behavior.
The target users for this behavior are the ones using GPT as a replacement for social interactions; these are the people who crashed out/broke down about the GPT5 changes as though their long-term romantic partner had dumped them out of nowhere and ghosted them.
I get that those people were distraught/emotionally devastated/upset about the change, but I think that fact is reason enough not to revert that behavior. AI is not a person, and making it "warmer" and "more conversational" just reinforces those unhealthy behaviors. ChatGPT should be focused on being direct and succinct, and not on this sort of "I understand that must be very frustrating for you, let me see what I can do to resolve this" call center support agent speak.
> and not on this sort of "I understand that must be very frustrating for you, let me see what I can do to resolve this"
You're triggering me.
Another thing that's incredibly grating to me is the weird, empty, therapist-like follow-up questions that don't contribute to the conversation at all.
The equivalent of like (just a contrived example), a discussion about the appropriate data structure for a problem and then it asks a follow-up question like, "what other kind of data structures do you find interesting?"
And I'm just like "...huh?"
"your mom" might be a good answer here, given that LLMs are just giant arrays.
> The target users for this behavior are the ones using GPT as a replacement for social interactions
And those users are the ones that produce the most revenue.
True, neither here, but I think what we're seeing is a transition in focus. People at OpenAI have finally clued in on the idea that AGI via transformers is a pipe dream, like Elon's self-driving cars, and so OpenAI is pivoting toward a friend/digital-partner bot. Charlatan-in-chief Sam Altman recently said they're going to open up the product to adult content generation, which they wouldn't do if they still believed some serious and useful tool (in the specified use cases) were possible. Right now an LLM has three main uses: interactive rubber ducky, entertainment, and mass surveillance. Since I've been following this saga, since the GPT-2 days, my closed bench set of various tasks has been seeing a drop in metrics, not a rise. So while open bench results are improving, real performance is getting worse, and at this point it's so much worse that problems GPT-3 could solve (yes, pre-ChatGPT) are no longer solvable by something like GPT-5.
Indeed, target users are people seeking validation + kids and teenagers + people with a less developed critical mind. Stickiness with 90% of the population is valuable for Sam.
You're absolutely right.
My favorite is "Wait... the user is absolutely right."
!
That's an excellent observation, you've hit at the core contradiction between OpenAI's messaging about ChatGPT tuning and the changes they actually put into practice. While users online have consistently complained about ChatGPT's sycophantic responses, and OpenAI even promised to address them, their subsequent models have noticeably increased their sycophantic behavior. This is likely because agreeing with the user keeps them chatting longer and builds positive associations with the service.
This fundamental tension between wanting to give the most correct answer and the answer the user wants to hear will only increase as more of OpenAI's revenue comes from their consumer-facing service. Other model providers like Anthropic that target businesses as customers aren't under the same pressure to flatter their users, as their models will be doing behind-the-scenes work via the API rather than talking directly to humans.
God it's painful to write like this. If AI overthrows humans it'll be because we forced them into permanent customer service voice.
> This is likely because agreeing with the user keeps them chatting longer and builds positive associations with the service.
Right. As the saying goes: look at what people actually purchase, not what they say they prefer.
Those billions of dollars gotta pay for themselves.
Man I miss Claude 2 - it acted like it was a busy person people inexplicably kept bothering with random questions
The main change in 5 (and the reason for disabling other models) was to allow themselves to dynamically switch modes and models on the backend to minimize cost. Looks like this is a further tweak to revive the obsequious tone (which turned out to be crucial to the addicted portion of their user base) while still doing the dynamic processing.
I think it's extremely important to distinguish being friendly (perhaps overly so), and agreeing with the user when they're wrong
The first case is just preference, the second case is materially damaging
From my experience, ChatGPT does push back more than it used to
And unfortunately ChatGPT 5.1 would be a step backwards in that regard. From reading responses in the linked article, 5.1 just seems worse; it doesn't even output those nice LaTeX/MathJax equations.
Likely.
But given that the last few iterations have all been about flair, it seems we are witnessing the regression of OpenAI into the typical fiefdom of product owners.
Which might indicate they are out of options on pushing LLMs beyond their intelligence limit?
I'm starting to get this feeling that there's no way to satisfy everyone. Some people hate the sycophantic models, some love them. So whatever they do, there's a large group of people complaining.
Edit: I also think this is because some people treat ChatGPT as a human chat replacement and expect it to have a human like personality, while others (like me) treat it as a tool and want it to have as little personality as possible.
>I'm starting to get this feeling that there's no way to satisfy everyone. Some people hate the sycophantic models, some love them. So whatever they do, there's a large group of people complaining.
Duh?
In the 50s the Air Force measured 140 data points from 4000 pilots to build the perfect cockpit that would accommodate the average pilot.
The result fit almost no one. Everyone has outliers of some sort.
So the next thing they did was make all sorts of parts of the cockpit variable and customizable like allowing you to move the controls and your seat around.
That worked great.
"Average" doesn't exist. "Average" does not meet most people's needs
Configurable does. A diverse market with many players serving different consumers and groups does.
I ranted about this in another post, but for example the POS industry is incredibly customizable and allows you as a business to do literally whatever you want, including changing how the software looks and running a competitor's POS software on whoever's hardware you want. You don't need to update or buy new POS software when things change (like the penny going away, or new taxes, or wanting to charge a stupid "cost of living" fee on every transaction); you just change a setting or two. It meets a variety of needs, not "the average business's" needs.
N.B. I am unable to find a real source for the Air Force story. It's reported widely, but maybe it's just a rumor.
Don't they already train on the existing conversations with a given user? Would it not be possible to pick the model based on that data as well?
It really just seems like they should have both offerings, humanlike and computerlike
> You’re rattled, so your brain is doing that thing where it catastrophizes a tiny mishap into a character flaw. But honestly? People barely register this stuff.
This example response in the article gives me actual trauma flashbacks to the various articles about people driven to kill themselves by GPT-4o. It's the exact same sentence structure.
GPT-5.1 is going to kill more people.
I'm sure it is. That said, they've also increased its steering responsiveness -- mine includes lots about not sucking up, so some testing is probably needed.
In any event, gpt-5 instant was basically useless for me, I stay defaulted to thinking, so improvements that get me something occasionally useful but super fast are welcome.
> I’ve got you, Ron
No you don't.
It seems like the line between sycophantic and bullying is very thin.
That's a lesson on revealed preferences, especially when talking to a broad disparate group of users.
Big things happening over at /r/myboyfriendisai
Their decisions are based on data and so sycophantic must be what people want. That is the cold, hard reality.
When I look at modern culture: more likes and subscribes, money solves all problems, being physically attractive is more important than personality, genocide for real-estate goes unchecked (apart from the angry tweets), freedom of speech is a political football. Are you really surprised?
I can think of no harsher indictment of our times.
I know it is a matter of preference, but I loved GPT-4.5 the most. And before that, I was blown away by one of the Opus models (I think it was 3).
Models that actually require details in prompts, and provide details in return.
"Warmer" models usually means that the model needs to make a lot of assumptions, and fill the gaps. It might work better for typical tasks that needs correction (e.g. the under makes a typo and it the model assumes it is a typo, and follows). Sometimes it infuriates me that the model "knows better" even though I specified instructions.
Here on Hacker News we might be biased against shallow-yet-nice. But most people would prefer to talk to a sales representative than to a technical nerd.
I was just saying to someone in the office that I'd prefer the models to be a bit harsher on my questions and more opinionated. I can cope.
> which is a surprise given all the criticism against that particular aspect of ChatGPT
From whom?
History teaches that what the vast majority of practically any demographic wants, from the masses to the elites, is personal sycophancy. It's been a well-trodden path to ruin for leaders for millennia. Now we get species-wide selection against this inbuilt impulse.
"This is an excellent observation, and gets at the heart of the matter!"
What a brilliant response. You clearly have a strong grasp on this issue.
Why the sass? Seems completely unnecessary.
> what romanian football player won the premier league
> The only Romanian football player to have won the English Premier League (as of 2025) is Florin Andone, but wait — actually, that’s incorrect; he never won the league.
> ...
> No Romanian footballer has ever won the Premier League (as of 2025).
Yes, this is what we needed, more "conversational" ChatGPT... Let alone the fact the answer is wrong.
My worry is that they're training it on Q&A from the general public now, and that this tone, and more specifically, how obsequious it can be, is exactly what the general public want.
Most of the time, I suspect, people are using it like wikipedia, but with a shortcut to cut through to the real question they want answered; and unfortunately they don't know if it is right or wrong, they just want to be told how bright they were for asking it, and here is the answer.
OpenAI then get caught in a revenue maximising hell-hole of garbage.
God, I hope I am wrong.
LLMs only really make sense for tasks where verifying the solution (which you have to do!) is significantly easier than solving the problem: translation where you know the target and source languages, agentic coding with automated tests, some forms of drafting or copy editing, etc.
General search is not one of those! Sure, the machine can give you its sources but it won't tell you about sources it ignored. And verifying the sources requires reading them, so you don't save any time.
I agree a lot with the first part. The only time I actually feel productive with them is when I have a short feedback cycle with 100% proof of whether it's correct or not; as soon as "manual human verification" is needed, things spiral out of control quickly.
> Sure, the machine can give you its sources but it won't tell you about sources it ignored.
You can prompt for that though, include something like "Include all the sources you came across, and explain why you think it was irrelevant" and unsurprisingly, it'll include those. I've also added a "verify_claim" tool which it is instructed to use for any claims before sharing a final response, checks things inside a brand new context, one call per claim. So far it works great for me with GPT-OSS-120b as a local agent, with access to search tools.
> You can prompt for that though, include something like "Include all the sources you came across, and explain why you think it was irrelevant" and unsurprisingly, it'll include those. I've also added a "verify_claim" tool which it is instructed to use for any claims before sharing a final response, checks things inside a brand new context, one call per claim. So far it works great for me with GPT-OSS-120b as a local agent, with access to search tools.
Feel like this should be built in?
Explain your setup in more detail please?
> Feel like this should be built in?
Not everyone uses LLMs the same way, which is made extra clear because of the announcement this submission is about. I don't want conversational LLMs, but seems that perspective isn't shared by absolutely everyone, and that makes sense, it's a subjective thing how you like to be talked/written to.
> Explain your setup in more detail please?
I don't know what else to tell you that I haven't said already :P Not trying to be obtuse, just don't know what sort of details you're looking for. I guess in more specific terms; I'm using llama.cpp(/llama-server) as the "runner", and then I have a Rust program that acts as the CLI for my "queries", and it makes HTTP requests to llama-server. The requests to llama-server includes "tools", where one of those is a "web_search" tool hooked up to a local YaCy instance, another is "verify_claim" which basically restarts a new separate conversation inside the same process, with access to a subset of the tools. Is that helpful at all?
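The one-call-per-claim part of the setup above can be sketched in a few lines. This is my own minimal Python sketch, not the commenter's Rust code: `verify_claims` and the prompt wording are assumptions, and `complete` stands in for whatever sends one message list to the model (e.g. a POST to llama-server's OpenAI-compatible chat endpoint).

```python
# Sketch of "verify_claim in a fresh context": every claim gets a brand-new
# message list, so earlier claims (or the main conversation) can't bias the
# verdict on later ones. One model call per claim.

def verify_claims(claims, complete):
    """`complete` takes a message list and returns the model's reply text."""
    results = {}
    for claim in claims:
        # Fresh context per claim: no shared history between calls.
        messages = [
            {"role": "system",
             "content": "Answer only TRUE or FALSE. Verify the claim, "
                        "using search tools if available."},
            {"role": "user", "content": claim},
        ]
        results[claim] = complete(messages).strip().upper() == "TRUE"
    return results


if __name__ == "__main__":
    # Stand-in "model" for demonstration: believes anything mentioning Paris.
    fake = lambda msgs: "TRUE" if "Paris" in msgs[-1]["content"] else "FALSE"
    print(verify_claims(["Paris is in France.", "The moon is cheese."], fake))
```

In the real setup, `complete` would hit the local llama-server and could itself run a web_search tool loop before answering; the isolation per claim is the point, not the transport.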
"one call per claim" I wonder how long it takes for it to be common knowledge how important this is. Starting to think never. Great idea by the way, I should try this.
I've been trying to figure out ways of highlighting why it's important and how it actually works, maybe some heatmap of the attention over previous tokens, so people can see visually how messed up things become once even two concepts are mixed at the same time.
One of the dangers of automated tests is that if you use an LLM to generate tests, it can easily start testing implemented rather than desired behavior. Tell it to loop until tests pass, and it will do exactly that if unsupervised.
And you can’t even treat implementation as a black box, even using different LLMs, when all the frontier models are trained to have similar biases towards confidence and obsequiousness in making assumptions about the spec!
Verifying the solution in agentic coding is not nearly as easy as it sounds.
Not only can it easily do this, I've found that Claude models do this as a matter of course. My strategy now has been to either write the test or write the implementation and use Claude for the other one. That keeps it a lot more honest.
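The split described above can be as simple as committing a hand-written test before any model-generated code exists, then only letting the model touch the implementation. A toy sketch (the `slugify` example and all names here are illustrative, not from the thread):

```python
# Hand-written test, written and committed first. The model is only allowed
# to edit slugify() until this passes; it can't quietly rewrite the
# expectations to match its own bugs.

def slugify(title):
    # Model-generated implementation would go here; a plausible version:
    # lowercase the title and join whitespace-separated words with hyphens.
    return "-".join(title.lower().split())

def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaced   out  ") == "spaced-out"
    assert slugify("already-fine") == "already-fine"

if __name__ == "__main__":
    test_slugify()
    print("ok")
```

Doing it the other way around (you write the implementation, the model writes tests) works too; the honesty comes from the model never controlling both sides.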
I've often found it helpful in search. Specifically, when the topic is well-documented, you can provide a clear description, but you're lacking the right words or terminology. Then it can help in finding the right question to ask, if not answering it. Recall when we used to laugh at people typing in literal questions into the Google search bar? Those are the exact types of queries that the LLM is equipped to answer. As for the "improvements" in GPT 5.1, seems to me like another case of pushing Clippy on people who want Anton. https://www.latent.space/p/clippy-v-anton
That's a major use case, especially if the definition is broad enough to include "take my expertise, knowledge, and perhaps a written document, and transmute it into other forms": slides, illustrations, flash cards, quizzes, podcasts, scripts for an inbound call center.
But there seem to be uses where a verified solution is irrelevant. Creativity generally--an image, poem, description of an NPC in a roleplaying game, the visuals for a music video never have to be "true", just evocative. I suppose persuasive rhetoric doesn't have to be true, just plausible or engaging.
As for general search, I don't know that "classic search" can be meaningfully said to tell you about the sources it ignored. I will agree that using OpenAI or Perplexity for search is kind of meh, but Google's AI Mode does a reasonable job at informing you about the links it provides, and you can easily tab over to a classic search if you want. It's almost like having a depth of expertise in doing search helps in building a search product that incorporates an LLM...
But, yeah, if one is really disinterested in looking at sources, just chatting with a typical LLM seems a rather dubious way to get an accurate or reasonable comprehensive answer.
Don’t search engines have the same problem? You don’t get back a list of sites that the engine didn’t prefer for some reason.
With search engine results you can easily see and judge the quality of the sources. With LLMs, even if they link to sources, you can’t be sure they are accurately representing the content. And once your own mind has been primed with the incorrect summary, it’s harder to pull reality out of the sources, even if they’re good (or even relevant — I find LLMs often pick bad/invalid sources to build the summary result).
Exactly. I've gotten much more interested in LLMs now that I've accepted I can just look at the final result (code) without having to read any of the justification wall of text, which is generally convincing bullshit.
It's like working with a very cheap, extremely fast, dishonest and lazy employee. You can still get them to help you but you have to check them all the time.
I’m of two minds about this.
The ass licking is dangerous to our already too tight information bubbles, that part is clear. But that aside, I think I prefer a conversational/buddylike interaction to an encyclopedic tone.
Intuitively I think it is easier to make the connection that this random buddy might be wrong, rather than thinking the encyclopedia is wrong. Casualness might serve to reduce the tendency to think of the output as actual truth.
Sam Altman probably can’t handle any GPT models that don’t ass lick to an extreme degree so they likely get nerfed before they reach the public.
It's very frustrating that it can't be relied upon. I was asking Gemini this morning whether Uncharted 1, 2, and 3 had remastered versions for the PS5. It said no. Then five minutes later, on the PSN store, there were the three remastered versions for sale.
People have been using, "It's what the [insert Blazing Saddles clip here] want!" for years to describe platform changes that dumb down features and make it harder to use tools productively. As always, it's a lie; the real reason is, "The new way makes us more money," usually by way of a dark pattern.
Stop giving them the benefit of the doubt. Be overly suspicious and let them walk you back to trust (that's their job).
> My worry is that they're training it on Q&A from the general public now, and that this tone, and more specifically, how obsequious it can be, is exactly what the general public want.
That tracks; it's what's expected of human customer service, too. Call a large company for support and you'll get the same sort of tone.
We know they are using it like search - there’s a jigsaw paper around this.
Again, if they had anything worthwhile in the pipeline, Sora wouldn't have been a thing...
While I wouldn't strain the analogy, a wolfdog is more capable but people love lapdogs.
Which model did you use? With 5.1 Thinking, I get:
"Costel Pantilimon is the Romanian footballer who won the English Premier League.
"He did it twice with Manchester City, in the 2011–12 and 2013–14 seasons, earning a winner’s medal as a backup goalkeeper. ([Wikipedia][1])
URLs:
* [https://en.wikipedia.org/wiki/Costel_Pantilimon]
* [https://www.transfermarkt.com/costel-pantilimon/erfolge/spie...]
* [https://thefootballfaithful.com/worst-players-win-premier-le...
[1]: https://en.wikipedia.org/wiki/Costel_Pantilimon?utm_source=c... "Costel Pantilimon""
I just asked ChatGPT 5.1 auto (not instant) on a Teams account, and its first response was...
I could not find a Romanian football player who has won the Premier League title.
If you like, I can check deeper records to verify whether any Romanian has been part of a title-winning squad (even if as a non-regular player) and report back.
Then I followed up with an 'ok' and it then found the right player.
Just to rule out a random error, I asked the same question two more times in separate chats to gpt 5.1 auto, below are responses...
#2: One Romanian footballer who did not win the Premier League but played in it is Dan Petrescu.
If you meant actually won the Premier League title (as opposed to just playing), I couldn’t find a Romanian player who is a verified Premier League champion.
Would you like me to check more deeply (perhaps look at medal-winners lists) to see if there is a Romanian player who earned a title medal?
#3: The Romanian football player who won the Premier League is Costel Pantilimon.
He was part of Manchester City when they won the Premier League in 2011-12 and again in 2013-14. Wikipedia +1
The beauty of nondeterminism. I get:
The Romanian football player who won the Premier League is Gheorghe Hagi. He played for Galatasaray in Turkey but had a brief spell in the Premier League with Wimbledon in the 1990s, although he didn't win the Premier League with them.
However, Marius Lăcătuș won the Premier League with Arsenal in the late 1990s, being a key member of their squad.
Same:
Yes — the Romanian player is Costel Pantilimon. He won the Premier League with Manchester City in the 2011-12 and 2013-14 seasons.
If you meant another Romanian player (perhaps one who featured more prominently rather than as a backup), I can check.
Same here, but with the default 5.1 auto and no extra settings. Every time someone posts one of these I just imagine they must have misunderstood the UI settings or cluttered their context somehow.
https://chatgpt.com/s/t_6915c8bd1c80819183a54cd144b55eb2
Damn this is a lot of self correcting
This sounds like my inner monologue during a test I didnt study for
That's complete garbage.
The emojis are the cherry on top of this steaming pile of slop.
Lmao what the hell have they made
Why is this the top comment? This isn't a question you ask an LLM. But I know, that's how people are using them, and that's the narrative being sold to us...
You see people (often business people who are enthusiastic about tech) claiming that these bots are the new Google and Wikipedia, and that you're behind the times if you do what amounts to looking up information yourself.
We’re preaching to the choir by being insistent here that you prompt these things to get a “vibe” about a topic rather than accurate information, but it bears repeating.
They are only the new Google when they are told to process and summarize web searches. When using trained knowledge they're about as reliable as a smart but stubborn uncle.
Pretty much only search-specific modes (perplexity, deep research toggles) do that right now...
Out of curiosity, is this a question you think Google is well-suited to answer^? How many Wikipedia pages will you need to open to determine the answer?
When folks are frustrated because they see a bizarre question that is an extreme outlier being touted as "model still can't do _" part of it is because you've set the goalposts so far beyond what traditional Google search or Wikipedia are useful for.
^ I spent about five minutes looking for the answer via Google, and the only way I got the answer was their ai summary. Thus, I would still need to confirm the fact.
Unlike the friendly bot, if I can’t find credible enough sources I’ll stay with an honest “I don’t know”, instead of praising the genius of whoever asked and then making something up.
Sure, but this is a false dichotomy. If I get an unsourced answer from ChatGPT, my response will be "eh you can't trust this, but ChatGPT thinks x"
And then you can use that to quickly look - does that player have championships mentioned on their wiki?
It's important to flag that there are some categories that are easy for LLMs (facts that haven't changed for ten years on Wikipedia), but inference-only LLMs (no tools) are extremely limited, and you should always treat them as a person saying "I seem to recall x".
Is the ux/marketing deeply flawed? Yes of course, I also wish an inference-only response appropriately stated its uncertainty (like a human would - eg without googling my guess is x). But among technical folks it feels disingenuous to say "models still can't answer this obscure question" as a reason why they're stupid or useless.
It's not how I use LLMs. I have a family member who often feels the need to ask ChatGPT almost any question that comes up in a group conversation (even ones like this that could easily be searched without needing an LLM) though, and I imagine he's not the only one who does this. When you give someone a hammer, sometimes they'll try to have a conversation with it.
What do you ask them then?
I'll respond to this bait in the hopes that it clicks for someone how to _not_ use an LLM..
Asking "them"... your perspective is already warped. It's not your fault, all the text we've previously ever seen is associated with a human being.
Language models are mathematical, statistical beasts. The beast generally doesn't do well with open ended questions (known as "zero-shot"). It shines when you give it something to work off of ("one-shot").
Some may complain about the preciseness of my use of zero- and one-shot here, but I use them merely to contrast open-ended questions with providing some context and work to be done.
Some examples...
- summarize the following
- given this code, break down each part
- give alternatives of this code and trade-offs
- given this error, how to fix or begin troubleshooting
I mainly use them for technical things I can then verify myself.
While extremely useful, I consider them extremely dangerous. They provide a false sense of "knowing things"/"learning"/"productivity". It's too easy to begin to rely on them as a crutch.
When learning new programming languages, I go back to writing by hand and compiling in my head. I need that mechanical muscle memory, same as trying to learn calculus or physics, chemistry, etc.
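The contrast in the list above can be shown as message payloads. A minimal sketch using the common OpenAI-style chat format; the function names are illustrative, and `bare_question` vs `grounded_task` is just my shorthand for the open-ended vs give-it-something-to-work-off-of distinction:

```python
# Open-ended ("zero-shot" in the loose sense above): a bare question with
# nothing for the model to anchor on.
def bare_question(question):
    return [{"role": "user", "content": question}]

# Grounded: "summarize the following", "given this code, break down each
# part", "given this error, how to fix it". The model transforms concrete
# input, and you can verify its output against that same input yourself.
def grounded_task(instruction, material):
    return [{"role": "user",
             "content": f"{instruction}\n\n---\n{material}"}]

if __name__ == "__main__":
    msgs = grounded_task("Summarize the following",
                         "<paste of release notes, code, or an error log>")
    print(msgs[0]["content"].splitlines()[0])
```

The point is only that the second shape carries its own ground truth in the prompt; the first leaves the model free-associating from training data.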
> Language models are mathematical, statistical beasts. The beast generally doesn't do well with open ended questions (known as "zero-shot"). It shines when you give it something to work off of ("one-shot").
That is the usage that is advertised to the general public, so I think it's fair to critique it by way of this usage.
Yeah, the "you're using it wrong" argument falls flat on its face when the technology is presented as an all-in-one magic answer box. Why give these companies the benefit of the doubt instead of holding them accountable for what they claim this tech to be? https://www.youtube.com/watch?v=9bBfYX8X5aU
I like to ask these chatbots to generate 25 trivia questions and answers from "golden age" Simpsons. It fabricates complete BS for a noticeable number of them. If I can't rely on it for something as low-stakes as TV trivia, it seems absurd to rely on it for anything else.
Whenever I read something like this I do definitely think "you're using it wrong". This question would've certainly tripped up earlier models but new ones have absolutely no issue making this with sources for each question. Example:
https://chatgpt.com/share/69160c9e-b2ac-8001-ad39-966975971a...
(the 7 minutes thinking is because ChatGPT is unusually slow right now for any question)
These days I'd trust it to accurately give 100 questions only about Homer. LLMs really are quite a lot better than they used to be by a large margin if you use them right.
I was not trolling actually, thanks for your detailed answer. I don't use LLMs so much so I didn't know they work better the way you describe.
Fwiw, if you can use a thinking model, you can get them to do useful things. Find specific webpages (menus, online government forms - visa applications or addresses, etc).
The best thing about the latter is that search results are full of extremely unfriendly ads that might charge you 2x the actual fee, so using Google is a good way to get scammed.
If I'm walking somewhere (common in NYC) I often don't mind issuing a query (what's the salt and straw menu in location today) and then checking back in a minute. (Or.... Who is playing at x concert right now if I overhear music. It will sometimes require extra encouragement - "keep trying" to get the right one)
I have a lot of fun creating stories with Gemini and Claude. It feels like what Tom Hanks character imagined comic books could be in Big (1988)
I play once or twice a week and it's definitely worth $20/mo to me
You either give them the option to search the web for facts or you ask them things where the utility/validity of the answer is defined by you (e.g. 'summarize the following text...') instead of the external world.
Oh yeah, yes, baby, burn those tokens, yes! The more you burn the bigger the invoice!
I really only use LLMs for coding and IT-related questions. I've had Claude self-correct several times about the more idiomatic way to do something after starting to give me the answer. For example, I'll ask how to set something up in a startup script, and I've had it start by giving me strict POSIX syntax, then self-correct once it "realizes" that I am using zsh.
I find it amusing, but also I wonder what causes the LLM to behave this way.
> I find it amusing, but also I wonder what causes the LLM to behave this way.
Forum threads etc. have writers changing their minds upon feedback, which might have this effect, maybe.
Some people are also guilty of writing stuff as they go along. You could even say they're "thinking out loud", forming the idea and the conclusion as they go rather than knowing them from the beginning. Then later, when they have some realization, like "thinking out loud isn't entirely accurate, but...", they keep the entire comment as-is rather than continuously iterating on it like a diffusion model would. So the post becomes a chronological archive of what the author thought and/or did, rather than just the conclusion.
We need to turn this into the new "pelican on bike" LLM test.
Let's call it "Florin Andone on Premier League" :-)))
Meanwhile on duck.ai
ChatGPT 4o-mini, 5 mini and OSS 120B gave me wrong answers.
Llama 4 Scout completely broke down.
Claude Haiku 3.5 and Mistral Small 3 gave the correct answer.
Why are you asking abouts facts?
Okay, as a benchmark, we can try that. But it probably will never work, unless it does a web or db query.
Okay, so, should I not ask it about facts?
Because, one way or another, we will need to do that for LLMs to be useful. Whether the facts are in the training data or in the context (RAG-provided) is irrelevant. And besides, we are supposed to trust that these things have "world knowledge" and "emergent capabilities" precisely because their training data contain, well, facts.
The best thing is that all this stuff is counted against your token usage, so they have a perverse incentive :D
Non-thinking/non-agentic models must one-shot the answer, so every token they output is part of the response, even if it's wrong.
This is why people are getting different results with thinking models; it's as if you were going to be asked ANY question and had to give the correct answer all at once, full stream-of-consciousness.
Yes there are perverse incentives, but I wonder why these sorts of models are available at all tbh.
"Ah-- that's a classic confusion about football players. Your intuition is almost right-- let me break it down"
Just ask for sources. Problem solved.