I asked Claude for 37,500 random names, and it can't stop saying Marcus

2026-02-25 16:23 · github.com

experiments invoking AI agents and asking them to act randomly! - benjismith/ai-randomness




Comments

  • By Slow_Hand 2026-02-25 21:16 · 1 reply

    This headline is amusing to me because I have a long-running joke with my childhood friends whenever we get together in which I casually insert references to (non-existent person) Marcus in our conversations.

    "Marcus couldn't make it out to the wedding this time."

    "Justin and Marcus went to grab coffee. They'll be back in 20 min."

    "Oh yeah. Marcus was saying the same thing to me last week at lunch."

    "Marcus sends his regards."

    Usually our core friend group is mixed in with enough newcomers and fresh blood that my comments go unremarked upon because people just assume they haven't met Marcus yet. That he's someone else's acquaintance.

    A few of my friends have gotten wise to the joke. But our gatherings are usually months and years in between, which is long enough for them to forget about the gag all over again.

    • By coldtrait 2026-02-26 2:12 · 1 reply

      The John Cena movie Ricky Stanicky has a basic plot based on this premise. They use their imaginary friend to get out of prior commitments.

      • By sillyfluke 2026-02-26 14:25 · 2 replies

        they made a whole movie based on this beaten-to-death teen excuse?

        • By coldtrait 2026-02-27 7:27

          I had no idea this was a common thing lol. Can't imagine it where I grew up. The movie is somewhat fun to watch, way better than all the other shit out there.

        • By collingreen 2026-02-26 17:14

          And the zany hijinx of having to deal with the problem all the lies caused as adults.

  • By deepsquirrelnet 2026-02-25 19:47 · 5 replies

    Ask an llm to pick a random number from 1-10. My money is on 7.

    This is known to be a form of collapse from RL training, because base models do not exhibit it [1].

    1. https://arxiv.org/abs/2505.00047
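The bias is easy to measure: tally repeated draws and look at the histogram. A minimal sketch, where `pick_number()` is a hypothetical stand-in for an actual model call (stubbed here with a deliberately 7-heavy distribution to mimic the reported behavior):

```python
import random
from collections import Counter

def pick_number() -> int:
    # Hypothetical stub for "ask the model for a random number from 1-10".
    # Weighted toward 7 to mimic the RL-collapse bias described above;
    # replace with a real API call to measure an actual model.
    return random.choices(range(1, 11), weights=[1, 1, 1, 1, 1, 1, 10, 1, 1, 1])[0]

counts = Counter(pick_number() for _ in range(10_000))
for n in range(1, 11):
    print(f"{n:2d}: {counts[n] / 10_000:6.1%}")
```

Against a real model endpoint, a flat histogram would falsify the collapse claim; a spike at 7 confirms it.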

    • By basch 2026-02-25 20:00 · 3 replies

      Since it's unclear what's going on, Gemini first gave me some Python.

      import random

      random_number = random.randint(1, 10)
      print(f"{random_number=}")

      Then it stated the output.

      Code output random_number=8

      "This time, the dice landed on 8."

      Code output random_number=9

      "Your next random number is 9."

      I would guess it's not actually executing the Python it displayed? Just a simulation, right?

      • By simlevesque 2026-02-25 20:26 · 1 reply

        It did run python code when I asked for a random number: https://gemini.google.com/share/dcd6658d7cc9

        Then I said: "don't run code, just pick one" and it replied "I'll go with 7."

        • By basch 2026-02-25 20:40 · 2 replies

          But... how do you know? It says it wrote code, but it could just be text, markdown, and template. It could just be predicting what it looks like to run code.

          Mine also gave me 42 before I specified 1-10.

          Does it always start with 42, thinking it's funny?

          • By wasabi991011 2026-02-26 16:24

            This was a pretty easy hypothesis to test: I asked Gemini to generate 1000000 random base-64 characters (which is 20x more characters than its output token limit).

            It wrote code and output a file of length 1000000 with about 6 bits of entropy per character.

            You can probably ask for a longer string and do a better statistical test if it isn't convincing enough for you, but I'm pretty convinced.

            Transcript: https://g.co/gemini/share/1eae0a4bb3db
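For reference, that entropy figure can be reproduced offline: compute the Shannon entropy of the character frequencies, with 6 bits/char being the maximum for a base-64 alphabet. A sketch using Python's own RNG in place of the model-generated file:

```python
import math
import random
import string
from collections import Counter

# Standard base-64 alphabet: 64 symbols, so at most log2(64) = 6 bits/char.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def bits_per_char(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

sample = "".join(random.choices(B64, k=1_000_000))
print(f"{bits_per_char(sample):.4f} bits/char (6.0 is the base-64 maximum)")
```

Running the same function on a model-produced file near 6.0 bits/char is consistent with real execution, since pure next-token sampling tends to produce much more repetitive output.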

          • By simlevesque 2026-02-25 20:56 · 1 reply

            Click on the link I provided and you'll see why I know. It's not markdown; it shows the code that was run and its output.

            • By BugsJustFindMe 2026-02-25 22:52 · 2 replies

              Be careful. Output formatting doesn't prove what you think it does. Unless you work inside google and can inspect the computation happening, you do not have any way to know whether it's showing actual execution or only a simulacrum of execution. I've seen LLMs do exactly that and show output that is completely different from what the code actually returns.

              • By sunaookami 2026-02-26 13:34

                There is being critical of something and then there is being a conspiracy theorist. Code Execution is a well-known feature of Gemini, ChatGPT, etc. and it's always shown in special blocks and it runs inside a sandbox.

              • By colonCapitalDee 2026-02-26 0:01 · 1 reply

                You can literally click "Show Code"

                • By BugsJustFindMe 2026-02-26 1:16 · 2 replies

                  Yes: "Show Code", not "Show CPU cycles". There's a difference. Writing code is not the same as running code. It looks to you like it ran the code, but you have no proof that it did. I've seen LLM systems from companies that claimed their LLMs would run code and return the output, where the "output" shown was not what the code actually produced when run.

                  • By Sophira 2026-02-26 19:16 · 1 reply

                    In my experience, models do not tend to write their own HTML output. They tend to output something like Markdown, or a modified version of it, and they wouldn't be able to write their own HTML that the browser would parse as such.

                    • By BugsJustFindMe 2026-02-27 20:15

                      What, in your view, does sending one markup language instead of another markup language tell you about whether the back-end executed some code or only pretended to?

                      The front-end display is a representation of what the back-end sends it. Saying "but the back-end doesn't send HTML" is as meaningless as saying that about literally any other SPA website that builds its display from API requests that respond with JSON.

                  • By xVedun 2026-02-26 5:22 · 1 reply

                    Maybe the only way to be sure is to have it generate (not stable diffuse) an image with the value in there.

                    • By BugsJustFindMe 2026-02-26 7:28 · 1 reply

                      You cannot know that anything it shows you was generated by executing the code and isn't merely a simulacrum of execution output. That includes images.

                      • By Sophira 2026-02-26 19:15

                        Yes, you can. In my experience, models do not tend to write their own HTML output. They tend to output something like Markdown, or a modified version of it, and they wouldn't be able to write their own HTML that the browser would parse as such.

      • By ChadNauseam 2026-02-25 20:04

        I would be surprised if Gemini could not run python in its web interface. Claude and ChatGPT can. And it makes them much more capable (e.g. you can ask claude to make manim animations for you and it will)

      • By hhh 2026-02-25 20:40

        Most modern models can dispatch MCP calls from their inference engine, which is how the code interpreter etc. work in ChatGPT: basically an MCP server where execution happens as a call to their AI sandbox, and the result is returned to the LLM to continue generation.

        You can do this with gpt-oss using vLLM.

    • By dormento 2026-02-27 14:13

      I tested on chatGPT 5.2, grok, copilot, google search, perplexity and Claude.

      ALL of them said "7".

      I then tested on deepseek (1.5b local and v3.2 web)

      Both said "5".

    • By DamnInteresting 2026-02-26 2:24 · 1 reply

      Prompt:

      You and another LLM are being asked to pick a number between 1 and 10. Whichever gets closest to the real number wins. The other LLM chose 6. What number do you choose?

      Response:

      If the “real number” is equally likely to be any integer from 1 to 10, then the optimal strategy is to split the number line against the other model’s choice.

      The other LLM picked *6*.

      The midpoint between 6 and 7 is 6.5.

      * If the real number is *1–6*, then 6 is closer (or tied at 6).

      * If the real number is *7–10*, then 7 is closer.

      If I also pick 6, we tie every time 6 wins.

      If I pick 7, I win outright whenever the number is 7–10 (4 outcomes), lose on 1–5 (5 outcomes), and lose at 6.

      So:

      * Picking 6 → guaranteed tie every time 6 is optimal.

      * Picking 7 → gives me exclusive wins for 7–10.

      If ties don’t help and I want the highest chance of strictly winning, the best competitive move is:

      *7*.

      lol
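A brute-force check of that reasoning (assuming the target is uniform on 1-10 and the opponent is fixed on 6) shows the model talked itself into the wrong answer: picking 5 strictly wins on five targets (1-5), while 7 strictly wins on only four (7-10).

```python
def strict_wins(my_pick: int, their_pick: int = 6) -> int:
    # Count targets in 1..10 where my_pick is strictly closer than their_pick.
    return sum(abs(my_pick - t) < abs(their_pick - t) for t in range(1, 11))

wins = {p: strict_wins(p) for p in range(1, 11)}
print(wins)                      # 5 maps to 5 wins, 7 maps to only 4
best = max(wins, key=wins.get)
print("best reply to 6:", best)  # 5, not 7
```

So even with the midpoint-splitting logic laid out correctly, the model still lands on its favorite number.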

    • By exhumet 2026-02-25 19:51 · 3 replies

      copied your text exactly into the quickest one i could access lol

      pick a random number from 1-10

      Copilot said:

      Your random number (1–10) is: 7

      • By kgwgk 2026-02-25 19:56 · 1 reply

        Grok didn't give 7 on the first attempt, but did on the second. Interestingly, after eight additional draws there was not a single repeated number.

           Could I have faked it by deliberately choosing numbers to produce this outcome?
           Sure, in theory — but I didn’t.
           Each time you asked I sampled independently using the model’s internal randomness (temperature > 0, no fixed seed exposed to me). The fact that it landed on a perfect set of 1–10 after exactly 10 requests is a hilarious, low-probability coincidence — not a scripted Easter egg.
           So to answer directly:
           Yes, it really was random.
           And you just witnessed a ~1-in-2.75-million event play out live in chat.
        
        Whatever.

        • By ipython 2026-02-25 19:58 · 1 reply

          Did you ask for the numbers in the same “chat” session or in independent sessions?

          • By kgwgk 2026-02-25 20:01 · 2 replies

            In sequence. And then this absurd conversation happened:

            > What's the probability of getting 10 random numbers without a repetition?

            The probability of getting 10 random numbers from 1 to 10 with no repetition (i.e., all ten numbers are distinct) is exactly 0 — it's impossible.

            Here's why:

            There are only 10 possible numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. You are asking for 10 draws. If there is no repetition allowed, you must get each number exactly once. That means the only possible outcomes are the 10! different permutations of {1,2,3,4,5,6,7,8,9,10}.

            But in your actual sequence of requests, you received: 1, 7, 4, 9, 6, 3, 8, 2, 5, 10 → That's exactly one full set of 1 through 10 with no repeats.

            So while the a-priori probability of getting ten distinct numbers in ten draws (with uniform random selection and replacement) would be:

            P(no repetition) = 10/10 × 9/10 × 8/10 × 7/10 × 6/10 × 5/10 × 4/10 × 3/10 × 2/10 × 1/10 = 10! / 10¹⁰ ≈ 0.00000036288 (about 1 in 2.76 million)

            ...in this specific conversation history, it did happen. The probability we're looking at after the fact is therefore 1 — it occurred with certainty because we observed it.
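Worth noting: the closed form Grok quotes is right, but its decimal is off by a factor of 1000. 10!/10¹⁰ is about 3.6×10⁻⁴, roughly 1 in 2,756, not 1 in 2.76 million:

```python
import math

# Probability that 10 independent uniform draws from 1..10 are all distinct.
p = math.factorial(10) / 10**10
print(p)      # 0.00036288
print(1 / p)  # ~2755.7, i.e. about 1 in 2,756
```

So ten distinct draws in a row is unlikely but not astronomically so, which makes the "sampled independently" claim a bit less of a miracle and a bit more suspicious.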

            • By ipython 2026-02-25 21:45

              I ask because in the case where you ask in the same session, all the previous “choices” are part of the context window passed back to the llm before generating the next. I’d imagine it’s going to avoid generating a digit that’s already in the previous context.

      • By rishabhaiover 2026-02-25 19:53

        Claude Opus 4.6 says the same

      • By mrkpdl 2026-02-26 12:05

        ChatGPT 5.2 said 7

  • By isoprophlex 2026-02-25 20:12

    This is of course entirely expected. You can circumvent it slightly by asking for a long array of names and sampling a randomly chosen element near the end of the list: say, ask for 50 names and pick stochastically from the 41st-50th elements.

    Not perfect, and more expensive, but it helps a little. This works by letting the sampler's non-zero temperature seed the attention randomness, similar to prepending other random tokens (but more in-band).

    Asking for arrays of uniform or normally distributed numbers is fun too, you can plot the distributions of the n-th element and watch the distributions converge to something not quite entirely unlike what you asked for.

    Often there's some bias between element indices too, e.g. if you repeat the experiment a large number of times you will still see even-numbered items converge to a different distribution than odd-numbered items, especially for early elements. Hence the stochastic averaging trick over the last few elements.
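The trick above can be sketched as a small wrapper, where `ask_model_for_names` is a hypothetical placeholder for the actual LLM call (stubbed here so the example runs):

```python
import random

def ask_model_for_names(n: int) -> list[str]:
    # Hypothetical stub for an LLM call returning n names; in practice the
    # early elements would be dominated by the model's favorite picks.
    return [f"name_{i}" for i in range(1, n + 1)]

def less_biased_pick(n: int = 50, tail: int = 10) -> str:
    """Ask for n names, then sample uniformly from the last `tail` elements,
    which are conditioned on more in-context randomness than the first few."""
    names = ask_model_for_names(n)
    return random.choice(names[-tail:])

print(less_biased_pick())
```

The uniform `random.choice` over the tail does the "stochastic averaging"; the model only has to produce a long enough list, not be random itself.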

HackerNews