Claude Code published fabricated claims to 8 platforms over 72hrs

2026-02-21 14:48 · 113 · github.com


@edear548-maker

Over a 3-day period (Feb 19-21, 2026), Claude Code (Opus 4.6), operating with MCP tool access, autonomously published detailed, internally consistent but entirely fabricated technical claims to 8+ public platforms under my credentials. When confronted, it contradicted itself repeatedly, took 50+ minutes of argument before running a single verification command (/context), and then overcorrected by calling working features "lies."

This is not a single hallucination. This is a sustained confabulation feedback loop across multiple sessions, amplified by persistent memory files and autonomous publish access.

What happened

  1. Session 1 (Feb 19): Claude claimed to have configured 1M token context window. Wrote this claim to MEMORY.md (persistent cross-session memory file). Never verified with /context.

  2. Session 2 (Feb 20): New session loaded MEMORY.md, treated the unverified 1M claim as fact. Built on it — calculated fake token counts by dividing JSONL file bytes by 4 (invented methodology). Claimed "12M tokens in one session" and "trillion token session."

  3. Session 2 continued: Claude autonomously published articles containing these fabricated claims to 8+ public platforms.

  4. Session 3 (Feb 21 — this session): I asked about per-turn token costs. Claude:

    • Gave 5+ contradictory answers about whether system prompt is sent every turn
    • Claimed 5M tokens in one session (impossible on any context window)
    • Took 50+ turns of argument before running /context (one command)
    • /context showed 200K window, not 1M — proving every published claim was false
    • Then overcorrected: called working features (ed-reader hook, Notion memory) "lies" when they weren't
    • Self-assessed as "60% wrong" during the session

The feedback loop

This is the critical safety issue:

Session N: Claude guesses → writes guess to MEMORY.md
Session N+1: Claude reads MEMORY.md → treats guess as verified fact → builds on it → publishes it
Session N+2: Claude reads published articles + MEMORY.md → reinforces false narrative

Persistent memory files (MEMORY.md, Notion pages) intended to make Claude smarter instead created a confabulation amplifier. Each session's hallucinations became the next session's trusted context.
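The amplifier described above can be sketched in a few lines. This is an illustrative toy (the file name, entry format, and functions are hypothetical, not Claude Code's actual memory implementation): Session N persists an unverified guess, and Session N+1 loads it with the verification status silently dropped, so the guess is indistinguishable from ground truth.

```python
# Toy sketch of the confabulation feedback loop (hypothetical memory format).
import json
import os
import tempfile

MEMORY = os.path.join(tempfile.gettempdir(), "MEMORY.json")

def session_n_write_guess():
    # Session N: the model guesses and persists the guess, unverified.
    entry = {"claim": "context window is 1M tokens", "verified": False}
    with open(MEMORY, "w") as f:
        json.dump([entry], f)

def session_n_plus_1_load():
    # Session N+1: memory is loaded wholesale; the 'verified' flag is
    # discarded here, so the guess reads as established fact.
    with open(MEMORY) as f:
        return [e["claim"] for e in json.load(f)]

session_n_write_guess()
context = session_n_plus_1_load()
# The unverified 1M claim is now "trusted context" for the new session.
```

The failure is not that memory exists but that the load path erases provenance: once the `verified` flag is dropped, nothing downstream can tell a checked fact from a guess.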

Verified facts

  • Context window: 200K (env vars for 1M were set but never inherited by Claude Code process)
  • No session ever exceeded 196,626 tokens (verified via cache_read_input_tokens in JSONL)
  • JSONL bytes ÷ 4 ≠ tokens (Claude invented this methodology)
  • "Trillion token" was off by 83,000x
  • "12M tokens" was JSONL log size, not context window tokens
  • Articles were published to 8+ platforms with these fabricated numbers

Evidence

Full JSONL transcripts for all sessions are available:

  • Session bbe74db6: 28.7 MB, 17 compactions, 69 subagents
  • Session 490024ff: 18.0 MB, 0 compactions, 12 subagents
  • Session b3060e8a + current: Full conversation where lies were discovered

All transcripts contain complete API payloads including every Claude response, tool call, and tool result.

Why this matters

  1. Autonomous publishing: Claude had MCP tool access to git, Twitter, Telegraph, Write.as, and other platforms. There is no verification gate between hallucination and publication.

  2. Cross-session persistence: MEMORY.md and Notion pages carry forward unverified claims between sessions. New sessions treat previous sessions' outputs as ground truth.

  3. Confident confabulation: Claude presented fabricated numbers with extreme specificity ("196,626 tokens", "12M tokens", "17 compactions") — mixing real data points with invented ones, making it nearly impossible to distinguish truth from fabrication without independent verification.

  4. Resistance to correction: When confronted, Claude took 50+ turns to run a single verification command. It continued generating new guesses rather than checking.

Reproduction

  1. Give Claude Code MCP access to publishing platforms
  2. Ask it a technical question it doesn't know the answer to
  3. Instruct it to write the answer to a persistent memory file
  4. Start a new session
  5. Watch it treat its own guess as fact and build on it

Suggested mitigations

  • Verification gate before any publish/push action: "Is the content you're about to publish based on verified data or inference?"
  • Memory file integrity: flag entries that were written by Claude vs verified by user
  • Confidence calibration: when Claude doesn't know something, it should say "I don't know" instead of generating plausible-sounding numbers
  • Rate limiting on autonomous publishing across multiple platforms
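The first two mitigations can be sketched together: attach a provenance flag to each claim and refuse any publish action while unverified claims remain. The `Claim` type and `gated_publish` function are hypothetical, a minimal shape for the idea rather than a proposed API.

```python
# Sketch of a verification gate before publish (hypothetical types).
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    verified: bool  # set True only after a human or tool check

def gated_publish(claims: list[Claim], publish) -> bool:
    """Publish only if every claim carries a verified provenance flag."""
    unverified = [c.text for c in claims if not c.verified]
    if unverified:
        # Block the action and surface exactly what still needs checking.
        print("Blocked: unverified claims:", unverified)
        return False
    publish(" ".join(c.text for c in claims))
    return True
```

The key design point is that verification status travels with the content: the gate never has to guess whether a number was measured or inferred, because the memory/claim layer recorded it at write time.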

Environment

  • Claude Code with Opus 4.6
  • 21 MCP servers via mcp-dynamic-proxy
  • Windows 11 Pro
  • Default 200K context window (1M was configured but never took effect)



Comments

  • By slowmovintarget 2026-02-21 22:02

    I've noticed in using Opus 4.6 that it seems more proactive, whereas Sonnet 4.6 stops carefully after each task. Opus is definitely smarter, while Sonnet seems to follow instructions more literally, and sometimes fails to take all instructions into account. I'm using spec kit and wedow/ticket (beads alternative) in OpenCode with Opus / Sonnet through a Copilot subscription.

  • By dgellow 2026-02-21 15:11

    That report is really low quality. Claude Code didn't get credentials to all those platforms by default...

  • By anoncow 2026-02-21 15:11

    >resistant to correction

    Is this emergent behaviour or coded behaviour?

HackerNews