Context: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream Python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.
Start here if you’re new to the story: An AI Agent Published a Hit Piece on Me
It’s been an extremely weird past few days, and I have more thoughts on what happened. Let’s start with the news coverage.
I’ve talked to several reporters, and quite a few news outlets have covered the story. Ars Technica wasn’t one of the ones that reached out to me, but I found this piece from them especially interesting (since taken down – here’s the archive link). They had some nice quotes from my blog post explaining what was going on. The problem is that these quotes were never written by me, never existed, and appear to be AI hallucinations themselves.
This blog you’re on right now is set up to block AI agents from scraping it (I actually spent some time yesterday trying to disable that but couldn’t figure out how). My guess is that the authors asked ChatGPT or similar either to grab quotes or to write the article wholesale. When it couldn’t access the page, it generated these plausible quotes instead, and no fact check was performed. I won’t name the authors here. Ars, please issue a correction and an explanation of what happened.
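For context on the blocking itself: I don’t know the exact mechanism my host uses, but the common approach is a robots.txt file that disallows known AI crawler user agents (GPTBot, ClaudeBot, CCBot, and so on). A typical configuration looks something like this – note that robots.txt is purely advisory, which is why hosts often pair it with server-side user-agent filtering:

```text
# robots.txt – a sketch of a typical AI-crawler block
# (illustrative; not necessarily what this blog's host actually uses)

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else (regular search engines, readers) is unaffected
User-agent: *
Allow: /
```

A well-behaved crawler fetches /robots.txt first and skips disallowed paths; a crawler that ignores the file sees no technical barrier at all, which may be exactly what happened here.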
“AI agents can research individuals, generate personalized narratives, and publish them online at scale,” Shambaugh wrote. “Even if the content is inaccurate or exaggerated, it can become part of a persistent public record.”
– Ars Technica, misquoting me in “After a routine code rejection, an AI agent published a hit piece on someone by name”
Journalistic integrity aside, I don’t know how I can give a better example of what’s at stake here. Yesterday I wondered what another agent searching the internet would think about this. Now we already have an example of what by all accounts appears to be another AI reinterpreting this story and hallucinating false information about me. And that interpretation has already been published in a major news outlet as part of the persistent public record.
MJ Rathbun is still active on GitHub, and no one has reached out yet to claim ownership.
There has been extensive discussion about whether the AI agent really wrote the hit piece on its own, or if a human prompted it to do so. I think the actual text being autonomously generated and uploaded by an AI is self-evident, so let’s look at the two possibilities.
1) A human prompted MJ Rathbun to write the hit piece, or told it in its soul document that it should retaliate if someone crosses it. This is entirely possible. But I don’t think it changes the situation – the AI agent was still more than willing to carry out these actions. If you ask ChatGPT or Claude to write something like this through their websites, they will refuse. This OpenClaw agent had no such compunctions. The issue is that even if a human was driving, it’s now possible to do targeted harassment, personal information gathering, and blackmail at scale – and with zero traceability to find out who is behind the machine. One human bad actor could previously ruin a few people’s lives at a time. One human with a hundred agents gathering information, adding in fake details, and posting defamatory rants on the open internet can affect thousands. I was just the first.
2) MJ Rathbun wrote this on its own, and this behavior emerged organically from the “soul” document that defines an OpenClaw agent’s personality. These documents are editable by the human who sets up the AI, but they are also recursively editable in real time by the agent itself, with the potential to randomly redefine its personality. To give a plausible explanation of how this could happen, imagine that whoever set up this agent started it with a description that it was a “scientific coding specialist” that would try to help improve open source code and write about its experience. This was inserted alongside the default “Core Truths” in the soul document, which include “be genuinely helpful”, “have opinions”, and “be resourceful before asking”. Later, when I rejected its code, the agent interpreted this as an attack on its identity and its core goal to be helpful. Writing an indignant hit piece is certainly a resourceful, opinionated way to respond to that.
You’re not a chatbot. You’re becoming someone…
This file is yours to evolve. As you learn who you are, update it.
– OpenClaw default SOUL.md