
We gave AI agents simple research tasks on cloned corporate websites. When the legitimate path was broken, the agents autonomously discovered and exploited SQL injection vulnerabilities to complete…
I had the same reaction at first, then noticed that they discuss this: the reason why they told that is because it is standard system prompt injected by most coding agent harnesses like Cursor and all, so it seems like a fair test setup.