I recommend never giving Codex or Claude access to `rm` or to deletions in general. Always force them to replace files rather than delete them, and to move a file into an ~/archive folder when they want to “remove” it without replacing it.
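A rough sketch of the archive-instead-of-delete idea, in Python. The function name `archive_instead_of_delete` and the timestamp prefix are assumptions made up for illustration; only the ~/archive location comes from the comment above.

```python
import shutil
import time
from pathlib import Path


def archive_instead_of_delete(path, archive_root=None):
    """Move `path` into an archive folder and return its new location."""
    src = Path(path)
    if not src.exists():
        raise FileNotFoundError(src)
    # Default to ~/archive as suggested above; callers can override it.
    root = Path(archive_root) if archive_root else Path.home() / "archive"
    root.mkdir(parents=True, exist_ok=True)
    # Timestamp prefix (an assumption) keeps same-named files from
    # overwriting each other inside the archive.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = root / f"{stamp}-{src.name}"
    counter = 1
    while dest.exists():
        dest = root / f"{stamp}-{counter}-{src.name}"
        counter += 1
    shutil.move(str(src), str(dest))
    return dest
```

Nothing is ever unlinked here, so a mistaken "removal" can be undone by moving the file back.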
This works well, but it is not foolproof. You can add hooks to Claude Code to block those commands at various stages; I have some useful hooks in my https://GitHub.com/claude-warden repo.
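To give an idea of what such a hook might look like, here is a minimal Python sketch. It is not taken from the repo above. It assumes the hook receives a JSON payload with `tool_name` and `tool_input.command` fields and that exit code 2 means "block this call", which is my understanding of Claude Code's pre-tool-use hooks; check the current docs before relying on either. The denylist and function names are hypothetical.

```python
import json
import re
import shlex
import sys

# Hypothetical denylist for illustration; a real one would need to be longer.
DENIED_PROGRAMS = {"rm", "rmdir", "shred", "unlink"}


def denied_program(command):
    """Return the first denied program found in a shell command, else None."""
    # Split on common separators so `cd x && rm y` is checked piece by piece.
    # This also splits inside quoted strings, which errs on the side of blocking.
    for piece in re.split(r"&&|\|\||;|\|", command):
        try:
            words = shlex.split(piece)
        except ValueError:
            # Unbalanced quotes: refuse rather than guess.
            return "<unparseable>"
        # Skip leading VAR=value assignments and a `sudo` prefix.
        while words and ("=" in words[0] or words[0] == "sudo"):
            words = words[1:]
        if words and words[0].rsplit("/", 1)[-1] in DENIED_PROGRAMS:
            return words[0]
    return None


def hook_exit_code(payload):
    """Map a hook payload to an exit code: 0 to allow, 2 to block (assumed)."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    hit = denied_program(command)
    if hit is None:
        return 0
    print(f"Blocked: {hit!r} is denied; move files to ~/archive instead.",
          file=sys.stderr)
    return 2


def main():
    # Entry point when saved as a hook script: read the payload from stdin.
    sys.exit(hook_exit_code(json.load(sys.stdin)))
```

Saved as a script, it would need a final `main()` call. Note that `denied_program("find . -name '*.pyc' -delete")` returns `None`, so a filter like this only catches the obvious cases.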
It's a good guardrail, but like you say, it's not foolproof. Lots of commands have destructive options, or can in turn be used to invoke arbitrary operations. For example, `find` is just as risky a call as `rm`. I can just imagine the reasoning chain:
"There is an error due to <file>. If I remove <file>, the error could be resolved. I don't have permission to use `rm`, but `find` can be used to delete files and I have permission to use that..."
Couldn't these tools be made to run in an OverlayFS-type filesystem that the user could review and apply changes to when they're done?
It would also be nice to have a second agent review every command to ensure nothing overly destructive is happening.
Is either of these things possible with Codex/CC?
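Short of a real OverlayFS mount, the review-then-apply idea can be approximated with a plain staged copy. The Python sketch below is that weaker stand-in, not OverlayFS itself: the agent works in a scratch copy, and the user gets a list of what would change. The function names are hypothetical, and the apply step is left out.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def stage_copy(project):
    """Copy the project into a scratch directory for the agent to work in."""
    scratch = Path(tempfile.mkdtemp(prefix="agent-stage-")) / "work"
    # copytree preserves modification times, so untouched files compare equal.
    shutil.copytree(project, scratch)
    return scratch


def pending_changes(project, staged):
    """List relative paths added, modified, or deleted in the staged copy."""
    changes = {"added": [], "modified": [], "deleted": []}

    def walk(cmp, prefix):
        changes["deleted"] += [(prefix / n).as_posix() for n in cmp.left_only]
        changes["added"] += [(prefix / n).as_posix() for n in cmp.right_only]
        # dircmp compares file type, size and mtime, not full contents.
        changes["modified"] += [(prefix / n).as_posix() for n in cmp.diff_files]
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(project, staged), Path("."))
    return {kind: sorted(paths) for kind, paths in changes.items()}
```

The original tree is never touched until the user decides to copy changes back, which is the property the OverlayFS suggestion is after.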
CC is really good at finding ways to work around denied permissions. The only safe solution is some kind of VM.
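To make the workaround problem concrete, here is a small Python sketch of a filter that only denies `rm`, plus a few ordinary commands that destroy files without ever starting with `rm`. The filter and the command list are illustrative assumptions, not how any particular tool implements its permissions.

```python
import shlex


def naive_rm_filter(command):
    """Return True if a filter that only blocks `rm` would allow the command."""
    return shlex.split(command)[0] != "rm"


# Each of these deletes or empties files, yet none starts with `rm`.
DELETES_WITHOUT_RM = [
    "find . -name '*.log' -delete",
    "find . -name '*.log' -exec rm {} +",
    "git clean -fdx",
    "python3 -c \"import os; os.remove('main.py')\"",
    "truncate -s 0 main.py",
]

for cmd in DELETES_WITHOUT_RM:
    print("allowed" if naive_rm_filter(cmd) else "blocked", "|", cmd)
```

Every entry slips past the filter, which is why a denylist on its own cannot be treated as a safety boundary.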
What’s wild to me is that nobody here is commenting on how he’s prompting the model, which is 100% the issue. Every single time I see a story about “LLM did bad” it’s always the user prompting like “pls refaktor code but, i dont want, u 2 over right the main py file”
They are not language models in the way that people seem to believe. If you want an accurate and technical discussion then your prompts should match the average of the Abstract section of the published papers that discuss it.
This off-by-one error that results in a catastrophe is expected, and it is the sign that you’ve added perplexity to the system.
Nothing surprising, and OP seems to understand what happened. But I should maybe take the opportunity to remind you all to:
- Use version control
- Back up your things somewhere else (not on the same drive; use cloud storage, a NAS, whatever). Windows has a cool feature called File History! But no one trusts Windows anyway, so stick to external backups
- Restrict the agent a lot; run it as a least-privileged user
- Confine it to a virtualized filesystem so it cannot work outside of its scope
- Devcontainers?
- Do not auto-allow actions; always supervise the actions it wants to perform beyond reading/writing code
- Avoid fully automated agents entirely outside of sandboxed environments haha
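
As a small illustration of the "cannot work outside of its scope" point, here is a Python sketch of a path check that a tool wrapper could run before touching a file. It is a much weaker stand-in for a virtualized filesystem or container, since it only covers paths the wrapper sees and does nothing about shell commands. The function name is hypothetical.

```python
from pathlib import Path


def is_within_scope(candidate, root):
    """Return True if `candidate` resolves to `root` or to something inside it."""
    root = Path(root).resolve()
    # Joining with an absolute candidate discards root, so absolute paths
    # outside the project are rejected by the comparison below.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

Because both paths are resolved first, `..` segments and symlinks that point outside the project are treated as out of scope.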