Run NanoClaw in Docker Sandboxes

2026-03-13 13:45 · nanoclaw.dev

Every agent gets its own isolated container inside a micro VM. No dedicated hardware needed. No complex setup.

We announced today that we’ve partnered with Docker to enable running NanoClaw in Docker Sandboxes with one command. You can read Docker’s blog post here.

Get Started

# macOS (Apple Silicon)
curl -fsSL https://nanoclaw.dev/install-docker-sandboxes.sh | bash

# Windows (WSL)
curl -fsSL https://nanoclaw.dev/install-docker-sandboxes-windows.sh | bash

This handles the clone, setup, and Docker Sandbox configuration. You can also install manually from source.

Note: Docker Sandboxes are currently supported on macOS (Apple Silicon) and Windows (x86), with Linux support rolling out in the coming weeks.

Once it’s running, every agent gets its own isolated container inside a micro VM.

How It Works

Docker Sandboxes run agents inside lightweight micro VMs, each with its own kernel, its own Docker daemon, and no access to your host system. This goes beyond container isolation: hypervisor-level boundaries with millisecond startup times.

NanoClaw maps onto this architecture naturally:

Docker Sandbox (micro VM): an isolated Docker daemon behind a hypervisor-level isolation boundary, hosting for example:

- Agent: #sales (Slack channel): own filesystem, own context/memory; access: CRM, sales playbooks; tools: email, calendar
- Agent: #support (Slack channel): own filesystem, own context/memory; access: docs, ticket system; tools: knowledge base, Jira
- Agent: #personal (WhatsApp): own filesystem, own context/memory; access: personal calendar; tools: reminders, notes

Each NanoClaw agent runs in its own container with its own filesystem, context, tools, and session. Your sales agent can’t see your personal messages. Your support agent can’t access your CRM data. These are hard boundaries enforced by the OS, not instructions given to the agent.
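As an illustration, the per-container boundary can be sketched with ordinary `docker run` flags. This is a minimal sketch, not NanoClaw's actual launch code; the agent names, volume names, and image tag are hypothetical:

```python
# Sketch: one container per agent, each mounted only onto its own volume.
# Agent names, volumes, and the image tag are hypothetical.

AGENTS = {
    "sales":   {"volume": "nanoclaw_sales_data"},
    "support": {"volume": "nanoclaw_support_data"},
}

def docker_run_args(agent: str) -> list[str]:
    """Build `docker run` arguments that give the agent only its own state."""
    vol = AGENTS[agent]["volume"]
    return [
        "docker", "run", "--rm",
        "--name", f"nanoclaw-{agent}",
        "--read-only",               # immutable root filesystem
        "--cap-drop", "ALL",         # no extra Linux capabilities
        "-v", f"{vol}:/workspace",   # the agent's private state, nothing else
        "nanoclaw-agent:latest",
    ]
```

The point of the sketch is that the sales agent's argument list simply never mentions the support agent's volume: the boundary is in what the container is given, not in what the agent is told.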

The micro VM layer adds a second boundary. Even if an agent somehow broke out of its container, it hits the VM wall. Your host machine, your files, your credentials, your other applications are on the other side of a hard isolation boundary.

The Security Model: Design for Distrust

I wrote about this in Don’t Trust AI Agents: when you’re building with AI agents, they should be treated as untrusted and potentially malicious. Prompt injection, model misbehavior, things nobody’s thought of yet. The right approach is architecture that assumes agents will misbehave and contains the damage when they do.

That principle drives every design decision in NanoClaw. Don’t put secrets or credentials inside the agent’s environment. Give the agent access to exactly the data and tools it needs for its job, nothing more. Keep everything else on the other side of a hard boundary.
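One way to keep credentials out of the agent's environment is a broker process that holds the secrets and exposes only named operations; the agent requests an operation and never sees a token. A minimal sketch, with class, grant, and operation names made up for illustration:

```python
# Sketch: credentials live in a broker outside the agent's container.
# The agent asks for an operation by name; the broker uses the secret
# on its behalf. All names here are illustrative, not NanoClaw's API.

class CredentialBroker:
    def __init__(self, secrets: dict[str, str], grants: dict[str, set[str]]):
        self._secrets = secrets   # never serialized into any agent environment
        self._grants = grants     # agent -> operations it may request

    def call(self, agent: str, operation: str) -> str:
        if operation not in self._grants.get(agent, set()):
            raise PermissionError(f"{agent} may not perform {operation}")
        token = self._secrets[operation.split(".")[0]]
        # Only the result crosses the boundary, never the token itself.
        return f"performed {operation} with credential ending ...{token[-4:]}"
```

Even a fully compromised agent can then only ask for the operations it was granted; it has nothing to exfiltrate.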

With Docker Sandboxes, that boundary is now two layers deep. Each agent runs in its own container (can’t see other agents’ data), and all containers run inside a micro VM (can’t touch your host machine). If a hallucination or a misbehaving agent can cause a security issue, the security model is broken. Security has to be enforced outside the agentic surface, not depend on the agent behaving correctly.

OpenClaw runs on your host with access to everything. Even with their opt-in sandbox mode, all agents share the same environment. There’s no hard boundary between them. Your personal assistant can see your work agent’s data.

The right mental model: think of your agent as a colleague you want to collaborate with, but design your security as if it’s a malicious actor. Those two things aren’t contradictory. That’s just good security engineering.

What’s Next

Dario Amodei talks about “a country of geniuses in a data center.” For that to become real, new infrastructure, orchestration layers, and runtimes, purpose-built for agents operating at scale, need to be built.

Today, a team can connect NanoClaw to multiple Slack channels and have separate agents handling different workloads, each isolated, each with its own context and data. But we’re heading somewhere much bigger.

Every employee will have a personal AI assistant. Every team will manage a team of agents. High-performing teams will manage hundreds. To get there, we need:

Controlled context sharing. Isolation is the foundation, but agents that work together need to share information. The hard part is the middle ground: agent teams that share all context freely within the team, but share selectively across team boundaries. You need to be able to lock everything down, control what goes in and what goes out, and then deliberately open up what should be shared. That needs to be native to the runtime, not bolted on.
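A deny-by-default policy of that shape can be sketched in a few lines. The team names, labels, and allowlist below are hypothetical:

```python
# Sketch: context flows freely inside a team; crossing a team boundary
# requires an explicit allowlist entry. All names are illustrative.

TEAM = {"sales-1": "sales", "sales-2": "sales", "support-1": "support"}

# (from_team, to_team) -> data labels that may cross the boundary
CROSS_TEAM_ALLOW = {("sales", "support"): {"customer-name"}}

def can_share(src: str, dst: str, label: str) -> bool:
    if TEAM[src] == TEAM[dst]:
        return True                                   # same team: share freely
    allowed = CROSS_TEAM_ALLOW.get((TEAM[src], TEAM[dst]), set())
    return label in allowed                           # cross-team: deny by default
```

The key property is that the default answer across a boundary is "no"; sharing has to be deliberately opened up, which is what makes it auditable.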

Agents creating persistent agents. Not ephemeral sub-agents that spin up for a task and disappear. An agent adding a new member to its team, the way you hire someone. The new agent gets its own identity, its own persistent environment, its own data. It shows up tomorrow and remembers what it did yesterday. It accumulates context and expertise over time. This requires new primitives for identity, lifecycle management, and permission inheritance that don’t exist yet.
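A rough sketch of that “hiring” primitive, with permission inheritance restricted to a subset of the parent's grants. Field names and paths are illustrative, not an existing NanoClaw API:

```python
# Sketch: an agent "hires" a persistent teammate. The child gets its own
# identity and state directory, and may only hold a subset of the parent's
# permissions. Names and paths are hypothetical.
import uuid
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    permissions: frozenset[str]
    agent_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def state_dir(self) -> str:
        # Persistent home: the agent shows up tomorrow with yesterday's state.
        return f"/var/nanoclaw/agents/{self.agent_id}"

    def hire(self, name: str, permissions: set[str]) -> "Agent":
        if not permissions <= self.permissions:
            raise PermissionError("child cannot exceed parent's permissions")
        return Agent(name=name, permissions=frozenset(permissions))
```

Identity (`agent_id`), lifecycle (a persistent `state_dir`), and permission inheritance (the subset check) are exactly the three primitives the paragraph says don't exist yet.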

Fine-grained permissions and policies. Not just what tools an agent can access, but what it can do with them. Read email but not send. Access one repo but not another. Spend up to a threshold but no more.
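Such action-level checks plus a spend cap can be sketched briefly; the action names and the limit are examples, not a real NanoClaw policy:

```python
# Sketch: permissions at the level of individual actions, not whole tools,
# plus a running spend limit. Action names and the cap are illustrative.

class Policy:
    def __init__(self, allowed_actions: set[str], spend_limit: float):
        self.allowed = allowed_actions
        self.spend_limit = spend_limit
        self.spent = 0.0

    def check(self, action: str, cost: float = 0.0) -> bool:
        if action not in self.allowed:
            return False                      # e.g. gmail.read allowed, gmail.send not
        if self.spent + cost > self.spend_limit:
            return False                      # over the spending threshold
        self.spent += cost
        return True
```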

Human-in-the-loop approvals. For irreversible actions, humans need to be in the approval chain. Agents propose, humans approve, agents execute.
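The propose/approve/execute loop might be sketched like this; which actions count as irreversible is illustrative:

```python
# Sketch: irreversible actions park in a queue until a human approves them;
# everything else executes immediately. The action set is illustrative.

IRREVERSIBLE = {"email.send", "payment.transfer", "file.delete"}

class ApprovalQueue:
    def __init__(self):
        self.pending: list[str] = []
        self.executed: list[str] = []

    def propose(self, action: str) -> str:
        if action in IRREVERSIBLE:
            self.pending.append(action)    # agent proposes, then waits
            return "pending-approval"
        self.executed.append(action)       # reversible: run right away
        return "executed"

    def approve(self, action: str) -> str:
        self.pending.remove(action)        # human sign-off happens here
        self.executed.append(action)
        return "executed"
```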

NanoClaw is the secure, customizable runtime and orchestration layer for agent teams. Docker Sandboxes is the enterprise-grade infrastructure underneath. As agents move from single-player tools to full team members operating at enterprise scale, the stack that runs them needs to enforce isolation by default, enable controlled collaboration, and give organizations the visibility and governance they need. That’s what we’re building.

NanoClaw is an open-source, secure runtime and orchestration layer for agent teams. Star it on GitHub.



Comments

  • By theptip 2026-03-13 14:58 | 3 replies

    They may seem like small details, but I think a couple novel design decisions are going to prove to be widely adopted and revolutionary.

    The biggest one (as Karpathy notes) is having skills for how to write a (slack, discord, etc) integration, instead of shipping an implementation for each.

    Call it “Claude native development” if you will, but “fork and customize” instead of batteries-included platforms/frameworks is going to be a big shift when it percolates through the ecosystem.

    A bunch of things you need to figure out, eg how do you ship a spec for how to test and validate the thing, make it secure, etc.

    How long before OSs start evolving in this way? You can imagine Auto research-like sharing and promotion upstream of good fixes/approaches, but a more heterogeneous ecosystem could be more resistant to attacks if each instance had a strong immune system.

    • By vova_hn2 2026-03-13 16:50 | 2 replies

      > having skills for how to write a (slack, discord, etc) integration, instead of shipping an implementation for each

      I'm not sure what the advantage is. Each user will have to waste time and tokens on the same task, instead of doing it once and shipping to everyone.

      • By verdverm 2026-03-13 17:00

        Agreed: excellence in one domain does not confer it to others. If you've ever worked with researchers, you know for the most part they are not engineers. This is bad advice / prediction by people with hammers imo.

        OCI is a good choice of reuse, they aren't having the agent reimplement that. When there is an existing SDK, no sense in rebuilding that either. Code you don't use should be compiled away anyhow.

      • By altruios 2026-03-13 17:29 | 1 reply

        Except it's not 'once' though.

        In order for it to be 'once': all hardware must have been, currently be, and always will be: interchangeable. As well as all OS's. That's simply not feasible.

        • By vova_hn2 2026-03-13 17:58

          I don't see how it's relevant in this case. We are talking about writing an integration with an HTTP API (probably) in a high-level language (TS/JS, Python, etc.). We have already abstracted hardware away.

    • By primer42 2026-03-13 15:26 | 2 replies

      I get the appeal but I disagree

      The strength of open source software is collaboration. That many people have tried it, read it, submitted fixes and had those fixes reviewed and accepted.

      We've all seen LLMs spit out garbage bugs on the first few tries. I've written garbage bugs on my first try too. We all benefit from the review process.

      I would rather have a battle tested base to start customizing from than having to stumble through the pitfalls of a buggy or insecure AI implementation.

      • By eli 2026-03-13 15:45

        Troubleshooting "works on my machine" issues must be fun when no two people have exactly the same implementation.

        Also seems like this will further entrench the top 2 or 3 models. Use something else and your software stack looks different.

      • By theptip 2026-03-13 17:21

        > We've all seen LLMs spit out garbage bugs on the first few tries.

        I’m assuming here an extrapolation of capabilities where Claude is competitive to the median OSS contributor for the off-the-shelf libraries you’d be comparing with.

        As with most of the Clawd ecosystem, for now it probably is best considered an art project / prototype (or a security dumpster fire for the non-technical users adopting it).

        > The strength of open source software is collaboration. That many people have tried it, read it, submitted fixes and had those fixes reviewed and accepted

        I do think that there is room for much more granular micro-libraries that can be composed, rather than having to pull in a monolithic dependency for your need. Agents can probably vet a 1k microlibrary BoM in a way a human could never have the patience to.

        (This is more the NPM way, leftpad etc, which is again a security issue in the current paradigm, but potentially very different ROI in the agent ecosystem.)

    • By tmaly 2026-03-13 15:43 | 1 reply

      I have thought about this ship-a-spec concept. What if we were just trading markdown files instead of code files to implement some feature in our system?

      • By TYPE_FASTER 2026-03-13 19:53

        I wish I could find the GitHub repo, but yes, I have seen at least one library written in Markdown to be used with Claude. Not a Claude skill, but functionality to be delivered.

  • By jryio 2026-03-13 14:15 | 6 replies

    You must explicitly state what your threat model is when writing about security tooling, isolation, and sandboxing.

    This threat model is concerned with running arbitrary code, generated or fetched by an AI agent, on host machines which contain secrets, sensitive files, and access to data, apps, and systems which should not be exfiltrated or lost.

    What about the threat model where an agent deletes your entire inbox? Or sends your calendar events to a server after prompt injection? Bank transfers of the wrong amount to the wrong address etc. all these are allowed under the sandboxing model.

    We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

    Sandboxes do not solve permission escalation or exfiltration threats.

    • By ryanrasti 2026-03-13 16:44

      > We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

      Yes 100%, this is the critical layer that no one is talking about.

      And I'd go even further: we need the ability to dynamically attenuate tool scope (ocap) and trace data as it flows between tools (IFC). Be able to express something like: can't send email data to people not on the original thread.

    • By 0cf8612b2e1e 2026-03-13 14:48 | 1 reply

      You mean like the section which goes into the threat model?

        The Security Model: Design for Distrust
      
        I wrote about this in Don’t Trust AI Agents: when you’re building with AI agents, they should be treated as untrusted and potentially malicious. Prompt injection, model misbehavior, things nobody’s thought of yet. The right approach is architecture that assumes agents will misbehave and contains the damage when they do…

      • By croes 2026-03-13 15:49 | 1 reply

        Don't you see the contradiction?

        I don't trust the agent, so I sandbox it before giving it the access credentials to my mail and bank accounts.

    • By CuriouslyC 2026-03-13 14:30

      I built an agent framework designed from the ground up around policy control (https://github.com/sibyllinesoft/smith-core) and I'm in the process of extracting the gateway from it so people can provide that same policy gated security to whatever agent they want (https://github.com/sibyllinesoft/smith-gateway).

      My posts about these aspects of agent security get zero engagement (not even a salty "vibe slop" comment, lol), so ironically security is the thing everyone's talking about, but most people don't know enough to understand what they need.

    • By chaosprint 2026-03-13 17:33

      That's a great question, and it reminds me of something I read today:

      https://entropytown.com/articles/2026-03-12-openclaw-sandbox...

      The core issue, to me, is that permissions are inherently binary — can it send an email or not — while LLMs are inherently probabilistic. Those two things are fundamentally in tension.

    • By verdverm 2026-03-13 17:02

      > We need fine grained permissions per-task or per-tool in addition to sandboxing. For example: "this request should only ever read my gmail and never write, delete, or move emails".

      We already have: IAM, WIF, Macaroons, Service Accounts

      Ask your resident SecOps and DevOps teams what your company already has available.

    • By webpolis 2026-03-13 17:55

      [dead]

  • By causal 2026-03-13 14:13 | 2 replies

    I like NanoClaw a lot. I found OpenClaw to be a bloated mess, NanoClaw implementation is so much tighter.

    It's also the first project I've used where Claude Code is the setup and configuration interface. It works really well, and it's fun to add new features on a whim.

    • By LaurensBER 2026-03-13 16:24

      Amen, my OpenClaw instance broke last week.

      Some update broke the OpenRouter integration and I haven't been able to fix the issue. I took a quick look at the code, hoping to narrow it down, and it's pretty much exactly what you would expect: hidden configuration files everywhere and, in general, a lot of code for what's effectively a for loop with WhatsApp integration (in my case :)).

      Not to mention that their security model doesn't match my deployment (rootless and locked down Kubernetes container) so every Openclaw update seemed to introduce some "fix" for a security issue that broke something else to solve a problem I do not have in the first place :)

      I've switched to https://github.com/nullclaw/nullclaw instead. Mostly because Zig seems very interesting so if I have to debug any issues with Nullclaw at least I'll be learning something new :)

    • By systemerror 2026-03-13 14:48 | 1 reply

      what workflows do you implement in Nanoclaw that wouldn't be straightforward to build in Claude?

      • By causal 2026-03-13 15:09 | 1 reply

        Straightforward is ambiguous. To replicate NanoClaw would probably only take about a day of work and testing and refining in Claude Code, but that's a day I didn't have to spend to get NanoClaw.

        • By pigeons 2026-03-13 16:23

          yes, but then what do you use nanoclaw for that it's a better fit for than claude code?

HackerNews