Show HN: Omnara – Run Claude Code from anywhere

2025-08-12 16:33 · 310 points · 168 comments · github.com

Omnara (YC S25) - Talk to Your AI Agents from Anywhere! - omnara-ai/omnara


Comments

  • By henriquegodoy 2025-08-12 17:03 · 5 replies

    This is pretty cool and feels like we're heading in the right direction. The whole idea of being able to hop between devices while Claude Code is thinking through problems is neat, but honestly what excites me more is the broader pattern here: we're moving toward a world where coding isn't really about sitting down and grinding out syntax for hours; it's becoming more about organizing tasks and letting AI agents figure out the implementation details.

    I can already see how this evolves into something where you're basically managing a team of specialized agents rather than doing the actual coding: you set up some high-level goals, maybe break them down into chunks, and then different agents pick up different pieces and coordinate with each other. The human becomes more like a project manager, making decisions when the agents get stuck or need direction. Imho, tools like Omnara are just the first step toward that. Right now it's one agent that needs your input occasionally, but eventually it'll probably be orchestrating multiple agents working in parallel. Way better than sitting there watching progress bars for 10 minutes.

      • By kmansm27 2025-08-12 17:10 · 6 replies

      Exactly! My ideal vision for the future is that agents will be doing all grunt work/implementation, and we'll just be guiding them.

      Can't wait til I'm coding on the beach (by managing a team of agents that notify me when they need me), but it might take a few more model releases before we get there lol

        • By zmmmmm 2025-08-13 0:12 · 2 replies

        If you think you could do that on the beach, couldn't you do traditional software dev on the beach?

        I actually think there's a chance it will shift away from that, because it will shift the emphasis to fast feedback loops, which means you are spending more of your time interacting with stakeholders, gathering feedback, etc. Manual coding is more the sort of task you can do for hours on end without interruption ("at the beach").

          • By wiseowise 2025-08-13 9:13

          > which means you are spending more of your time interacting with stakeholders, gathering feedback etc.

          Jesus Christ, I really need to speed up development of my product. If this shifts to more meetings at my wageslave job, I’m going to kill myself.

          • By szundi 2025-08-13 4:08

          How nice: you’ve just hung up with a demanding stakeholder who knows you can deliver a lot “instantly”, you switch to your phone, and your “agents” are stuck on some weird stuff that they cannot debug.

          That must be a nice situ on the beach.

        • By jdironman 2025-08-13 1:10

        What happens is the status quo changes, like what happened with DevOps. If you find yourself with the time to lead agents on a beach retreat, you might find yourself pulled into more product design / management meetings instead. AI/Dev like DevOps. Wearing more hats as a result. Maybe I'm wrong though.

        • By roozbeh18 2025-08-12 19:11 · 1 reply

        Someone in leadership is also thinking about how they can lower head count by removing the agent master.

        • By js4ever 2025-08-12 20:21

        I did exactly that all this summer at the beach with Claude Code. The future is already here!

        • By IncreasePosts 2025-08-12 17:35 · 1 reply

        What will you have to offer at that point, when coding is so easy?

          • By kmansm27 2025-08-12 17:44

          I still think that human taste is important even if agents become really good at implementing everything and everyone's just an idea guy. Counter argument: if agents do become really good at implementation, then I'm not sure if even human taste would matter if agents could brute force every possibility and launch it into the market.

          Maybe I'll just call it a day and chill with the fam

        • By theappsecguy 2025-08-12 21:40 · 2 replies

        Seems like your vision is to let AI take over your livelihood. That’s an unusually chipper way to hand over the keys unless you have a lifetime of wealth stashed away.

          • By zaptheimpaler 2025-08-13 2:25

          There is enormous money and effort in making AI that can do that, so if it's possible it is eventually going to happen. The only question is whether you're part of the group making the replacement or the group being replaced.

          • By filoleg 2025-08-12 23:12 · 1 reply

          It depends on what their livelihood is.

          If their livelihood is solving difficult problems, and writing code is just the implementation detail they’ve got to deal with, then this isn’t gonna do much to threaten their livelihood. Like, I am not aware of any serious SWE (who actually designs complex systems and implements them) being genuinely worried about their livelihood after trying out AI agents. If anything, that makes them feel more excited about their work.

          But if someone’s just purely codemonkeying trivial stuff for their livelihood, then yeah, they should feel threatened. I have a feeling that this isn’t what the grandparent comment user does for a living tho.

            • By theappsecguy 2025-08-19 23:16 · 1 reply

            Unfortunately, in my experience, C-suites don’t quite see eye to eye with your logical breakdown here.

              • By filoleg 2025-08-24 20:39

              I neither know nor care what the C-suite at my company thinks, as long as they provide me the resources necessary to get my job done effectively.

              And, so far, it seems like they are fairly understanding, as they are happy about the output of my work. After all, they aren't paying me per-line-of-code delivered, they are paying me to solve problems. If they think that an LLM can replace me fully, they are more than welcome to try it and see how it works out for them.

              The entirety of my report chain is just former engineers (with some of them being pivotal to things like GMaps SDK for iOS and such), so I am not really worried about them testing this theory out in practice. And if they do and decide that an LLM can replace me, well, there are always other jobs out there I can take. From my personal experience at this company, I will be just fine.

    • By manumasson 2025-08-13 9:06 · 2 replies

      > it's becoming more about organizing tasks and letting ai agents figure out the implementation details ... different agents pick up different pieces and coordinate with each other

      This is exactly what I have been working on for the past year and a half. A system for managing agents where you get to work at a higher abstraction level, explaining (literally with your voice) the concepts & providing feedback. All the agent-agent-human communication is on a shared markdown tree.

      I haven't posted it anywhere yet, but your comment describes the vision too well; I guess it's time to start sharing it :D See https://voicetree.io for a demo video. I have been using it every day for engineering work, and it really does feel like how you describe: my job is now more about organizing tasks, explaining them well, and providing critique, just through talking to the computer. For example, when going through the git diffs of what the agents wrote, I speak out loud any problems I notice, resulting in voice -> text -> markdown tree updates, which send hook notifications to Claude Code so the agents automatically address the feedback.

        • By andhuman 2025-08-13 13:39 · 1 reply

        Cool demo! The first thing that sprang to mind after seeing it was an image of a busy office floor filled with people talking into their headsets, not selling or buying stocks, but actually programming. Whether it’s a blessed or cursed image, I’ll let you decide.

          • By manumasson 2025-08-13 15:20

          Haha, blursed, one might say. In seriousness though, the social awkwardness of talking to a computer around others will likely be the largest bottleneck to adoption for this sort of tech. We may need to initially frame it as a tool for work-from-home engineers.

          Luckily the other side of this project doesn't require any user behavioural changes. The idea is to convert chat histories into a tree format with the same core algorithm, and then send only the relevant sub-tree to the LLM, reducing input tokens and context bloat, thereby also improving accuracy. This would then also unlock almost infinite-length LLM chats. I have been running this LLM context retrieval algo against a few benchmarks (GSM-Infinite, NoLiMa, and LongBench-v2); the early results are very promising, ~60-90% reduced tokens and increased accuracy against SOTA, though only on a subset of the full benchmark datasets.
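
          The core trick can be sketched in a few lines of Python. To be clear, this is just an illustration of the idea, not the actual VoiceTree code; the keyword-overlap scoring below is a stand-in for whatever relevance model you actually use:

            import re
            from dataclasses import dataclass, field
            from typing import Optional

            @dataclass
            class Node:
                title: str
                body: str = ""
                parent: Optional["Node"] = None
                children: list = field(default_factory=list)

            def parse_tree(markdown: str) -> Node:
                # Build a tree from '#'-style headings; body text attaches
                # to the most recent heading.
                root, stack = Node("ROOT"), []
                for line in markdown.splitlines():
                    m = re.match(r"(#+)\s+(.*)", line)
                    if m:
                        depth = len(m.group(1))
                        while stack and stack[-1][0] >= depth:
                            stack.pop()
                        parent = stack[-1][1] if stack else root
                        node = Node(m.group(2), parent=parent)
                        parent.children.append(node)
                        stack.append((depth, node))
                    elif stack:
                        stack[-1][1].body += line + "\n"
                return root

            def relevant_subtree(root: Node, query: str) -> list:
                # Keep the best-scoring node, its ancestors (for context),
                # and its descendants (for detail); drop everything else.
                words = set(query.lower().split())
                def score(n):
                    return len(words & set((n.title + " " + n.body).lower().split()))
                nodes = []
                def walk(n):
                    nodes.append(n)
                    for c in n.children:
                        walk(c)
                walk(root)
                best = max(nodes, key=score)
                keep, a = [], best
                while a is not None and a.title != "ROOT":
                    keep.append(a)
                    a = a.parent
                keep.reverse()
                def descend(n):
                    for c in n.children:
                        keep.append(c)
                        descend(c)
                descend(best)
                return keep

          Everything outside the selected sub-tree never reaches the model, which is where the token savings come from.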

        • By lucasverra 2025-08-13 16:53

        completed the form

    • By kbouck 2025-08-13 4:53 · 1 reply

      > moving toward a world where coding isn't really about sitting down and grinding out syntax

      Love the idea of "coding" while walking/running outside. For me those outside activities help me clear my mind and think about tough problems or higher-level stuff. The thought of directing agents to help persist and refine fleeting thoughts/ideas/insights, flesh out design/code, etc., is intriguing.

      • By sampullman 2025-08-13 6:23 · 1 reply

        I do a bit of that now: I'll mostly use Claude Code at home and set Jules on some tasks from my phone while exercising. Reviewing code is tedious though, and I don't see it getting too much better.

        • By calgoo 2025-08-13 10:15

          On the code review part, that's also because we are using languages designed for humans. Once we design programming languages for LLMs, we can design them in such a way that code review by both humans and AI is easy.

          Same with project organization: if you organize the project for LLM efficiency instead of human efficiency, you simplify some of the parts that the LLM has issues with.

    • By jama211 2025-08-12 20:01

      Yeah exactly, this is awesome. While waiting for AI operations to complete, I’ve always wondered why I’m “tied” to my machine and can’t just shut my laptop while it works and look at what it’s done later. This is so cool.

    • By Dayshine 2025-08-12 17:08 · 1 reply

      But why should it take time at all? Newer developer tooling (especially some of the Rust tools, e.g. uv) is lightning fast.

      Wouldn't it be better if you asked for it and, rather than having to manage workers, it was just... done?

        • By jama211 2025-08-12 20:00

        Yes, it would be good if we lived in a world where AI magically knew exactly what we wanted even before we did, and implemented everything perfectly the first time, in a way we’d never have issues with or tweaks we’d like it to make. I agree.

  • By mccoyb 2025-08-12 16:56 · 9 replies

    One big question I have in the era of Claude Code (and advancements yet to come) is: why should a hacker submit to using tools behind a SaaS offering … when one can just roll their own tools? I may be mistaken, but I don’t think there is any sort of moat here.

    Truly — this is an excellent and accessible idea (bravo!), but if I can whittle away at a free and open source version, why should I ever consider paying for this?

    • By myflash13 2025-08-13 1:38 · 2 replies

      This is exactly what I thought when picking customer support software last month. After hiring my first support person and being unable to decide between Intercom/Front/HelpScout/Zendesk, I finally just vibe coded my own helpdesk in a few days with just the features I needed - perfectly integrated into my SaaS, and best of all, free.

        • By bravesoul2 2025-08-13 11:35 · 1 reply

        Doesn't the vibe coded solution just mean you need to spend time maintaining code that isn't your core business? Unless bespoke customer support is crucial to it?

          • By myflash13 2025-08-13 11:49 · 1 reply

          Yes, but the cost of building and maintaining code has gone down so fast that it might actually be worth it. Plus, we get bespoke features that we would never get otherwise. And you have to spend developer time maintaining a good integration with an external product anyway.

            • By forsakenharmony 2025-08-14 11:13 · 2 replies

            The cost of maintaining code has absolutely not gone down and every line of code is tech debt

            The big issue with AI coding is that it kills the fun part of software development (actually writing code) and just becomes reviewing and understanding code you didn't write.

              • By amrangaye 2025-08-25 10:57

              “Kills the fun part of coding” - absolutely not! Coding is a lot more fun now that I can move from idea to working prototype in an evening without having to figure out individual libraries, research and learn them etc. The last time I felt this excited / into tech was when I first discovered Ruby on Rails and started using it for projects.

              I’ve done several projects with “vibe coding” that would otherwise have taken months to complete, including: an African fairy tale generator for my daughter, a farm management system for the ministry of agriculture in my country, a Gambian political comic strip creator, a system that generates ten-minute summary podcasts of all my country’s news, etc. I’ve also had great success with clients - and got them to sign on much faster - by just putting together a quick demo that I show them instead of sending a proposal and pitch deck describing what I’ll build for them. It makes them so much more excited, and we can make changes almost in realtime.

              I’ve noticed a lot in the industry, and even on HN, that coders - especially long-time ones - tend to “look down” on vibe coding, the same way they did with scripting languages back in the day, and I imagine the same way with compilers. I think this will generally fade out as it becomes industry standard, but in the meantime I sometimes see comments on HN that are so discouraging and cynical it makes me wonder if the person actually tried it out or had just prejudged it. I also think the phrase “vibe coding” is a terrible name, cause it makes it sound like a lazy way of doing things. It’s so much more than that, and lets you think and plan at the idea level. Things like planning your system before you ask it to implement also help a lot.

              • By ramraj07 2025-08-14 12:33

              Keep your manual Porsche for weekend drives. Get an adaptive-cruise, lane-assist Camry for real work.

              Every line of code is tech debt, true. Every integration is orders of magnitude more tech debt. The only time an integration wasn't tech debt was when I set up New Relic logging.

        • By cpursley 2025-08-13 10:56 · 1 reply

        I’d love to hear more about how this works. Is whatever you built integrated with your email stack? Because I’m super SaaS-ed out.

          • By myflash13 2025-08-13 11:48 · 1 reply

          Yes, it's just a wrapper on top of Gmail with an Inbox Zero philosophy (each email is a support ticket). I only needed three features for my helpdesk:

          1. An AI email drafter which uses my product docs and email templates as context (eventually I plan to add "tools" where the AI can look up info in our database) - sketched below

          2. A simple email client, with a sidebar showing contextual customer info (their billing plan, etc.) and a few simple buttons for the operator to take actions on their account

          3. A few basic team collaboration features: notes, assigning tickets to operators, escalating tickets...

          It took about 2 days to build the initial version, and about 2 weeks to iron out a number of annoying AI slop bugs in the beginning. But after a month of use it's now pretty stable; my customer support hire is using it and she's happy.
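
          For the curious, feature 1 is conceptually tiny. Here's a simplified sketch of the shape of it (not my production code; `call_llm` is a stand-in for whichever completion API you wire up):

            from pathlib import Path
            from dataclasses import dataclass

            @dataclass
            class Ticket:
                customer_email: str
                subject: str
                body: str

            def load_context(docs_dir: str = "docs") -> str:
                # Product docs + reply templates ground the draft.
                return "\n\n".join(p.read_text() for p in Path(docs_dir).glob("*.md"))

            def draft_reply(ticket: Ticket, call_llm) -> str:
                # call_llm(prompt) -> str is any completion function you
                # provide; it is a hypothetical stand-in, not a real library.
                prompt = (
                    "You are a support agent. Using ONLY the documentation "
                    "below, draft a short, friendly reply to this customer.\n\n"
                    f"--- DOCS ---\n{load_context()}\n\n"
                    f"--- TICKET ---\nFrom: {ticket.customer_email}\n"
                    f"Subject: {ticket.subject}\n\n{ticket.body}\n\n"
                    "Draft reply:"
                )
                return call_llm(prompt)

          The drafts just land in the client for the operator to review; the sidebar and team features are basically CRUD on top.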

    • By zackify 2025-08-12 16:58 · 2 replies

      Yeah exactly.

      I’ve been using Tailscale SSH to a Raspberry Pi.

      With Termix on iOS.

      I can do all the same stuff on my own. Termix is awesome (I’m not affiliated)

        • By myflash13 2025-08-13 1:33 · 2 replies

        Also see solutions like VibeTunnel, which don’t require a central server.

          • By cygn 2025-08-16 20:14

          What's the advantage of VibeTunnel? And which central server is required? VibeTunnel still sounds like "ssh into your machine from your phone", or is there something I'm missing?

          • By itsalotoffun 2025-08-13 6:18

          +1 for vibetunnel

        • By smithclay 2025-08-12 19:03 · 2 replies

        similar: blink + tailscale + zellij + devcontainers

    • By ericb 2025-08-13 12:22 · 1 reply

      Not the op, but I think about that. Here's what I came to, for the moment:

      * LLMs are lousy at bugs

      * Apps are a bit like making a baby. Fun in the moment, but a lifetime support commitment

      * Supporting software isn't fun, even with an LLM. Burnout is common in open source.

      * At the end of the day, it is still a lot of work, even guiding an LLM

      * Anything hosted is a chore. Uptime, monitoring, patching, backing up, upgrading, security, legal, compliance, vulnerabilities

      I think we'll see GitHub littered with buggy, unsupported, vibe coded one-offs for every conceivable purpose. Now, though, you literally have no idea what you're looking at or if it is decent.

      Claude made four different message-passing implementations in my vibe coded app. I realized this once it was trying to modify the wrong one during a fix. In other words, Claude was falling over trying to support what it made, and only a dev could bail it out. I am perfectly capable of coding this myself, but you have two choices at the moment--invest the labor, or get crap. But then we come to "maybe I should just pay for this instead of burning my time and tokens."

        • By mccoyb 2025-08-13 13:33

        Regarding the duplication of code: yes, I’ve found this to be a tremendous problem.

        One technique which appears to combat this is “red team / blue team Claude”.

        Red team Claude is hypercritical and tries to find weaknesses in the code. Blue team Claude is your partner, who you collaborate with to set up PRs.

        While this has definitely been helpful for finding “issues” that blue team Claude will lie to you about, hallucinations are still a bit of an issue. I mostly put red team Claude into ultrathink + task mode to improve the veracity of its critiques.
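
        If you wanted to automate the back-and-forth, the loop is conceptually simple. A hypothetical sketch (`ask_claude` stands in for however you drive Claude; the prompts are illustrative, not my actual setup):

          def red_blue_review(diff: str, ask_claude, max_rounds: int = 3) -> str:
              # Alternate a hypercritical "red team" reviewer with a
              # collaborative "blue team" implementer until the reviewer
              # finds nothing actionable (or we give up).
              for _ in range(max_rounds):
                  critique = ask_claude(
                      "You are RED TEAM: a hypercritical reviewer. Find concrete "
                      "bugs, weaknesses, and duplicated logic in this diff. "
                      "If there is nothing actionable, reply exactly LGTM.\n\n" + diff
                  )
                  if critique.strip() == "LGTM":
                      break
                  diff = ask_claude(
                      "You are BLUE TEAM: a collaborative implementer. Revise "
                      "the diff to address the critique and return the full "
                      "updated diff.\n\n"
                      f"CRITIQUE:\n{critique}\n\nDIFF:\n{diff}"
                  )
              return diff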

    • By sailfast 2025-08-12 19:54 · 1 reply

      Because then you don’t have to whittle away, and you’re free to blame someone else if anything goes wrong.

      Maybe that is more for a general engineer than a hacker though - “hacker” to me implies some sort of joy in doing it yourself rather than optimizing.

        • By mccoyb 2025-08-13 0:14 · 1 reply

        I like to be able to tweak things to my liking, and this typically leads me to make my own versions of things.

        Probably a bad habit.

          • By sailfast 2025-08-20 2:41

          Hard disagree. It’s a great habit. Keep going! You’re doing it right.

    • By kmansm27 2025-08-12 17:00 · 1 reply

      Thanks! I think the main reason to pay right now would be for convenience. A user wouldn't have to worry about hosting their own frontend/backend and building their own mobile app. And eventually, we want to have different agent providers host their agents for use on our platform, but that's further out.

      • By svieira 2025-08-12 17:59

        Correct - but if this is such a game changer in development speed, and the market has already validated that this kind of platform is useful, then step 1 is to build enough of a clone of the platform to start iterating with it, and then ... TO THE MOON! It's entirely a having-the-best-vision moat, which is a moat, but one that's principally protected by trademark lawsuits.

    • By stpedgwdgfhgdd 2025-08-13 15:41

      And relying on (another) 3rd party provider that indirectly has access to your code….

      I do not know how it is implemented, but if I can press ‘continue’ from my phone, someone else could enter other commands… Like export database…

    • By mccoyb 2025-08-12 17:01 · 1 reply

      The answer here might be: "you're not our market" (which is totally fine! but slightly confusing, because presumably people _using agents like Claude Code_ are ... more advanced than the uninitiated)

      • By kmansm27 2025-08-12 17:13 · 2 replies

        Yeah, I would say that most Claude Code users are pretty technical, but I was surprised to see that there's a decent number of non-technical users using Claude Code to completely vibe code applications for their personal use. Those users seem to love tools like Codex (the OpenAI cloud UI one, not the CLI) and things like Omnara, where there's no setup.

          • By jmvldz 2025-08-20 22:32

          Ah interesting. I wonder if it's similar with Devin users.

          • By mccoyb 2025-08-12 17:18

          Makes sense! Thanks for discussing.

    • By jama211 2025-08-12 20:02

      I mean, you could say this about almost literally any software product ever, to be honest. Feel free, I guess? People like to pay for convenience and support so they don’t have to build everything themselves.

        • By mccoyb 2025-08-12 19:23 · 1 reply

        This doesn't contribute to the conversation ... without further elaboration on what your point is, I'm assuming that you're pointing out that my question is analogous to previous (good to ask!) questions about market and user model for an "eventually very big" application.

        Not very enlightening: just because Dropbox became big in one environment doesn't mean the same questions aren't important in new spaces.

          • By arendtio 2025-08-12 19:33 · 1 reply

          Well, this is a classic here at HN.

          So every time someone comes around with a sentence like 'but if I can whittle away at a free and open source version, why should I ever consider paying for this?', the answer will be that Dropbox thread ;-)

            • By herval 2025-08-12 23:59

            Following on this offtopic - I wonder if there was ever another case of the Dropbox thread effect on HN? I don’t recall any other cases…

  • By tqwhite 2025-08-12 21:45 · 2 replies

    When you let Claude run free on changes big enough for a tool like this to be meaningful, are you really getting good enough code?

    When I just set Claude loose for long periods, I get incomprehensible, brittle code.

    I don't do maintenance, so maybe that's the difference, but I have not had good results from big, unsupervised changes.

    • By adastra22 2025-08-13 5:02 · 1 reply

      Subagents have changed this for me. I routinely run half-hour to hour-long tasks with no human intervention, and actually get good results at the end (most of the time, not all of the time).

      The reason isn’t that AI models have gotten better, although they clearly have, but that using subagents (1) keeps context clear of false starts and errors that otherwise poison the AI’s view of the project, and (2) by throwing in directives to run subagents that keep the main agent aligned (e.g. code review agents), it gets nudged back on course a surprisingly high percentage of the time.

        • By MrGreenTea 2025-08-13 10:26 · 1 reply

        Would you elaborate a bit on how you use subagents? I tend to use them sporadically, for example to research something or to analyse the code base a bit. But I'm not yet letting them run for long.

          • By adastra22 2025-08-13 19:53 · 1 reply

          Sure. First of all, although I do spend a lot of time interacting with Claude Code in chat format, that is not what I am talking about here. I have set up Claude Code with very specific instructions for use of agents, which I'll get to in a second.

          To start, there are a lot of collections of subagent definitions out there. I rolled my own, then later found others that worked better. I'm currently using this curated collection: https://github.com/VoltAgent/awesome-claude-code-subagents

          CLAUDE.md has instructions to list `.agents/agents/**/*.md` to find the available agents, and knows to check the frontmatter YAML for a one-line description of what each does. These agents are really just (1) role definitions that prompt the LLM to bias its thinking in a particular way ("You are a senior Rust engineer with deep expertise in ..." -- this actually works really well), and (2) a bunch of rules and guidelines for that role, e.g. in the Rust case to use the thiserror and strum crates to avoid boilerplate in Error enums, rules for how to satisfy the linter, etc. Basic project guidelines as they relate to Rust dev.
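
          To make that concrete, a minimal agent definition along those lines might look something like this (a made-up sketch to show the shape, not one of the actual files from that collection):

          """ ---

          name: rust-engineer

          description: Senior Rust engineer for implementation and refactoring tasks

          ---

          You are a senior Rust engineer with deep expertise in systems programming.

          - Use the thiserror and strum crates to avoid boilerplate in Error enums.

          - No unsafe blocks; no #[allow(...)] directives to silence the linter.

          - Run the full lint suite and fix every warning before reporting completion. """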

          Secondly, my CLAUDE.md for the project has very specific instructions about how the top-level agent should operate, with callouts to specific procedure files to follow. These live in `.agent/action/**/*.md`. For example, I have a git-commit.md protocol definition file, and instructions in CLAUDE.md that "when the user prompts with 'commit' or 'git commit', load the git-commit action and follow the directions contained within precisely." Within git-commit.md, there is a clear workflow specification in text or pseudocode. The [bracketed text] is my in-line commentary for you and is not in the original file:

          """ You are tasked with committing the currently staged changes to the currently active branch of this git repository. You are not authorized to make any changes beyond what has already been staged for commit. You are to follow these procedures exactly.

          1. Check that the output of `git diff --staged` is not empty. If it is empty, report to the user that there are no currently staged changes and await further instructions from the user.

          2. Stash any unstaged changes, so that the worktree only contains the changes that are to be committed.

          3. Run `./check.sh` [a bash script that runs the full CI test suite locally] and verify that no warnings or errors are generated with just the currently staged changes applied.

          - If the check script doesn't pass, summarize the errors and ask the user if they wish to launch the rust-engineer agent to fix these issues. Then follow the directions given by the user.

          4. Run `git diff --staged | cat` and summarize the changes in a git commit message written in the style of the Linux kernel mailing list [I find this to be much better than Claude's default commit message summaries].

          5. Display the output of `git diff --staged --stat` and your suggested git commit message to the user and await feedback. For each response by the user, address any concerns brought up and then generate a new commit message, as needed or instructed, and explicitly ask again for further feedback or confirmation to continue.

          6. Only when the user has explicitly given permission to proceed with the commit, without any accompanying actionable feedback, should you proceed to making the commit. Execute `git commit` with the exact text for the commit message that the user approved.

          7. Unstash the non-staged changes that were previously stashed in step 2.

          8. Report completion to the user.

          You are not authorized to deviate from these instructions in any way. """

          This one doesn't employ subagents very much, and it is implicitly interactive, but it is smaller and easier to explain. It is, essentially, a call center script for the main agent to follow. In my experience, it does a very good job of following these instructions. This particular one addresses a pet peeve of mine: I hate the auto-commit anti-feature of basically all coding assistants. I'm old-school and want a nice, cleanly curated git history with comprehensible commits that take some refining to get right. It's not just OCD -- my workflow involves being able to git bisect effectively to find bugs, which requires a good git history.

          ...continued in part 2

            • By adastra22 2025-08-13 19:53 · 1 reply

            ...

            I also have a task.md workflow that I'm actively iterating on; it's the one I let run autonomously for half an hour to an hour, and I am often surprised at finding very good results (but sometimes very terrible results) at the end of it. I'm not going to release this one because, frankly, I'm starting to realize there might be a product around this and I may move on that (although this is already a crowded space). But I don't mind outlining in broad strokes how it works (hand-summarized, very briefly):

            """ You are a senior software engineer in a leadership role, directing junior engineers and research specialists (your subagents) to perform the task specified by the user.

            1. If PLAN.md exists, read its contents and skip to step 4.

            2. Without making any tool calls, consider the task as given and extrapolate the underlying intent of the user. [A bunch of rules and conditions related to this first part -- clarify the intent of the user without polluting the context window too much]

            3. Call the software-architect agent with the reformulated user prompt, and with clear instructions to investigate how the request would be implemented on the current code base. The agent is to fill its context window with the portions of the codebase and developer documentation in this repo relevant to its task. It should then generate and report a plan of action. [Elided steps involving iterating on that plan of action with the user, and various subagents to call out to in order to make sure the plan is appropriately sequenced in terms of dependent parts, chunked into small development steps, etc. The plan of action is saved in PLAN.md in the root of the repository.]

            4. While there are unfinished todos in the PLAN.md document, repeat the following steps:

            a) Call rust-engineer to implement the next todo and/or verify completion of the todo.

            b) Call each of the following agents with instructions to focus on the current changes in the workspace. If any actionable items are found in the generated report that are within the scope of the requested task, call rust-engineer to address these items and then repeat:

            - rust-nit-checker [checks for things I find Claude gets consistently wrong in Rust code]

            - test-completeness-checker [checks for missing edge cases or functionality not tested]

            - code-smell-checker [a variant of the software architect agent that reports when things are generally sus]

            - [... a handful of other custom agents; I'm constantly adjusting this list]

            - dirty-file-checker [reports any test files or other files accidentally left and visible to git]

            c) Repeat from step a until you run through the entire list of agents without any actionable, in-scope issues identified in any of the reports & rust-engineer still reports the task as fully implemented.

            d) Run git-commit-auto agent [A variation of the earlier git commit script that is non-interactive.]

            e) Mark the current todo as done in PLAN.md

            5. If there are any unfinished todos in PLAN.md, return to step 4. Otherwise call the software-architect agent with the original task description as approved by the user, and request it to assess whether the task is complete, and if not to generate a new PLAN.md document.

            6. If a new PLAN.md document is generated, return to step 4. Otherwise, report completion to the user. """

            That's my current task workflow, albeit with a number of items and agent definitions elided. I have lots of ideas for expanding it further, but I'm basically taking an iterative and incremental approach: every time Claude fumbles the ball in an embarrassing way (which does happen!), I add or tweak a rule to avoid that outcome. There are a couple of key points:

            1) Using Rust is a superpower. With guidance to the agent about what crates to use, and with very strict linting tools and code checking subagents (e.g. no unsafe code blocks, no #[allow(...)] directives to override the linter, an entire subagent dedicated to finding and calling out string-based typing and error handling, etc.) this process produces good code that largely works and does what it was requested to do. You don't have to load the whole project in context to avoid pointer or use-after-free issues, and other things that cause vibe coded projects to fail at a certain complexity. I don't see this working in a dynamic language, for example, even though LLMs are honestly not as good at Rust as they are in more prominent languages.

            2) The key part of the task workflow is the long list of analysts to run against the changes, and the assumption that works well in practice that you can just keep iterating and fixing reported issues (with some of the elided secret sauce having to do with subagents to evaluate whether an issue is in scope and needs to be fixed or can be safely ignored, and keeping an eye out for deviations from the requested task). This eventual completeness assumption does work pretty well; the loop itself is sketched in code below.

            3) At some point the main agent's context window gets poisoned, or it reaches the full context window and compacts. Either way this kills any chance of simply continuing. In the first case (poisoning) it loses track of the task and ends up caught in some yak shaving rabbit hole. Usually it's obvious when you check in that this is going on, and I just nuke it and start over. In the latter case (full context window) the auto-compaction also pretty thoroughly destroys workflow but it usually results in the agent asking a variation on "I see you are in the middle of ... What do you want to do next?" before taking any bad action to the repo itself. Clearing the now poisoned context window with "/reset" and then providing just "task: continue" gets it back on track. I have a todo item to automate this, but the Claude Code API doesn't make it easy.

            4) You have to be very explicit about what can and cannot be done by the main agent. It is trained and fine-tuned to be an interactive, helpful assistant. You are using it to delegate autonomous tasks. That requires explicit and repeated instructions. This is made somewhat easier by the fact that subagents are not given access to the user -- they simply run and generate reports for the calling agent. So I try to pack as much as I can in the subagents and make the main agent's role very well defined and clear. It does mean that you have to manage out-of-band communication between agents (e.g. the PLAN.md document) to conserve context tokens.
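
            To give a feel for the shape of the step-4 loop, here it is as rough Python pseudocode (run_agent and the plan object are hypothetical stand-ins; the real thing is all prompt files, as described above):

              def run_task_workflow(run_agent, plan):
                  # run_agent(name, instructions) -> report stands in for
                  # invoking a subagent; plan wraps PLAN.md with
                  # unfinished_todos() and mark_done(todo). All hypothetical.
                  checkers = ["rust-nit-checker", "test-completeness-checker",
                              "code-smell-checker", "dirty-file-checker"]
                  for todo in plan.unfinished_todos():
                      while True:
                          run_agent("rust-engineer", f"Implement and/or verify: {todo}")
                          reports = [run_agent(c, "Focus on current workspace changes")
                                     for c in checkers]
                          if not any(r.has_in_scope_issues for r in reports):
                              break  # a clean pass through every checker
                          for r in reports:
                              if r.has_in_scope_issues:
                                  run_agent("rust-engineer", f"Address: {r.summary}")
                      run_agent("git-commit-auto", "Commit the completed work")
                      plan.mark_done(todo)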

            If you try this out, please let me know how it goes :)

              • By kami23 2025-08-14 6:57 · 1 reply

              I tried this tonight as my first time using anything like Claude Code, having only a week or so of Copilot agentic-mode experience.

              It's the right path; I'm very smitten seeing the subagents working together. Blew through the Pro quota really fast.

              I was a skeptic and am no more. Gonna see what it takes to run something basic in a home lab, and how the performance is. Even if it is incredibly slow on a beefy home system, just checking in on it should be low enough friction for it to noodle on some hobby projects.

                • By adastra22 2025-08-14 8:10

                Glad it worked for you :)

                Yeah it was a "HOLY SHIT" moment for me when I first started experimenting with subagents. A step-change improvement in productivity for sure. They combine well together with Claude Code's built-in todo tool, and together really start to deliver on the promised goal of automating development. Watching it delegate to subagents and then seeing the flow of information back and forth is amazing.

                One thing I forgot to mention -- I run Claude within a simple sandboxed dev container like this: https://github.com/maaku/agents/tree/main/.devcontainer This allows you to safely run with `--dangerously-skip-permissions`, which basically gives Claude free rein within the Docker container in which it is running. This is what lets you run without user interaction.

                When you say "run something basic in a home lab" do you mean local inference? Qwen3-Coder is probably the model to use if you want to go that route. Avoid gpt-oss as they used synthetic data in their training and it is unlikely to perform well.

                I'm investigating this as well, as I need local inference for some sensitive data. But honestly, the Anthropic models work so well that I justified getting myself the unlimited/Max plan, and I mostly use that. I suspect I overbought -- at $200/mo I have yet to ever be rate limited, even with these long-running instances. I stay within the ToC and only run 1-2 sessions at a time, though.

    • By ishsup 2025-08-12 22:25 · 1 reply

      Yeah, that's a fair experience; we've seen similar when leaving Claude unsupervised for too long. The way we use Omnara, it's more about staying in the loop for those moments when Claude needs clarification or a quick decision, so you can keep it on track without babysitting the terminal the whole time.
