
I’m Chris McCord, the creator of Elixir’s Phoenix framework. For the past several months, I’ve been working on a skunkworks project at Fly.io, and it’s time to show it off.
I wanted LLM agents to work just as well with Elixir as they do with Python and JavaScript. Last December, in order to figure out what that was going to take, I started a little weekend project to find out how difficult it would be to build a coding agent in Elixir.
A few weeks later, I had it spitting out working Phoenix applications and driving a full in-browser IDE. I knew this wasn’t going to stay a weekend project.
If you follow me on Twitter, you’ve probably seen me teasing this work as it picked up steam. We’re at a point where we’re pretty serious about this thing, and so it’s time to make a formal introduction.
World, meet Phoenix.new, a batteries-included fully-online coding agent tailored to Elixir and Phoenix. I think it’s going to be the fastest way to build collaborative, real-time applications.
Let’s see it in action:
First, even though it runs entirely in your browser, Phoenix.new gives both you and your agent a root shell, in an ephemeral virtual machine (a Fly Machine) that gives our agent loop free rein to install things and run programs — without any risk of messing up your local machine. You don’t think about any of this; you just open up the VSCode interface, push the shell button, and there you are, on the isolated machine you share with the Phoenix.new agent.
Second, it’s an agent system I built specifically for Phoenix. Phoenix is about real-time collaborative applications, and Phoenix.new knows what that means. To that end, Phoenix.new includes, in both its UI and its agent tools, a full browser. The Phoenix.new agent uses that browser “headlessly” to check its own front-end changes and interact with the app. Because it’s a full browser, instead of trying to iterate on screenshots, the agent sees real page content and JavaScript state – with or without a human present.
Agents build software the way you did when you first got started, the way you still do today when you prototype things. They don’t carefully design Docker container layers and they don’t really do release cycles. An agent wants to pop a shell and get its fingernails dirty.
A fully isolated virtual machine means Phoenix.new’s fingernails can get arbitrarily dirty. If it wants to add a package to mix.exs, it can do that and then run mix phx.server or mix test and check the output. Sure. Every agent can do that. But if it wants to add an APT package to the base operating system, it can do that too, and make sure it worked. It owns the whole environment.
This offloads a huge amount of tedious, repetitive work.
At his Startup School talk last week, Andrej Karpathy related his experience of building a restaurant menu visualizer, which takes camera pictures of text menus and transforms all the menu items into pictures. The code, which he vibe-coded with an LLM agent, was the easy part; he had it working in an afternoon. But getting the app online took him a whole week.
With Phoenix.new, I’m taking dead aim at this problem. The apps we produce live in the cloud from the minute they launch. They have private, shareable URLs (we detect anything the agent generates with a bound port and give it a preview URL underneath phx.run, with integrated port-forwarding), they integrate with GitHub, and they inherit all the infrastructure guardrails of Fly.io: hardware virtualization, WireGuard, and isolated networks.
GitHub’s gh CLI is installed by default, so the agent knows how to clone any repo or browse issues, and you can even authorize it for internal repositories to get it working with your team’s existing projects and dependencies.
Full control of the environment also closes the loop between the agent and deployment. When Phoenix.new boots an app, it watches the logs, and tests the application. When an action triggers an error, Phoenix.new notices and gets to work.
Phoenix.new can interact with web applications the way users do: with a real browser.
The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won’t just sketch the code out and make sure it compiles and lints. It’ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.
Phoenix is all about “live” real-time interactivity, and gives us seamless live reload. The user interface for Phoenix.new itself includes a live preview of the app being worked on, so you can kick back and watch it build front-end features incrementally. Any other .phx.run tabs you have open also update as it goes. It’s wild.
Phoenix.new can already build real, full-stack applications with WebSockets, Phoenix’s Presence features, and real databases. I’m seeing it succeed at business and collaborative applications right now.
But there’s no fixed bound on the tasks you can reasonably ask it to accomplish. If you can do it with a shell and a browser, I want Phoenix.new to do it too. And it can do these tasks with or without you present.
For example: set a $DATABASE_URL and tell the agent about it. The agent knows enough to go explore it with psql, and it’ll propose apps based on the schemas it finds. It can model Ecto schemas off the database. And if MySQL is your thing, the agent will just apt install a MySQL client and go to town.
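To make that concrete, here is the kind of Ecto schema the agent might model off a table it discovers via psql. This is purely illustrative; the table and column names are invented, not anything Phoenix.new actually emits:

```elixir
defmodule MyApp.Accounts.User do
  use Ecto.Schema

  # Mirrors a hypothetical `users` table found by introspecting the
  # database at $DATABASE_URL. Assumes the usual inserted_at/updated_at
  # timestamp columns exist.
  schema "users" do
    field :email, :string
    field :name, :string
    field :admin, :boolean, default: false

    timestamps()
  end
end
```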
Frontier model LLMs have vast world knowledge. They generalize extremely well. At ElixirConf EU, I did a live demo vibe-coding Tetris on stage. Phoenix.new nailed it, first try, first prompt. It’s not like there’s gobs of Phoenix LiveView Tetris examples floating around the Internet! But lots of people have published Tetris code, and lots of people have written LiveView stuff, and 2025 LLMs can connect those dots.
At this point you might be wondering – can I just ask it to build a Rails app? Or an Expo React Native app? Or Svelte? Or Go?
Yes, you can.
Our system prompt is tuned for Phoenix today, but all languages you care about are already installed. We’re still figuring out where to take this, but adding new languages and frameworks definitely ranks highly in my plans.
We’re at a massive step-change in developer workflows.
Agents can do real work, today, with or without a human present. Buckle up: the future of development, at least in the common case, probably looks less like cracking open a shell and finding a file to edit, and more like popping into a CI environment with agents working away around the clock.
Local development isn’t going away. But there’s going to be a shift in where the majority of our iterations take place. I’m already using Phoenix.new to triage phoenix-core GitHub issues and pick problems to solve. I close my laptop, grab a cup of coffee, and wait for a PR to arrive — Phoenix.new knows how PRs work, too. We’re already here, and this space is just getting started.
This isn’t where I thought I’d end up when I started poking around. The Phoenix and LiveView journey was much the same. Something special was there and the projects took on a life of their own. I’m excited to share this work now, and see where it might take us. I can’t wait to see what folks build.
I am a long-time PHP dev who has been interested in learning Elixir/Phoenix for a while but never quite motivated enough.
I saw this and thought, if this doesn't get me to give it a go, nothing will.
Less than 45 minutes after signing up for fly.io, I have a multi-room tic tac toe game deployed.
https://tic-tac-toe-cyber.fly.dev/
I had it build the game, opting for a single room at first to see if that worked. Then I had it add multiple rooms on a different git branch in case that didn't work. It worked great.
I learned very little about elixir, phoenix, or deploying to fly.io up to this point, and I already have a nice looking app deployed and running.
I know a lot of devs will hate that this is possible. It is up to me now to look at the steps it took to create this, which are broken down extremely simply for me, and really understand what is happening...
I will do this because I want to learn. I bet a lot of people won't bother to do that. But those people never would have had apps in the first place and now they can. If they are creating fun experiences and not banking apps, I think that is still great.
You guys have been releasing amazing things for years only to be poorly replicated in other languages years later.. but you really outdid yourselves here.
I'm blown away.
edit: is there a way to see how much of my credits were used by building this?
This is amazing on multiple fronts! I reset your usage, so the next round is on us! We shipped credits the day before launch, so usage UI is still TBD, but should be out next week. Thanks for sharing your experience!
Hi Chris, is there any way to get more credits or BYO API key for Anthropic/OpenAI? I'm trying to make a Kahoot clone and already spent more than $40 in a couple of hours.
Based on how much they seem to charge (I blew through the initial $20 in like an hour; equivalent use in Claude Code would have been around $3), they're clearly making a pretty big margin on top of the API calls. I doubt they're going to allow BYOK.
Was the graphic design created from prompts too? It's surprisingly nice, especially considering you spent 45 minutes on it.
I told it that I wanted a two player tic tac toe game.
it gave me a selection of "styles" and I chose neon retro.. I probably could have been more creative and typed in my own suggestion.
Other than that, I said absolutely nothing about how I wanted the layout.
It came up with the idea of listing all active games on the homepage, with the number of players in each, all on its own.
I went from "I want a two player tic tac toe game" to having one, and then added multiple rooms, and deployed it all in under 45 minutes, with little input other than that..
Did you figure out how much credit was used? I want to try this out, but $20 of credit can go quick doing agentic work
I'm not sure exactly but I think I used nearly all of it.
I've seen others say they went through the full $20 within 45 minutes to an hour.
They are supposed to be adding a way to monitor usage soon.
Phoenix creator here. I'm happy to answer any questions about this! Also worth noting that phoenix.new is a global Elixir cluster that spans the planet. If you sign up in Australia, you get an IDE and agent placed in Sydney.
Amazing work.
Just a clarifying question since I'm confused by the branding use of "Phoenix.new" (since I associate "Phoenix" as a web framework for Elixir apps but this seems to be a lot more than that).
- Is "Phoenix.new" an IDE?
- Is "Phoenix.new" ... AI to help you create an app using the Phoenix web framework for Elixir?
- Does "Phoenix.new" require the app to be hosted/deployed on Fly.io? If that's the case, maybe a name like "phoenix.flyio.new" would be better and extensible to any type of service Fly.io helps deploy (Phoenix/Elixir being one)
- Is it all 3 above?
And how does this compare to Tidewave.ai (created, as you presumably know, by Elixir's creator)?
Apologies if I'm possibly conflating topics here.
Yes all 3. It has been weird trying to position/brand this as we started out just going for full-stack Elixir/Phoenix and it became very clear this is already much bigger than a single stack. That said, we wanted to nail a single stack super well to start and the agent is tailored for vibe'd apps atm. I want to introduce a pair mode next for more leveled assistance without having to nag it.
You could absolutely treat phoenix.new as your full dev IDE environment, but I think about it less as an IDE and more as a remote runtime where agents get work done, one that you pop into as needed. Or another way to think about it: the agent doesn't care about or need the VSCode IDE or xterm. They are purely conveniences for us meaty humans.
For me, something like this is the future of programming. Agents fiddling away and we pop in to see what's going on or work on things they aren't well suited for.
Tidewave is focused on improving your local dev experience while we sit on the infra/remote agent/codex/devin/jules side of the fence. Tidewave also has a MCP server which Phoenix.new could integrate with that runs inside your app itself.
> For me, something like this is the future of programming. Agents fiddling away and we pop in to see what's going on or work on things they aren't well suited for.
Honestly, this is depressing. Pop in from what? Our factory jobs?
I understand that we are slowly taking away our own jobs, but I do not find it depressing. I do find it concerning, since most people do not talk about this openly. We are not sure how we will restructure so many jobs. If we cannot find jobs, what is the financial future for a large number of people across the world? This needs more thinking and honest acceptance of the situation. It will happen; we should take a positive approach to finding a new future.
Read up on the Jevons Paradox
> In economics, the Jevons paradox (/ˈdʒɛvənz/; sometimes Jevons effect) occurs when technological advancements make a resource more efficient to use (thereby reducing the amount needed for a single application); however, as the cost of using the resource drops, if the price is highly elastic, this results in overall demand increasing, causing total resource consumption to rise. Governments have typically expected efficiency gains to lower resource consumption, rather than anticipating possible increases due to the Jevons paradox.[1]
I do think there will be some Jevons effect going on with this, but I think it's important to recognize that software development as a resource is different than something like coal. For example, if the average iPhone-only teenager can now suddenly start cranking out apps, that may ultimately increase demand for apps and there may be more code than ever getting "written," but there won't necessarily be a need for your CS-grad software engineer anymore, so we could still be fucked. Why would you pay a high salary for a SWE when your business teams can just generate whatever app they need without having to know anything about how it actually works?
I think the arguments about "AI isn't good enough to replace senior engineers" will hold true for a few years, but not much beyond that. The Jevons paradox will probably hold true for software as a resource, but not for SWEs as a resource. In the coal scenario, imagine that coal gets super cheap to procure because we invent robots that can do it from alpha to omega. Coal demand may go up, but the job for the coal miner is toast, and unless that coal miner has ownership stake, they will be out on their ass.
The coal miner would have to pivot to being someone who knows a lot about coal instead of someone that actually obtained it, they’d become more of a coal-advisor to the person making decisions about what type of or how much coal to get/what’s even possible with the coal they’re getting.
The future I’m seeing with AI is one where software (i.e. as a way to get hardware to do stuff) is basically a non-issue. The example I wanna work on soon is telling Siri I want my iPhone to work as a touchpad for my computer and have the necessary drivers for that to happen be built automatically because that’s a reasonable thing I could expect my hardware to do. That’s the sort of thing that seems pretty achievable by AI in a couple turns that would take a single dev a year or two. And the thing is, I can’t imagine a software dev that doesn’t have some set of skills that are still applicable in this future, either through general CS skills (knowing what’s within reasonable expectations of hardware, being able to effectively describe more specific behavior/choosing the right abstractions etc) or other more nebulous technical knowledge (e.g. what you want to do with hardware in the first place).
Another thing I will mention is that for things like the iPhone example from earlier, there are usually a lot of optimizations or decisions involved that are derived from the user’s experience as a human which the LLM can’t really use synthetically. As another example, if I turned my phone into a second monitor, the LLM might generate code that sends full-resolution images to the phone when the phone’s screen is much lower resolution; there’s no real point for it to optimize that away if it doesn’t know how eyes work and what screens are used for. So at some point it needs to involve a model of a human, at least for examples like these.
> The coal miner would have to pivot to being someone who knows a lot about coal instead of someone that actually obtained it, they’d become more of a coal-advisor to the person making decisions about what type of or how much coal to get/what’s even possible with the coal they’re getting.
I definitely agree that there will be some jobs/roles like that, and it won't be 100% destruction of SWEs (and many other gigs that will be affected), but I can't imagine that more than a small percentage of consultants will be needed. The top 10% of engineers I think will be just fine for the reasons you've said, but at the lower levels it will be a blood bath (and realistically maybe it should be, as there are plenty of SWEs that probably shouldn't be writing code that matters, but that feels like a separate discussion). Your point about other skills/knowledge is good too, though I suspect most white-collar jobs are on the chopping block as well, just maybe shortly behind.
Your future is one that I'm dreaming about too (although I have a hard time believing Apple would allow you to do that, but on Android or some future 3rd option it might be possible). Especially as a Linux user there have been plenty of times I've thought of cool stuff that I'd love to have personally that would take me months of work to build (time I've accepted I'll never have until my kids are all out of the house at least haha). I'm also dreaming of a day when I can just ask the AI to produce more seasons of Star Trek TOS, Have Gun - Will Travel, The Lieutenant, and many other great shows that I'm hungry for more, and have it crank them out. That future would be incredible!
But that feels like the smooth side of the sword, and avoiding a deep cut from the sharp side feels increasingly important. Hopefully it will solve itself but seeing the impacts so far I'm getting worried.
I appreciate the discussion and optimism! There is too much AI doomerism out there and the upsides (like you've mentioned) don't get talked about enough I think.
Computers are not special. They are just a heat engine like everything else. We feed them concentrated energy that they dissipate to do work. They do work on data: we give it data (some of it is called code) and it gives us back data. It's all about the information content, how does that data communicate something and relate to the world?
"Training" is just upfront work. Why on Earth people expect to get from the machine that processes data some novel information that did not exist before?
This whole fantasy hinges on not understanding the sheer amount of data these LLMs are being trained on, and some magical thinking about them producing novel information ex nihilo somehow. I will never understand how intelligent people fall into these patterns of thought.
We can only get from computers what we put into them.
> Why would you pay a high salary for a SWE when your business teams can just generate whatever app they need without having to know anything about how it actually works?
It depends on how good the AI is. The advantage of an SWE is that they have a systems thinking mindset, so they can solve some problems more efficiently. With some apps it won't matter, but with others it will.
One potential positive outcome is that we will be able to solve more and bigger problems, since our capacity for solving problems has been augmented with AI.
> Pop in from what? Our factory jobs?
Oh, you sweet summer child. ;)
You will pop in from the other 9 projects you are currently popping in on, of course! While running 10 agents at once!
And from which exactly am I earning an income to feed myself? Who's buying what I'm making? Where are they getting their money?
We're building a serfdom again.
LOL, what? Take on 10 projects at once, and start making way more money... if you're not an external-locus-of-control moron at least
You've literally been given an excavator when you currently have a shovel, and you're worried that other excavators will dig you out of a job. That is a literal analogy to your POV, here
Hopefully, from sitting by the pool drinking margaritas ... but I doubt we will get to keep our new found freedom.
Never going to happen. More efficiency and automation won’t lead to more free time and money for the masses, it will lead to fewer people employed, and those that are will be working the same hours for the same money but outputting more. Only the rich people will benefit.
In the long term. In the short term, we get to do the same work but faster.
Indeed, why would an employer pay us a high salary to sit by the pool? The benefits will go to the founders/investors and the customers. They'll benefit greatly from the increased output and lower costs, but the middlemen (SWEs) will be cut out. That's a great thing if you're a founder/investor or a customer, but not if you're the middleman. New opportunities may come around, but I don't think that's inevitable. It remains to be seen.
It will not be easier for founders/investors either. If a couple of prompts is all it takes to build your product, your potential customers will write those prompts themselves instead of buying your product.
Hot damn, that's a great point! Although I fully expect the models at some point to say stuff like, "I'm sorry I can't generate a <whatever> because that would violate Apple's/Google's/Whatever IP" and then have them enforce it with the power of government (copyright/patent/regulation/etc). There's also lots of industries where compliance requirements create a moat that might be difficult to get past, though that's probably just a short/medium-term problem.
True. But someone at the top will benefit. Either it’s the companies that can produce more of something that the end user can’t easily replicate themselves for whatever reason, or at least the LLM providers.
What I mean is, it will create value. Just not for the masses. And maybe not for the small businesses. If anything, it will let the big corporations do even more: a few big players doing everything and no little players at all.
Some people prefer to pay for others to handle things and take responsibility.
How about our software engineering jobs, which will now entail managing a team of agents?
> The Phoenix.new environment includes a headless Chrome browser that our agent knows how to drive. Prompt it to add a front-end feature to your application, and it won’t just sketch the code out and make sure it compiles and lints. It’ll pull the app up itself and poke at the UI, simultaneously looking at the page content, JavaScript state, and server-side logs.
Is it possible to get that headless Chrome browser + agent working locally? With something like Cursor?
Playwright has an MCP server which I believe should be able to give you this.
When Roo Code uses Claude, it does this while developing. It renders in the sidebar and you can watch it navigate around. Incredibly slow, but that’s only a matter of time.
Does it work with VSCode GitHub Copilot LLM provider? They have Claude in there
I know it's early days, but here's a must-have wish list for me:
- ability to run locally somehow. I have my own IDE, tools etc. Browser IDEs are definitely not something I use willingly.
- ability to get all code, and deploy it myself, anywhere
---
Edit: forgot to add. I like that every video in the Elixir/Phoenix space is the spiritual successor to the "15-minute Rails blog" from 20 years ago. No marketing bullshit, just people actually using the stuff they build.
You can push and pull code to and from local desktop already: hamburger menu => copy git clone/copy git push.
You could also have it use GitHub and do PRs for codex/devin-style workflows. Running phoenix.new itself locally isn't something we're planning, but opening the runtime for SSH access is high on our list. Then you could do remote SSH access with local VSCode or whatever.
> Running phoenix.new itself locally isn't something we're planning
So no plans to open the source code?
Everyone has to eat.
For sure. I'm just hesitant to recommend sending one's codebase to a server running code I can't inspect. I suppose that's the status quo with LLM's these days, though.
confirm
"15-minute rails blog" changed the game so I definitely resonate with this. My videos are pretty raw, so happy to hear it works for some folks.
run locally or in your private cloud would be amazing. The latter bit would be a great paid option for large enterprises
Include optional default email, auth, analytics, job management (you know… the one everyone uses ::cough:: Oban ::cough::), dev/staging/prod modes (with “deployment” or something akin to CD… I know it’s already in the cloud, but you know what I mean) and some kind of non-ephemeral disk storage, maybe even domain management… and this will slay. Base44 just got bought for $80M for supplying all those, but nothing is as cool as Elixir of course!
These other details that are not “just coding” are always the biggest actual impediments to “showing your work”. Thanks for making this!! Somehow I am only just discovering it (toddler kid robbing my “learning tech by osmosis” time… a phenomenon I believe you are also currently familiar with, lol)
Hi, just to confirm, as I cannot find anything related to security or your use of submitted code for training purposes. Where are your security policies with regards to that?
We don't do any model training, and only use existing open source or hosted models. Code gets sent to those providers in context windows. They all promise not to train on it, so far.
You said it terribly to be honest
Ask some security questions, I'll get you security answers. We're not a model company; we don't "train" anything.
Is there a transparent way to see credit used/remaining/topped up, and do you have any tips for how you can prompt the agent that might offer more effective use of credits?
The LLM chat taps out but I can't find a remaining balance on the fly.io dashboard to gauge how I'm using it. I _can_ see a total value of purchased top ups, but I'm not clear how much credit was included in the subscription.
It's very addictive (because it is awesome!) but I've topped up a couple of times now on a small project. The amount of work I can get out the agent per top-up does seem to be diminishing quite quickly, presumably as the context size increases.
Is there something comparable that works similarly but completely offline with appropriate hardware? Not everywhere has internet or trusts remote execution and data storage.
PS: Why can't I get IEx to have working command-line history and editing? ;-P
Any takeaways on using Fly APIs for provisioning isolated environments? I'm looking into doing something similar to Phoenix.new but for a low-code server-less workflow system.
1 week of work to go from local-only to fly provisioned IDE machines with all the proxying. fly-replay is the unsung hero in this case, that's how we can route the *.phx.run urls to your running dev servers, how we proxy `git push` to phoenix.new to your IDE's git server, and how we frame your app preview within the IDE in a way that works with Safari (cross origin websocket iframes are a no go). We're also doing a bunch of other neat tricks involving object storage, which we'll write about at some point. Feel free to reach out in slack/email if you want to chat more.
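For the curious, fly-replay routing from Elixir looks roughly like this. The `fly-replay` response header is real Fly.io proxy behavior; the subdomain-to-machine lookup below is invented for the sketch and is not how phoenix.new actually resolves apps:

```elixir
defmodule PhxRun.ReplayPlug do
  # Sketch only: replies with an empty response carrying a `fly-replay`
  # header, which tells Fly's proxy to re-run the request on the named
  # Machine. Assumes this plug sits in front of requests that should be
  # routed to a user's dev server.
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    # e.g. "myapp.phx.run" -> "myapp"
    [subdomain | _] = String.split(conn.host, ".")

    case lookup_machine(subdomain) do
      {:ok, machine_id} ->
        conn
        |> put_resp_header("fly-replay", "instance=#{machine_id}")
        |> send_resp(:ok, "")
        |> halt()

      :error ->
        conn |> send_resp(:not_found, "unknown app") |> halt()
    end
  end

  # Placeholder: a real implementation would consult some registry of
  # running IDE machines.
  defp lookup_machine(_subdomain), do: :error
end
```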
1. What's your approach to accessibility? Do you test accessibility of the phoenix.new UI? Considering that many people effectively use Phoenix to write front-ends, have you conducted any evals on how accessible those frontends come out?
2. How do you handle 3rd party libraries? Can the agent access library docs somehow? Considering that Elixir is less popular than more mainstream languages, and hence has less training data available, this seems like an important problem to solve.
It seems like they're giving you lower-level building blocks here. It's up to the developer to address these things. Instruct the agent to build/test for accessibility, feed it docs via MCP or by other means.
They use the daisyUI component library in Phoenix 1.8+, which should have basic accessibility baked in.
Watched the Tetris demo of this and it was very impressive. I was particularly surprised how well it seems to work with the brand-new scopes, despite the obvious lack of much prior art. How did you get around this, how much work was the prompt, and are you comfortable sharing it?
What is the benefit of this vs. just running your agent of choice in any ole container?
The whole post is about that. Not everything is for everybody, so if it doesn't resonate for you, that's totally OK.
Oh geez so sorry for the dumb question! I read a lot about the benefits of containerization in general for agents, but thought it might be enlightening/instructive to know what this specific project adds to that (other than the special Elixir-tuned prompting).
But either way I hear you, thanks so much for taking the time to set me straight. It seems like either way you have done some visionary things here and you should be content with your good work! This stuff does not work for me for just circumstantial reasons (too poor), but still always very curious about the stuff coming out!
Again, so sorry. Congrats on the release and hope your day is good.
Gotcha! I'll keep reading it I guess until I see what I am missing! Good job again!
I did none of the work! I'm just like Flavor Flav or Bez in this situation. I will relay your congrats to Chris and the team, though. ;)
Bad analogy. Bez was the best singer and most important member of that group.
Huh ok! Well you sure are quite passionate. Thanks either way I guess.
This looks amazing! I keep loving Phoenix more the more I use it.
I was curious what the pricing for this is? Is it normal fly pricing for an instance, and is there any AI cost or environment cost?
And can it do multiple projects on different domains?
It’s $20 per month if you click through, and I haven’t tried it but almost certainly the normal hosting costs will be added on top.
I've tried it, the $20 of included credits lasted me about 45 minutes
Thanks, apparently didn't click through enough
Just tried it out, but it's unclear what the different buttons at the bottom of the chat history do. The rightmost one (cloud with an upwards arrow) seems to do the same as the first?
I'm also having trouble with getting it to read PDFs from URLs. I got this error:
web https://example.com/file.pdf Error: page.goto: net::ERR_ABORTED at https://example.com/file.pdf Call log: - navigating to "https://example.com/file.odf", waiting until "load" at main (/usr/local/lib/web2md/web2md.js:313:18) { name: 'Error' }
/workspace#
Do you have a package for calling LLM services we can use? This service is neat, but I don't need another LLM IDE built in Elixir but I COULD really use a way to call LLMs from Elixir.
Req.post to /chat/completions, streaming the tokens through a parser and doing regular elixir messages. It's really not more complicated than that :)
even less complicated, just set stream: false in your json :)
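A minimal sketch of the non-streaming version suggested above, using Req. The endpoint follows the common OpenAI-compatible chat completions shape; the env var name and model are placeholders:

```elixir
defmodule LLM do
  # Sketch: one-shot chat completion with `stream: false`, as suggested
  # above. Assumes an OpenAI-compatible API; swap the base URL and model
  # for whichever provider you actually use.
  def chat!(prompt) do
    Req.post!(
      "https://api.openai.com/v1/chat/completions",
      auth: {:bearer, System.fetch_env!("OPENAI_API_KEY")},
      json: %{
        model: "gpt-4o-mini",
        stream: false,
        messages: [%{role: "user", content: prompt}]
      }
    )
    |> Map.fetch!(:body)
    |> get_in(["choices", Access.at(0), "message", "content"])
  end
end
```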
Thanks for everything you do Chris! Keep crushing it.
How tightly coupled to Fly.io are generated apps?
Everything starts as a stock phx.new app, which uses SQLite by default. Nothing is specific to Fly. You should be able to copy the git clone URL, paste it, then cd && mix deps.get && mix phx.server locally and the app will just work.
If you're willing to share, is maintaining that modularization the plan going forward? I'm pretty happy to use and pay for this and deploy it to fly, but only as long as I'm not "locked in."
Does this mean I can build and deploy a SQLite-based app on Fly.io with this approach, without using Postgres? If so, how does the pricing work for the persistent storage SQLite needs? Thanks
You would need to add a Fly volume ($0.15/GB per month of provisioned capacity); also check out https://fly.io/blog/litestream-revamped/
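As a rough sketch of what that looks like (the volume name, mount path, and `DATABASE_PATH` variable are illustrative assumptions): create the volume with `fly volumes create data --size 1`, then mount it in fly.toml so the SQLite file lives on the persistent disk:

```toml
# Example fly.toml fragment: mount the volume created with
# `fly volumes create data --size 1` at /data, and point the
# app's database path (env var name is an assumption) there.
[mounts]
  source = "data"
  destination = "/data"

[env]
  DATABASE_PATH = "/data/app.db"
```

The Litestream post linked above covers replicating that SQLite file to object storage for backup.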
What LLM(s) is the agent using? Are you fine-tuning the model? Is the agent/model development a proprietary effort?
Currently Claude 4 Sonnet as the main driver, with a combination of smaller models for certain scenarios
I'm assuming you're using FLAME?
How do you protect the host Elixir app from the agent shell, runtime, etc.?
Not using FLAME in this case. The agent runs entirely separately from your apps/IDE/compute. It communicates with and drives your runtime over Phoenix channels
Oh interesting. So how do messages come from the container? Is there a host elixir app that is running the agent env? How does that work?
Yes, an Elixir app deployed across the planet as a single Elixir cluster. We spawn the agents (GenServers), globally register them, and then the end-user LiveView chat communicates with the agent via regular Elixir messages, while the IDE is a Phoenix channels client that communicates with and is driven by the agent.
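A hedged sketch of that shape (this is not the actual Phoenix.new source, just the pattern described: one GenServer per agent, registered globally so any node's LiveView can reach it by name):

```elixir
# Sketch of a per-project agent process, globally registered
# across the cluster. Module and message names are assumptions.
defmodule AgentSession do
  use GenServer

  # {:global, name} registration resolves on any connected node
  # in the Elixir cluster, so the LiveView can live anywhere.
  def start_link(project_id) do
    GenServer.start_link(__MODULE__, project_id,
      name: {:global, {:agent, project_id}})
  end

  def prompt(project_id, msg) do
    GenServer.call({:global, {:agent, project_id}}, {:prompt, msg})
  end

  @impl true
  def init(project_id), do: {:ok, %{project: project_id}}

  @impl true
  def handle_call({:prompt, msg}, _from, state) do
    # A real agent would run its loop here and push results to
    # the IDE over a Phoenix channel; we just echo for the sketch.
    {:reply, {:ok, "echo: " <> msg}, state}
  end
end
```

With this layout the chat LiveView never needs to know which node the agent lives on; `GenServer.call/2` on the global name routes the message across the cluster.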
how are they isolating ai agent state from app-level processes without breaking BEAM's supervision guarantees?
They run on separate machines, and your agent just controls the remote runtime when it needs to interact with the system, write, read, etc.
appreciate the clarity, that helps.
quick followup: if the agent's running on a separate machine and interacting remotely, how are failure modes handled across the boundary? like if the agent crashes mid-operation or sends a malformed command, does the remote runtime treat it as an external actor, or is there a strategy linking both ends for fault recovery or rollback? just trying to understand where the fault tolerance guarantees begin and end across that split.
Token auth and re-handshake. The agent is respawned if it's no longer alive, and the project index is resynced
The AI agent runs inside the same remote runtime as the app. Does it share the BEAM VM or run as a port process?
The agent runs outside your IDE instance and controls/communicates with it over Phoenix channels
This is very cool. I think the primary innovation here is twofold:
1. Remote agent: it's a containerized environment where the agent can run loose and do whatever. It doesn't need approval for user tasks because it's in an isolated environment (though it could still accidentally do destructive actions, like editing git history). I think this alone is a separate service that needs to be productionized. When I run Claude Code in my terminal, I'd like it to automatically spin up the agent in an isolated environment (locally or remotely) and have it go wild. Easy to run things in parallel.
2. Deep integration with Fly. Everyone will be trying to embed AI deep into their product. Instead of having to talk to ChatGPT and copy-paste output, I should be able to directly interact with whatever product I'm using, and with my data in the product, using tools. In this case, it's deploying my web app.
Look into Kasm Workspaces: a great way to spin up remote Docker-based Linux desktops, and it works great as an AI dev environment you can use wherever you happen to be. There is homedir persistence, and package persistence can be achieved via some extra configuration that allows Brew homedir-based package persistence.
https://hub.docker.com/r/linuxserver/kasm
https://www.reddit.com/r/kasmweb/comments/1l7k2o8/workaround...
I have recently been working with Google Jules and it has a similar approach. It spins up VMs and goes through tasks given.
It does not handle any infrastructure, so no hosting. It allows me to set multiple small tasks, come back and check, confirm and move forward to see a new branch on GitHub. I open a PR, do my checks (locally if I need to) and merge.
>> Remote agent - it's a containerized environment where the agent can run loose and do whatever
How is this innovation?
Many people have not experienced the async agent workflow yet, and to be fair, the major providers didn't have offerings for it until a month or two ago.
It’s in fact one of my predictors for whether someone is going to be enthusiastic about agents or not.
And you wouldn’t think containerization would be a big leap but this stuff is so new and moving so fast that combining them with existing tech can surprise people.
It's less innovative and more trendy. A lot of the Fly integration can be achieved by simply asking Claude Code to look up the docs for the fly CLI tool.