I really like The Royal Game of Ur for some reason.
It is not just startups or small companies embracing agentic engineering… Stripe has published blog posts about their autonomous coding agents. Amazon is blowing up production because they gave their agents access to prod. Google and Microsoft are developing their own agentic engineering tools. And it's not just tech companies either; massive companies are frequently announcing partnerships with OpenAI or Anthropic.
You can’t just pretend it’s only startups doing agentic engineering. They’re just the ones pushing the boundaries of best practices most aggressively.
That is why a fully automated firm would be a paradigm shift. Instead of requiring a person to be responsible and to QA things, you let AI systems take responsibility internally and hold the company as a whole responsible for legal concerns.
This idea of an automated firm relies on the premise that AI will become more capable and reliable than people.
You laid out the theoretical limitations well, and I tend to agree with them.
I just get frustrated when people downplay how big of an impact filling in the gaps at the frontier of knowledge would have. 99.9% of researchers will never have an idea that adds a new spike to the knowledge frontier (rather than filling in holes), and 99.99% of research is just filling in gaps by combining existing ideas (numbers made up). In this realm, autoresearch may not be groundbreaking, but it can do the job. AlphaEvolve is similar.
If LLMs can actually get closer to something like that, it leaves human researchers a whole lot more time to focus on new ideas that could move entire fields forward. And their iteration speed can be a lot faster if AI agents help with implementing and testing those ideas.
Fundamentally, I’m more optimistic about how far current approaches can scale. I see no reason why RL could not be used to train models to use memory, and fine-tuning already works; it’s just expensive.
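To make that concrete, here is a deliberately tiny sketch of the kind of thing I mean: a tabular Q-learning agent that learns, from reward alone, to write a cue into an external memory slot and read it back later. The task, the "store"/"skip" actions, and the lookup table are all stand-ins I made up for illustration; a real setup would swap in an LLM policy and a proper memory tool, but the training signal works the same way.

    # Toy sketch (assumed task, not any lab's actual setup): an agent learns
    # via RL to use an external memory slot to answer after the cue is hidden.
    import random

    CUES = 4
    ACTIONS_P1 = ["store", "skip"]      # phase-1 choice: write cue to memory or not
    ANSWERS = list(range(CUES))         # phase-2 choice: which cue to answer
    Q = {}                              # (state, action) -> estimated value

    def best(state, actions):
        return max(actions, key=lambda a: Q.get((state, a), 0.0))

    def choose(state, actions, eps):
        return random.choice(actions) if random.random() < eps else best(state, actions)

    def update(state, action, target, lr=0.5):
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + lr * (target - old)

    def run_episode(eps=0.1):
        cue = random.randrange(CUES)
        # Phase 1: the cue is visible; the agent decides whether to store it.
        s1 = ("cue_visible", cue)
        a1 = choose(s1, ACTIONS_P1, eps)
        memory = cue if a1 == "store" else None
        # Phase 2: the cue is hidden; the agent answers from its memory slot alone.
        s2 = ("cue_hidden", memory)
        a2 = choose(s2, ANSWERS, eps)
        reward = 1.0 if a2 == cue else 0.0
        # Standard one-step Q-learning backups (reward arrives only at the end).
        update(s2, a2, reward)
        update(s1, a1, max(Q.get((s2, a), 0.0) for a in ANSWERS))
        return reward

    for _ in range(5000):
        run_episode()
    print(sum(run_episode(eps=0.0) for _ in range(200)) / 200)  # typically near 1.0 once "store" is learned

The point of the toy is only that "use memory when it pays off" is learnable from a scalar reward; nothing about it requires a new architecture.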
The continual learning we get may be a bit ham-fisted and not fit into a neat architecture, but I think we could actually see it work at scale in the next few years, whereas new techniques like the ones Yann LeCun has demonstrated still live heavily in the realm of research. Cool, but not useful yet.
Fine-tuning is also not as limited as you suggest. For one, we don’t need to fine-tune the same model over and over; we can just start from a frontier model each time. And two, modern models are much better at generating synthetic data or environments for RL. This could definitely work, but it might require a lot of work in data collection and curation, and the ROI is not clear. But if large companies continue to allocate more and more resources to AI over the next few years, I could see this happening.
OpenAI already has a custom model service, and labs have stated they already have custom models built for the military (although how custom those models are is unclear). It doesn’t seem like a huge leap to also fine-tune models on a company’s internal codebases and tooling, especially for large companies like Google, Amazon, or Stripe that employ tens of thousands of software engineers.
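A rough sketch of the shape such a pipeline could take, purely for illustration: walk an internal repo, emit chat-format JSONL examples, and hand them to whatever fine-tuning endpoint you use. The path-to-contents pairing, the system prompt, and the model name in the commented-out launch step are all assumptions on my part, not a recipe; a real effort would spend most of its time on curation, on synthetic examples generated from the code, and on scrubbing secrets.

    # Hedged sketch: turn an internal repo into chat-format fine-tuning data.
    # The pairing heuristic (file path -> file contents) is a placeholder.
    import json
    from pathlib import Path

    def build_dataset(repo_root: str, out_path: str = "train.jsonl") -> str:
        with open(out_path, "w") as out:
            for path in Path(repo_root).rglob("*.py"):
                code = path.read_text(errors="ignore")
                if not code.strip():
                    continue
                example = {
                    "messages": [
                        {"role": "system", "content": "You are our internal coding assistant."},
                        {"role": "user", "content": f"Show me the contents of {path}."},
                        {"role": "assistant", "content": code},
                    ]
                }
                out.write(json.dumps(example) + "\n")
        return out_path

    # Launching the job (one concrete option, shown as an assumption rather than
    # a recommendation; any provider with a fine-tuning endpoint would do):
    # from openai import OpenAI
    # client = OpenAI()
    # f = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    # client.fine_tuning.jobs.create(training_file=f.id, model="gpt-4o-mini-2024-07-18")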
This project is an enhanced reader for Y Combinator’s Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post, and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
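Reading public data needs no credentials at all: the official Hacker News Firebase API exposes it, and a minimal fetch looks roughly like the sketch below (illustrative only, not this project’s actual code).

    # Minimal sketch of reading the HN front page via the official Firebase API.
    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    def top_stories(limit: int = 10):
        ids = requests.get(f"{BASE}/topstories.json", timeout=10).json()[:limit]
        for item_id in ids:
            item = requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()
            yield item.get("title"), item.get("url", ""), item.get("score", 0)

    for title, url, score in top_stories(5):
        print(f"{score:>4}  {title}  {url}")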
For suggestions and feature requests you can write to me here: gabrielepicco.github.io