Hi, I am Ankit, founder of Clio AI - we train custom Gen AI models for enterprises.
Here is the link: https://www.clioapp.ai
Email: ankit at clioapp.ai
Twitter: ankit2119
From the looks of it, it's one main LLM (the orchestrator, the one you send your query to) which calls other LLMs via tool calls. The tools are capable of calling LLMs too, and can carry their own instructions, but mostly it's the orchestrator deciding what they should be researching and assigning them specific subqueries. There is a limited depth / number of levels of search queries too; you should see the prompt they use[1]
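To make the shape of that loop concrete, here is a minimal sketch of the orchestrator/sub-agent pattern using the Anthropic Python SDK. The tool name, prompts, depth cap, and model choices are placeholders of my own, not Anthropic's actual deep-research setup:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The orchestrator sees a single "research_subquery" tool. Each call spawns a
    # fresh sub-agent LLM that handles one narrow subquery and returns a summary.
    SUBAGENT_TOOL = {
        "name": "research_subquery",
        "description": "Delegate one narrow research subquery to a sub-agent.",
        "input_schema": {
            "type": "object",
            "properties": {"subquery": {"type": "string"}},
            "required": ["subquery"],
        },
    }

    def run_subagent(subquery: str) -> str:
        # Sub-agent: its own LLM call with its own instructions; in a real system
        # it would also call a search tool and summarize results plus links.
        resp = client.messages.create(
            model="claude-3-5-haiku-latest",
            max_tokens=1024,
            system="Research the subquery and return a short summary with sources.",
            messages=[{"role": "user", "content": subquery}],
        )
        return resp.content[0].text

    def orchestrate(user_query: str, max_rounds: int = 3) -> str:
        # Orchestrator: decides which subqueries to fan out, with a capped depth.
        messages = [{"role": "user", "content": user_query}]
        for _ in range(max_rounds):
            resp = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=2048,
                system="Break the question into subqueries, delegate via the tool, then synthesize.",
                tools=[SUBAGENT_TOOL],
                messages=messages,
            )
            tool_calls = [b for b in resp.content if b.type == "tool_use"]
            if not tool_calls:
                return "".join(b.text for b in resp.content if b.type == "text")
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({
                "role": "user",
                "content": [
                    {"type": "tool_result", "tool_use_id": b.id,
                     "content": run_subagent(b.input["subquery"])}
                    for b in tool_calls
                ],
            })
        return "Hit the depth limit before the orchestrator finished."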
One cool example of this in action is when you use Claude Code and ask it to search something. In a verbose setting, it calls an MCP tool to help with search. The tool returns a summary of the results with the relevant links (not the raw search result text). A similar method, albeit more robust, is used when Claude is doing deep research as well.
[1]: https://github.com/anthropics/anthropic-cookbook/blob/main/p...
I saw this and immediately relived the last two years of the journey. I think some of the mental models that helped me might help the community too.
What people expect from finetuning is knowledge addition. You want to keep the styling[1] of the original model and just add new knowledge points that would help your task. In-context learning is one example of how this works well. Even there, though, if the context is out of distribution, the model does not "understand" it and will produce guesswork.
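As a toy illustration of the in-context route (the document, file name, and question are hypothetical; any chat API works the same way): the new knowledge never touches the weights, it just rides along in the prompt.

    import anthropic

    client = anthropic.Anthropic()

    # "Knowledge addition" without touching the weights: the new fact travels in
    # the prompt. If the document is far outside the pretraining distribution
    # (dense internal jargon, unfamiliar schemas), the model will still answer,
    # but the answer drifts toward guesswork.
    internal_doc = open("q3_pricing_policy.txt").read()  # hypothetical document

    resp = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Using only this policy:\n\n{internal_doc}\n\n"
                       "What discount applies to a 3-year enterprise contract?",
        }],
    )
    print(resp.content[0].text)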
When it comes to LoRA or PEFT or adapters, it's about style transfer. If you focus on a specific style of content, you will see the gains; the model just won't learn new knowledge that wasn't already in the original training data, and it will forget previously learnt styles depending on context.

When you do full finetuning (or SFT with no frozen parameters), it alters all the parameters, which results in gaining new knowledge at the cost of previous knowledge (and it will give you gibberish if you ask about topics outside the domain). This is called catastrophic forgetting. So yes, full finetuning works - it is just an imperfect solution like all the others. Recently, with reinforcement learning, there has been talk of continual learning, which is where Richard Sutton's latest paper also lands, but that's still at the research level.
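Roughly, here is what the two setups look like in code - a sketch using Hugging Face transformers + peft, with a placeholder model name and hyperparameters:

    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
    )

    # LoRA / PEFT: the base weights stay frozen; only small low-rank adapter
    # matrices on the attention projections get trained. Good for picking up a
    # style, not enough capacity to absorb much genuinely new knowledge.
    lora_cfg = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    lora_model = get_peft_model(base, lora_cfg)
    lora_model.print_trainable_parameters()  # typically well under 1% of params

    # Full finetuning / SFT with nothing frozen: every parameter moves, so new
    # knowledge can be absorbed, but the weights that encoded old knowledge get
    # overwritten too - which is where catastrophic forgetting comes from.
    full_model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
    )
    for p in full_model.parameters():
        p.requires_grad = True  # the default; shown here only for contrast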
Having said all that, if you start with the wrong mental model for finetuning, you will be disappointed with the results.
The problem to solve is adding new knowledge while preserving the original pretrained intelligence. Still a work in progress, but we published a paper last year on one way it could be done. Here is the link: https://arxiv.org/abs/2409.17171 (it also has experimental results for all the different approaches).
[1]: Styling here means the style learned by the model in SFT, e.g. bullets, lists, bolding different headings, etc. - all of that makes the content readable. It's the understanding of how to present the answer to a specific question.
The paper is sloppy. The original point may have credence[1], but the way they went about showing it is borderline irresponsible. The first issue is that they conflated the number of steps involved with difficulty level, without even considering the solution space. Then, the solutions are long, the models are trained to keep answers concise, and they are measuring consistency across tries. E.g. Tower of Hanoi with 13 disks needs ~80k tokens just to blurt out the answer. The model already knows there is literally one way to solve it - ergo the search space is not that big - but the paper presents this as evidence that it is not reasoning (of course it isn't reasoning there; Sonnet's 64k output limit would run out of tokens even without reasoning). Then you have the scenario where even an LLM with 0.999 per-token accuracy will mess up one token and go wrong on one run. They cited that as an example of how LLMs get it wrong and conclude it's memorization and pattern matching, not reasoning. Real-world data and usage do not correspond to that.
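To put numbers on that last point, here is the back-of-the-envelope arithmetic on why long exact outputs punish tiny per-token error rates (the tokens-per-move figure is illustrative, not taken from the paper):

    # Tower of Hanoi with n disks needs 2**n - 1 moves; at roughly 10 tokens per
    # move, 13 disks means on the order of 80k output tokens.
    n_disks = 13
    moves = 2 ** n_disks - 1          # 8191 moves
    tokens = moves * 10               # ~80k tokens, illustrative

    # Even with 99.9% per-token accuracy, the chance of emitting the whole
    # sequence without a single slip is vanishingly small.
    p_token = 0.999
    p_perfect = p_token ** tokens
    print(f"{moves} moves, ~{tokens} tokens, P(all correct) ~ {p_perfect:.1e}")
    # -> P(all correct) ~ 2.6e-36: one wrong token and a consistency metric
    #    scores the run as a failure, whether or not the model can reason.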
[1]: Anthropic found that reasoning is not 100% accurate. That's the premise of the paper; it's just that the headline is super clickbaity.
This project is an enhanced reader for Y Combinator Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post, and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests you can write to me here: gabrielepicco.github.io