Thanks for the feedback! On the chat interface point, we actually think chat is still a great way to get into the product, but we leverage the canvas behind the scenes to let the agents do better work and to give users the ability to audit and visualize what's happening. Good note on the demo topic, though; we have some broader, non-AI demos coming soon on our website and in our newsletter.
Spot on. The persistence layer is a huge part of what makes the canvas work.
For failures, we handle them at multiple levels: first, standard retries and fallbacks to alternate models/providers. If that fails, the agents look for alternate approaches to accomplish the same task (e.g. falling back to web search instead of browser use).
For completeness, you can also manually re-run or edit individual blocks if they fail (though whether the agents factor that in depends on where they are in their flow).
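To make the layering concrete, here's a minimal sketch of that failure-handling cascade: retries first, then alternate providers, then an alternate approach for the same goal. This is not Spine's actual code; all names (`TaskFailed`, `run_with_fallbacks`, etc.) are hypothetical.

```python
class TaskFailed(Exception):
    """Raised when one attempt at a task fails (hypothetical example type)."""

def run_with_fallbacks(task, providers, alternate_approaches, retries=2):
    """Layered cascade: (1) retry the same provider, (2) fall back to the
    next model/provider, (3) try a different approach to the same task."""
    for provider in providers:            # level 2: alternate models/providers
        for _ in range(retries):          # level 1: plain retries
            try:
                return provider(task)
            except TaskFailed:
                continue
    for approach in alternate_approaches:  # level 3: e.g. web search instead of browser use
        try:
            return approach(task)
        except TaskFailed:
            continue
    raise TaskFailed(f"all strategies exhausted for {task!r}")
```

The point of structuring it this way is that each level only fires once the cheaper level above it is exhausted.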
Great question. The core of Spine is coordinating multiple specialized agents across multiple models, using the canvas to store and pass context selectively so each agent works with exactly what it needs.
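The selective context-passing idea can be sketched roughly like this: a shared canvas stores named blocks, and each agent declares which blocks it needs and which it writes. This is a toy illustration of the concept, not Spine's implementation; every name here (`Canvas`, `view_for`, the agent dicts) is hypothetical.

```python
class Canvas:
    """Shared store of named context blocks (hypothetical sketch)."""
    def __init__(self):
        self.blocks = {}

    def write(self, key, value):
        self.blocks[key] = value

    def view_for(self, needs):
        # Selective context: hand back only the requested keys, nothing else.
        return {k: self.blocks[k] for k in needs if k in self.blocks}

def run_agent(agent, canvas):
    """Run one specialized agent against only the context it declared."""
    output = agent["fn"](canvas.view_for(agent["needs"]))
    canvas.write(agent["writes"], output)
    return output
```

Because each agent sees only its declared slice, a downstream agent can't be distracted (or have its context window consumed) by intermediate material it doesn't need.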
On the eval side, we ran Spine Swarm against GAIA Level 3 and Google DeepMind's DeepSearchQA and hit #1 on both. Full writeup: https://blog.getspine.ai/spine-swarm-hits-1-on-gaia-level-3-...