Probably LLM-generated, but that's a fair point :D Well, the proxy is open source, maybe someone will even implement this before we do :)
Talking about the features the proxy unlocks - we have already added some monitoring, such as a dashboard of the currently running sessions and a "prompt bank" that stores the user's previous interactions.
Claude Code's /compact still takes ages - and that is a relatively easy fix. Doing proactive compression the right way is much tougher. For now, they seem to bet on subagents solving that, which is essentially summarization with Haiku. We don't think that is the way to go, because summarization is lossy and the additional generation steps add latency.
I think we should draw a distinction between two compression "stages":
1. Tool output compression: vanilla Claude Code doesn't do it at all and just dumps the entire tool output into the context, bloating it. We add <0.5s of compression latency, but you gain some time back on the target model's prefill, since a shorter context speeds it up.
2. /compact once the context window is full - the one that is painfully slow in Claude Code. We do it instantly - the trick is to run the compaction in advance, when the context window is 80% full, and then fetch this precompaction when /compact is actually needed (our context gateway handles that).
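To make the second stage concrete, here is a rough Python sketch of the precompaction idea: compact early at the 80% mark, cache the result, and serve it when /compact is requested. This is not the gateway's actual code. The class name `PrecompactingSession`, the `summarize` callback, and the whitespace token count are all made up for illustration, and a real proxy would run the early compaction in the background rather than inline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PrecompactingSession:
    """Hypothetical session wrapper, NOT the gateway's real API."""
    max_tokens: int
    summarize: Callable[[List[str]], str]  # stand-in for the slow compaction call
    threshold: float = 0.8                 # fraction of the window that triggers precompaction
    messages: List[str] = field(default_factory=list)
    _precompacted: Optional[str] = None    # cached compaction result, if any
    _covered: int = 0                      # how many messages the cache covers

    def _tokens(self) -> int:
        # Crude whitespace token count, for illustration only.
        return sum(len(m.split()) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        over = self._tokens() >= self.threshold * self.max_tokens
        if self._precompacted is None and over:
            # A real proxy would do this in the background; inline keeps the sketch short.
            self._precompacted = self.summarize(list(self.messages))
            self._covered = len(self.messages)

    def compact(self) -> List[str]:
        if self._precompacted is None:
            # Slow path: nothing was prepared ahead of time.
            self._precompacted = self.summarize(list(self.messages))
            self._covered = len(self.messages)
        # Fast path: reuse the cached compaction and keep newer messages verbatim.
        tail = self.messages[self._covered:]
        self.messages = [self._precompacted] + tail
        self._precompacted = None
        self._covered = 0
        return self.messages
```

The point of the design is that the expensive call happens while the user is still working, so by the time the window is actually full, `compact()` only has to swap in the cached result and append whatever arrived after the 80% mark.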
Please try it out and let us know your feedback, thanks a lot!
Subagents do summarization - usually with cheaper models like Haiku. Summarizing tool outputs doesn't work well because of the information loss: https://arxiv.org/pdf/2508.21433. Compression is different because we keep the preserved pieces of context unchanged, and we condition the compression on the tool call intent, which makes it more precise.
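A toy Python sketch of the difference, under heavy assumptions: the keyword overlap below is only a stand-in for whatever relevance model is really used, and `compress_tool_output` is a hypothetical name, not a function from the proxy. What it does illustrate is the property that matters: lines that survive are kept verbatim instead of being rewritten, and what survives depends on the intent of the tool call.

```python
import re
from typing import List


def compress_tool_output(output: str, intent: str, context: int = 1) -> str:
    """Keep lines related to the intent verbatim and elide the rest.

    Keyword overlap is an illustrative stand-in for a real relevance model.
    """
    # Words from the intent, ignoring very short ones such as "the".
    keywords = {w.lower() for w in re.findall(r"\w+", intent) if len(w) > 3}
    lines = output.splitlines()

    # Mark matching lines plus `context` neighbours on each side.
    keep = set()
    for i, line in enumerate(lines):
        words = {w.lower() for w in re.findall(r"\w+", line)}
        if keywords & words:
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))

    # Emit kept lines unchanged; replace each dropped run with a marker.
    result: List[str] = []
    skipped = 0
    for i, line in enumerate(lines):
        if i in keep:
            if skipped:
                result.append(f"[... {skipped} lines elided ...]")
                skipped = 0
            result.append(line)
        else:
            skipped += 1
    if skipped:
        result.append(f"[... {skipped} lines elided ...]")
    return "\n".join(result)
```

For example, with a test log as `output` and `intent="find the failed tests"`, the failing line and its neighbours come through byte-for-byte, while the long run of passing tests collapses into an elision marker. A summarizer would instead paraphrase the log, and that is where details like exact error strings tend to get lost.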