alyxya

2025-06-28 6:30

Commented: "Lossless LLM 3x Throughput Increase by LMCache"

I skimmed over a couple of the papers referenced to get an idea of what optimizations LMCache is doing.

* KV cache compression - compressing the stored bytes of the KV cache by exploiting patterns in the cache, with the compression level adjusted dynamically

* KV cache blending - concatenating the KV caches of multiple reused prompts while recomputing only a small part of the KV cache, for use cases like RAG. It's more performant than the standard lossless KV cache prefix optimization, and gives better results than naively concatenating the KV caches of the reused prompts
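To make the prefix limitation concrete, here's a rough sketch of why plain prefix caching doesn't help much for RAG. It's just my illustration, not LMCache's code: the token IDs and the `shared_prefix_len` helper are made up. A prefix cache can only reuse KV entries for the leading tokens that match a previously served prompt exactly, so the same retrieved chunks in a different order get almost no reuse.

```python
def shared_prefix_len(cached_tokens, new_tokens):
    """Count the leading tokens two prompts share.

    Under plain prefix caching, only these tokens can reuse stored KV entries.
    """
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


# Hypothetical token IDs for a system prompt and two retrieved chunks.
system = [1, 2, 3]
chunk_a = [10, 11, 12, 13]
chunk_b = [20, 21, 22, 23]

cached_prompt = system + chunk_a + chunk_b  # served earlier, KV cache stored
new_prompt = system + chunk_b + chunk_a     # same chunks, different order

reused = shared_prefix_len(cached_prompt, new_prompt)
recomputed = len(new_prompt) - reused

print(f"reused from prefix cache: {reused} tokens")
print(f"recomputed from scratch:  {recomputed} tokens")
```

Only the system prompt is reused here, even though every chunk was seen before. As I understand it, blending targets that gap by reusing the per-chunk caches and recomputing a small part of them.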

These optimizations are pretty cool and different from the standard KV cache optimizations. The title calling it lossless seems misleading though.

Hacker News

alyxya

6

2024-08-16
