
lihanc111

Karma: 75 · Created: 2025-06-18

Recent Activity

  • Please send to contact@lmcache.ai

  • It is almost true for both, although in the second case you can simply skip storing the cache when there is little improvement.

  • It is in IBM's llm-d open source stack.

  • Our team built this open source project, LMCache, to reduce repetitive computation in LLM inference so that serving systems can handle more users (3x more throughput in chat applications). It is already used in IBM's open source LLM inference stack.

    In LLM serving, the input is first computed into intermediate states called the KV cache, which the model then uses to generate answers. This data is relatively large (~1-2 GB for a long context) and is often evicted when GPU memory runs out. When that happens and a user asks a follow-up question, the engine has to recompute the same KV cache from scratch. LMCache avoids this by efficiently offloading the KV cache to DRAM and disk and loading it back when it is needed again (a rough sketch of the idea follows below).

    Ask us anything!
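
    For readers curious how KV-cache offloading works in principle, here is a minimal sketch in Python. It is not LMCache's actual API; the class and method names are illustrative, and it assumes PyTorch. The idea: keep hot KV entries on the GPU, spill evicted ones to DRAM, and on a later hit copy them back instead of recomputing them.

      import torch

      class OffloadingKVCache:
          """Illustrative only: keep hot KV tensors on GPU, spill the rest to DRAM."""

          def __init__(self, gpu_budget_bytes):
              self.gpu_budget = gpu_budget_bytes
              self.gpu_used = 0
              self.gpu_store = {}   # prefix_hash -> (k, v) tensors on GPU
              self.cpu_store = {}   # prefix_hash -> (k, v) tensors offloaded to DRAM

          @staticmethod
          def _nbytes(k, v):
              return k.element_size() * k.nelement() + v.element_size() * v.nelement()

          def put(self, prefix_hash, k, v):
              size = self._nbytes(k, v)
              # Evict oldest GPU entries to DRAM until the new entry fits (FIFO for simplicity).
              while self.gpu_used + size > self.gpu_budget and self.gpu_store:
                  old_hash, (ok, ov) = next(iter(self.gpu_store.items()))
                  self.cpu_store[old_hash] = (ok.to("cpu"), ov.to("cpu"))
                  self.gpu_used -= self._nbytes(ok, ov)
                  del self.gpu_store[old_hash]
              self.gpu_store[prefix_hash] = (k, v)
              self.gpu_used += size

          def get(self, prefix_hash, device="cuda"):
              # GPU hit: reuse directly. DRAM hit: copy back instead of recomputing.
              if prefix_hash in self.gpu_store:
                  return self.gpu_store[prefix_hash]
              if prefix_hash in self.cpu_store:
                  k, v = self.cpu_store[prefix_hash]
                  return k.to(device), v.to(device)
              return None  # miss: the caller must recompute the KV cache

    On a hit, the serving engine can skip prefill for the cached prefix, which is where the throughput gain comes from; a real system would also pin the DRAM buffers and overlap the copies with computation.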
