p12tic

2025-06-03 8:36

Commented: "Why DeepSeek is cheap at scale but expensive to run locally"

We both agree. Batch size 1 is only relevant to people who want to run models on their own private machines, which is the OP's case.

2025-06-01 10:00

Commented: "Why DeepSeek is cheap at scale but expensive to run locally"

All of this is for batch size 1.

2025-06-01 8:51

Commented: "Why DeepSeek is cheap at scale but expensive to run locally"

The state of the art for local models is even further along.

For example, look into https://github.com/kvcache-ai/ktransformers, which achieves >11 tokens/s on a relatively old two-socket Xeon server plus a retail RTX 4090 GPU. Even more interesting is the prefill speed of more than 250 tokens/s. This is very useful in use cases like coding, where large prompts are common.

The above is achievable today. In the meantime, the Intel guys are working on something even more impressive. In https://github.com/sgl-project/sglang/pull/5150 they claim to achieve >15 tokens/s generation and >350 tokens/s prefill. They don't share the exact hardware they run this on, but from various bits and pieces across several PRs I reverse-engineered that they use 2x Xeon 6980P with MRDIMM 8800 RAM, without a GPU. The total cost of such a setup will be around $10k once cheap engineering samples hit eBay.
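To put the prefill and generation figures above in perspective, here is a rough sketch of end-to-end response time for a coding-style request. The prompt and output sizes are hypothetical, and the rates are the lower bounds quoted above, so the results are pessimistic estimates rather than measurements.

```python
def response_time(prompt_tokens: int, output_tokens: int,
                  prefill_tps: float, decode_tps: float) -> float:
    """Seconds to process the prompt (prefill) and then generate the output (decode)."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical request: a large code context and a moderately long answer.
PROMPT_TOKENS = 8000
OUTPUT_TOKENS = 500

# (prefill tokens/s, generation tokens/s), taken from the claims quoted above.
setups = {
    "ktransformers, 2x Xeon + RTX 4090": (250.0, 11.0),
    "sglang PR 5150, CPU only": (350.0, 15.0),
}

for name, (prefill_tps, decode_tps) in setups.items():
    seconds = response_time(PROMPT_TOKENS, OUTPUT_TOKENS, prefill_tps, decode_tps)
    print(f"{name}: about {seconds:.0f} s")
```

With a large prompt, prefill is a sizeable share of the total wait, which is why the prefill number matters as much as tokens/s for generation.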

2025-05-28 12:53

Commented: "Negotiating PoE+ Power in the Pre‑Boot Environment"

Incorrect. https://en.wikipedia.org/wiki/USB_hardware#USB_Power_Deliver... is a good starting point on the subject: "PD-aware devices implement a flexible power management scheme by interfacing with the power source through a bidirectional data channel and requesting a certain level of electrical power <...>".

2025-04-05 7:18

Commented: "The Llama 4 herd"

For all intents and purposes, the cache might as well not exist when the working set is 17B or 109B parameters. So it's still better that fewer parameters are activated for each token. 17B parameters run ~6x faster than 109B parameters simply because less data needs to be loaded from RAM.
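The ~6x figure follows from a simple bandwidth-bound model: if every active weight has to be streamed from RAM once per token, tokens/s is bandwidth divided by bytes read per token. A minimal sketch, where the 100 GB/s bandwidth and 1 byte per parameter are assumed values chosen only for illustration (the ratio does not depend on them):

```python
def tokens_per_second(active_params: float, bytes_per_param: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed when all active weights are read from RAM per token."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


BANDWIDTH = 100e9      # assumed: 100 GB/s of memory bandwidth
BYTES_PER_PARAM = 1.0  # assumed: 8-bit quantized weights

active_17b = tokens_per_second(17e9, BYTES_PER_PARAM, BANDWIDTH)
all_109b = tokens_per_second(109e9, BYTES_PER_PARAM, BANDWIDTH)
speedup = active_17b / all_109b  # equals 109 / 17 regardless of the assumed values

print(f"17B active: {active_17b:.2f} tok/s")
print(f"109B active: {all_109b:.2f} tok/s")
print(f"speedup: {speedup:.1f}x")
```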

Hacker News

p12tic

380

2020-06-23
