https://blog.quipu-strands.com/
They are great for specialized use-cases: (a) the problem is not hard enough to require reasoning; (b) it is not diverse enough to require a world model; (c) you want cheap inference (and can make it happen hardware-wise); and (d) you either have enough data or a workflow that accumulates data. With enough data, fine-tuning can sometimes beat a premier model while keeping latency low (assuming, of course, that (a) and (b) hold).
I make it sound like a rare perfect storm needs to exist to justify fine-tuning, but these circumstances are not uncommon; to an extent, (a), (c) and (d) were already prerequisites for deploying traditional ML systems.
I once modeled user journeys on a website using fancy ML models that honored sequence information (i.e., the order of page visits), only to be beaten by a bag-of-words decision-tree model (each page URL becomes a vector dimension, but order is lost), which was supposed to be my baseline.
What I had overlooked was that journeys on that particular website were fairly constrained by design: if you landed on the home page, did a bunch of stuff, and put product X in the cart, there was pretty much one sequence of pages (or, in the worst case, a small handful) you'd traverse for the journey. This means the bag-of-words (BoW) representation was more or less as expressive as the sequence model; a given set of pages showing up in the BoW vector (mostly) corresponded to a single sequence. And the decision tree could learn faster with less data.
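To make the representation concrete, here is a minimal sketch of the BoW encoding described above. The journeys and page names are made up for illustration; the point is that two journeys with the same pages in a different order collapse to the same vector, which on a constrained site loses almost nothing.

```python
from collections import Counter

# Hypothetical journeys: each is an ordered list of page URLs/slugs.
journeys = [
    ["home", "search", "product_x", "cart", "checkout"],
    ["home", "product_x", "cart"],
    ["home", "search", "search", "home"],
]

# Vocabulary: every page URL becomes one vector dimension.
vocab = sorted({page for journey in journeys for page in journey})

def bow_vector(journey):
    """Encode a journey as page-visit counts; the visit order is discarded."""
    counts = Counter(journey)
    return [counts[page] for page in vocab]

# Two orderings of the same pages map to the same BoW vector.
a = bow_vector(["home", "search", "product_x", "cart"])
b = bow_vector(["home", "product_x", "search", "cart"])
print(a == b)  # True: order is lost
```

These vectors can then be fed straight to any off-the-shelf classifier (a decision tree, in the anecdote above). When the site's design means each page-set implies essentially one traversal order, the sequence model has nothing extra to exploit.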
This project is an enhanced reader for Ycombinator Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post, and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests, you can write to me here: gabrielepicco.github.io