XCSme

2026-03-14 11:43

Commented: "Show HN: Channel Surfer – Watch YouTube like it’s cable TV"

Plus there are the ads on cable TV, and I don't think there is any way to remove them.

2026-03-12 10:40

Commented: "Shall I implement it? No"

In my tests it's worse about adding extra formatting or output: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

For example, sometimes it outputs in markdown, without being asked to (e.g. "**13**" instead of "13"), even when asked to respond with a number only.

This might be fine in a chat environment, but not in a workflow, an agentic use case, or tool usage.
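To make the workflow problem concrete, here is a minimal sketch of the kind of strict check a pipeline might run on a "number only" reply. The helper name `is_bare_integer` is hypothetical and not taken from any benchmark code; it only assumes the expected answer is a plain integer.

```python
import re

# Hypothetical helper for illustration: accept a reply only if it is a bare integer.
# ASCII digits only, optional leading minus sign, nothing else allowed.
_BARE_INT = re.compile(r"-?[0-9]+")


def is_bare_integer(response: str) -> bool:
    """Return True only when the whole reply (ignoring outer whitespace) is an integer."""
    return _BARE_INT.fullmatch(response.strip()) is not None


if __name__ == "__main__":
    for reply in ["13", "**13**", "13, I chose that number because..."]:
        print(repr(reply), is_bare_integer(reply))
```

A chat UI would render "**13**" as a bold 13 and nobody would notice, but a check like this (or a plain `int(reply)` call) fails on it.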

Yes, it can be enforced via structured output, but in a string field from a structured output you might still want to enforce a specific natural-language response format, which can't be defined by a schema.
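As a rough sketch of that situation: the payload below is valid JSON and the field is a string, so a schema check passes, but the content of the string still has to be checked separately. The field name `summary` and the "one plain sentence, no markdown, at most 20 words" rule are assumptions made up for this example.

```python
import json

# Markdown markers treated as formatting violations in this sketch (an assumption).
_MARKDOWN_MARKERS = ("**", "`", "#")


def is_plain_sentence(text: str, max_words: int = 20) -> bool:
    """Check a hypothetical rule: one plain sentence, no markdown, ends with a period."""
    if any(marker in text for marker in _MARKDOWN_MARKERS):
        return False
    stripped = text.strip()
    return stripped.endswith(".") and 0 < len(stripped.split()) <= max_words


def check_summary_field(raw_json: str) -> bool:
    """Parse a structured-output payload and validate its 'summary' string field."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    summary = payload.get("summary")
    return isinstance(summary, str) and is_plain_sentence(summary)


if __name__ == "__main__":
    print(check_summary_field('{"summary": "The answer is 13."}'))
    print(check_summary_field('{"summary": "The answer is **13**."}'))
```

Both payloads in the usage lines have the same shape and types; only the second one breaks the format rule, which is why the check has to live outside the schema.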

2026-03-12 10:29

Commented: "Shall I implement it? No"

To be honest, I had this "issue" too.

I upgraded to a new model (gpt-4o-mini to grok-4.1-fast), and suddenly all my workflows were broken. I was like "this new model is shit!", then I looked into my prompts and realized the model was actually better at following instructions, and my instructions were wrong/contradictory.

After I fixed my prompts it did exactly what I asked for.

Maybe models should have another tunable parameter for how strictly they should respect the user prompt. This reminds me of imagegen models, where you can choose the CFG/guidance scale/diffusion strength.

2026-03-12 9:42

Commented: "Shall I implement it? No"

Claude is quite bad at following instructions compared to other SOTA models.

As in, you tell it "only answer with a number", and then it proceeds to tell you "13, I chose that number because..."

2026-03-12 8:02

Commented: "Grok 4.20 brings minimal improvements over Grok-4.1-fast"

It is also 10x more expensive, though 4-5x faster.

The multi-agent version is 100x more expensive, 4-5x faster, but for basic tasks (without tool calling) gives a lot of wrong answers.

Hacker News

XCSme

2540

2014-12-28
