...

cirrusfan

8

Karma

2026-02-01

Created

Recent Activity

  • This model is exactly what you’d want for your resources. GPU for prompt processing, ram for model weights and context length, and it being MoE makes it fairly zippy. Q4 is decent; Q5-6 is even better, assuming you can spare the resources. Going past q6 goes into heavily diminishing resources.

  • Anthropic might have the best product for coding but good god the experience is awful. Random limits where you _know_ you shouldn’t hit them yet, the jankiness of their client, the service being down semi-frequently. Feels like the whole infra is built on a house of cards and badly struggles 70% of the time.

    I think my $20 openai sub gets me more tokens than claude’s $100. I can’t wait until google or openai overtake them.

  • I think it depends on what you use it for. Coding, where time is money? You probably want the Good Shit, but also want decent open weights models to keep prices sane rather than sama’s 20k/month nonsense. Something like a basic sentiment analysis? You can get good results out of a 30b MoE that runs at good pace on a midrange laptop. Researching things online with many sources and decent results I’d expect to be doable locally by the end of 2026 if you have 128GB ram, although it’ll take a while to resolve.

  • If it sounds too good to be true…

  • I find it really surprising that you’re fine with low end models for coding - I went through a lot of open-weights models, local and "local", and I consistently found the results underwhelming. The glm-4.7 was the smallest model I found to be somewhat reliable, but that’s a sizable 350b and stretches the definition of local-as-in-at-home.

HackerNews