Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop

2025-10-31 13:39 · 262 points · 153 comments

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running (e.g., with Ollama, LM Studio, or other runners), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share as well once it's done.

Thanks! Andrea.


Comments

  • By lreeves 2025-10-31 15:13 · 2 replies

    I sometimes still code with a local LLM but can't imagine doing it on a laptop. I have a server that has GPUs and runs llama.cpp behind llama-swap (letting me switch between models quickly). The best local coding setup I've been able to do so far is using Aider with gpt-oss-120b.
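
    A rough sketch of that kind of setup, for anyone curious (the GGUF filename, port, and flags are assumptions on my part; llama-swap wraps the llama-server step to handle the model switching):

    $ llama-server -m gpt-oss-120b.gguf --port 8080 -ngl 99 -c 32768
    $ export OPENAI_API_BASE=http://localhost:8080/v1
    $ export OPENAI_API_KEY=local
    $ aider --model openai/gpt-oss-120b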

    I guess you could get a Ryzen AI Max+ with 128GB of RAM to try to do that locally, but non-Nvidia hardware is incredibly slow for coding use, since the prompts get very large and take much longer to process as they grow. Then again, gpt-oss is a sparse model, so maybe it won't be that bad.

    Also, just to point it out: if you use OpenRouter with tools like Aider or Roo Code, you can flag your account to only use providers with a zero-data-retention policy if you're truly concerned about anyone training on your source code. GPT-5 and Claude are infinitely better, faster, and cheaper than anything I can do locally, and I have a monster setup.

    • By fm2606 2025-10-31 17:38 · 5 replies

      gpt-oss-120b is amazing. I created a RAG agent to hold most of the GCP documentation (separate download, parsing, chunking, etc.). ChatGPT finished a 50-question quiz in 6 minutes with a score of 46/50. gpt-oss-120b took over an hour but got 47/50. All the other local LLMs I tried were smaller and performed way worse, with less than 50% correct.

      I ran this on an i7 with 64GB of RAM and an old Nvidia card with 8GB of VRAM.

      EDIT: Forgot to say what the RAG system was doing, which was answering a 50-question multiple-choice test about GCP and cloud engineering.
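
      The answering step of a loop like this mostly amounts to retrieving relevant chunks and stuffing them into the prompt of the local model. A minimal sketch against Ollama's HTTP API (model tag, chunk text, and question are placeholders):

      $ curl http://localhost:11434/api/generate -d '{
          "model": "gpt-oss:120b",
          "prompt": "Context:\n<retrieved GCP doc chunks>\n\nQuestion: <one multiple-choice question>\nAnswer with the letter only.",
          "stream": false
        }'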

      • By embedding-shape 2025-10-31 18:13

        > gpt-oss-120b is amazing

        Yup, I agree, easily the best model you can run today on local hardware, especially when reasoning_effort is set to "high", though "medium" does very well too.

        I think people missed out on how great it is because a bunch of the runners botched their implementations at launch, and it wasn't until 2-3 weeks later that you could properly evaluate it. Once I could run the evaluations myself on my own tasks, it really became evident how much better it is.

        If you haven't tried it yet, or you tried it very early after the release, do yourself a favor and try it again with updated runners.
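
        If you're running it behind an OpenAI-compatible server, one way to bump the effort (the "Reasoning: high" system-prompt convention is the one described in the gpt-oss release notes; the port and model name here are assumptions) looks like this:

        $ curl http://localhost:8080/v1/chat/completions -d '{
            "model": "gpt-oss-120b",
            "messages": [
              {"role": "system", "content": "Reasoning: high"},
              {"role": "user", "content": "<your prompt>"}
            ]
          }'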

      • By whatreason 2025-11-01 2:02

        What do you use to run gpt-oss here? Ollama, vLLM, etc.?

      • By rovr138 2025-10-31 23:47

        > I created a RAG agent to hold most of GCP documentation (separate download, parsing, chunking, etc)

        If you could share the scripts you used to gather the GCP documentation, that'd be great. I've had an idea to do something like this, and the part I don't want to deal with is getting the data.

      • By lacoolj 2025-10-31 18:54 · 4 replies

        you can run the 120b model on an 8GB GPU? or are you running this on CPU with the 64GB RAM?

        I'm about to try this out lol

        The 20b model is not great, so I'm hoping 120b is the golden ticket.

        • By ThatPlayer 2025-10-31 22:36

          With MoE models like gpt-oss, you can run some layers on the CPU (and some on GPU): https://github.com/ggml-org/llama.cpp/discussions/15396

          Mentions 120b is runnable on 8GB VRAM too: "Note that even with just 8GB of VRAM, we can adjust the CPU layers so that we can run the large 120B model too"
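
          Roughly, that looks like this with llama-server (--n-cpu-moe is the flag that discussion describes for keeping expert layers on the CPU; the model file and layer count are assumptions and depend on your VRAM):

          $ llama-server -m gpt-oss-120b.gguf -ngl 99 --n-cpu-moe 30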

        • By gunalx 2025-10-31 22:07

          I have in many cases had better results with the 20b model over the 120b model, mostly because it is faster and I can iterate on prompts quicker to coerce it into following instructions.

        • By fm2606 2025-10-31 19:08

          For everything I run, even the small models, some amount goes to the GPU and the rest to RAM.

        • By fm2606 2025-10-31 19:06

          Hmmm...now that you say that, it might have been the 20b model.

          And like a dumbass I accidentally deleted the directory, and I didn't have a backup or have it under version control.

          Either way, I do know for a fact that the gpt-oss-XXb model beat ChatGPT by one answer: 46/50 at 6 minutes versus 47/50 at 1+ hour. I remember because I was blown away that I could get that kind of result running locally, and I had texted a friend about it.

          I was really impressed, but disappointed at the huge disparity in time between the two.

  • By egberts1 2025-10-31 20:30 · 1 reply

    Ollama, 16-CPU Xeon E6320 (old) at 1.9GHz, 120GB DDR4, 240TB of RAID5 SSDs, on a Dell Precision T710 ("The Beast"). NO GPU. The 20b model (noooot faaahst at all). Purely CPU-bound. Tweaked for 256KB chunking into RAG.

    Ingested the election laws of all 50 states, the territories, and the Federal government.

    Goal: mapping out each feature of the election process and dealing with the (in)consistent terminologies sprouted by different university-trained public administrations. This is the crux of the hallucinations: getting a diagram of ballot handling and its terminologies.

    Then maybe tackle the multitude of election irregularities, or at least point out integrity gaps at various locales.

    https://figshare.com/articles/presentation/Election_Frauds_v...

    • By banku_brougham 2025-10-31 22:40

      bravo, this is a great use of talent for society

  • By gcr 2025-10-31 20:11 · 4 replies

    For new folks, you can get a local code agent running on your Mac like this:

    1. $ npm install -g @openai/codex

    2. $ brew install ollama; ollama serve

    3. $ ollama pull gpt-oss:20b

    4. $ codex --oss -m gpt-oss:20b

    This runs locally without Internet. Idk if there’s telemetry for codex, but you should be able to turn that off if so.

    You need an M1 Mac or better with at least 24GB of GPU memory. The model is pretty big, about 16GB of disk space in ~/.ollama

    Be careful: the 120b model is about 1.5× better than this 20b variant, but it has roughly 5× the hardware requirements.

    • By windexh8er 2025-11-01 3:42

      I've been really impressed by OpenCode [0]. The limitations of all the frontier TUIs are removed, and it's feature-complete and performant compared to Codex or Claude Code.

      [0] https://opencode.ai/

    • By abacadaba 2025-11-01 2:41

      As much as I've been using LLMs via API all day every day, being able to run one locally on my MBA and talk to my laptop still feels like magic.

    • By giancarlostoro 2025-11-01 0:57

      LM Studio is even easier, and things like JetBrains IDEs will sync to LM Studio, same with Zed.

    • By nickthegreek 2025-10-31 20:41

      Have you been able to build or iterate on anything of value using just 20b to vibe code?

HackerNews