
Depends what you need the model to do. The recent granite4.1:3b takes just 2GB of memory and is fast. Results are pretty good, and it supports tool calling. Barely a squeak out of the Mac laptop.

Even faster with the MLX builds.

Then when I need more heavy lifting I fire up a larger model.

IMHO the issue isn't the models. I've had OpenClaw give the same results as Claude using open models locally. It's slower, but it does the job. What's needed is something that can switch between models optimally.
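A minimal sketch of the model-switching idea in Python. It assumes a hypothetical `call_model(model, prompt)` callback that wraps whatever local runtime you use, plus a hypothetical `is_good(output)` check (schema validation, tests passing, etc.). The model tags come from this thread, but the ladder order and the task-to-rung mapping are my own guesses, not anything built into those models or tools.

```python
from typing import Callable, Optional, Sequence, Tuple

# Models ordered from cheapest to most capable. Tags are from this thread;
# the ordering is an assumption.
LADDER: Sequence[str] = ("granite4.1:3b", "granite4.1:30b", "qwen3.6:35b-mlx")

# Hypothetical mapping from task kind to the first rung worth trying.
START_RUNG = {
    "extraction": 0,
    "summarization": 0,
    "tool_call": 0,
    "agentic": 0,
    "coding": 2,
}


def route(
    task_kind: str,
    prompt: str,
    call_model: Callable[[str, str], str],
    is_good: Callable[[str], bool],
    ladder: Sequence[str] = LADDER,
) -> Tuple[str, Optional[str]]:
    """Try the cheapest suitable model first and escalate while is_good fails.

    Returns (model_tag, output). If no rung passes, returns the last model
    tried and its output so the caller can decide what to do next.
    """
    # Unknown task kinds start mid-ladder; clamp so custom ladders don't overflow.
    start = min(START_RUNG.get(task_kind, 1), len(ladder) - 1)
    model, output = ladder[start], None
    for model in ladder[start:]:
        output = call_model(model, prompt)
        if is_good(output):
            break  # good enough, stop escalating
    return model, output


if __name__ == "__main__":
    # Fake backend for demonstration: only the 30b model gives an acceptable answer.
    def fake_call(model: str, prompt: str) -> str:
        return "ok" if model == "granite4.1:30b" else "???"

    print(route("summarization", "Summarize this.", fake_call, lambda out: out == "ok"))
    # → ('granite4.1:30b', 'ok')
```

The design choice here is "cheap first, escalate on failure": most requests never touch the big model, and you only pay for the heavy lifting when the small one visibly falls short.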




Yeah, it 100% depends what you want the model to do. Some tasks, like extraction, summarization, or simple tool calling (e.g. "turn on my desk lamp"), are very doable with tiny models. Others, like coding or more advanced agentic workflows, can demand much more powerful models. I was thinking from the lens of coding or running _big_ data extraction pipelines (think ~8 billion pages).

> Others, like coding or more advanced agentic workflows can demand much more powerful models.

You can do coding and agentic work fine. For coding I use qwen3.6:35b-mlx, and for agentic tasks granite4.1:3b works fine.

These are the models I use.

- granite4.1:3b

- granite4.1:30b

- gpt-oss:20b

- gpt-oss:120b (less so now)

- mistral-small3.2

- qwen3.6:35b-mlx

There will always be use cases that don't sit on your laptop, but most of what can be done can be done locally; it just requires a good framework on top.


Why do you like gpt-oss-120b less now? What replaced it?

It's very likely to hallucinate. I'm mostly using Gemma 4 31B now when I need something offline. It is a very strong model for its size.


