Gemma 4 31b was working OK for me, but it was consuming tons of memory on SWA checkpoints, so I had to turn them way down, and as a 31b dense model it's fairly slow on a Strix Halo. I did have a lot of tool calling issues on 26b-a4b, though.
My setup is a bit of a mess, as I experiment with different ways of configuring and hosting local models. At some point I was experimenting with the router server and then stopped, so some of my settings are still in models.ini while others are on the command line.
The relevant settings in models.ini are below (I actually have no idea if these settings are applied when not using the router server; it's been hard for me to figure out which settings are actually applied when using both the command line and models.ini):
[*]
jinja = true
seed = 3407
flash-attn = on
[unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL]
temperature = 1.0
top_p = 0.95
top_k = 64
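For anyone who wants to eyeball what a file like this would resolve to for one model, here is a rough Python sketch. It only parses the text above and merges the two sections; it assumes, without confirmation, that `[*]` holds global defaults and that the model-specific section overrides them. It says nothing about what the server actually applies, and `effective_settings` is a hypothetical helper name, not part of any tool.

```python
import configparser

# The models.ini fragment from above, pasted in verbatim.
MODELS_INI = """\
[*]
jinja = true
seed = 3407
flash-attn = on
[unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL]
temperature = 1.0
top_p = 0.95
top_k = 64
"""


def effective_settings(ini_text, model, global_section="*"):
    """Merge the global section with one model's section.

    ASSUMPTION: model-specific keys win over "[*]" keys. This mirrors a
    common preset convention, not confirmed server behaviour.
    """
    # interpolation=None keeps any literal '%' in values from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    merged = {}
    if parser.has_section(global_section):
        merged.update(parser[global_section])
    if parser.has_section(model):
        merged.update(parser[model])
    return merged


if __name__ == "__main__":
    settings = effective_settings(
        MODELS_INI, "unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL"
    )
    for key, value in settings.items():
        print(f"{key} = {value}")
```

All values come back as strings, so anything numeric would still need converting before use.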
As my harness, I'm using pi, with a pretty vanilla config.
Anyhow, Gemma 4 31b worked in this config, but it was slow and RAM hungry. Since then, I've mostly moved to Qwen 3.6 35b-a3b because it's a lot faster.
I'm not actually doing anything useful with these yet, but I've used them for some experiments, and Qwen 3.6 35b-a3b was capable of some pretty long, mostly unsupervised agentic loops.
The Qwen models are quite solid though.