Really excited to try this once it is merged into llama.cpp.
Gemma 4 26B-A4B is much quicker on my setup vs Qwen3.6-35B-A3B (by about 3x), so the thought of a 1.5 speedup is tantalizing.
Have tried draft models to limited success (the smaller 3B draft model in addition to a dense 14B Ministral model introduced too much overhead already)
Gemma 4 26B-A4B is much quicker on my setup vs Qwen3.6-35B-A3B (by about 3x), so the thought of a 1.5 speedup is tantalizing.
Have tried draft models to limited success (the smaller 3B draft model in addition to a dense 14B Ministral model introduced too much overhead already)