I guess their point was to demonstrate that it's possible to bake a decently-sized model to a silicon? As with anything related to HW, I guess the lead time will be considerably larger than the software counterparts, so I guess in 1-2 years timeframe we might see something like Gemma 4 baked onto a silicon.
Yeah, I think the important part is the process to convert the model to silicon, not the actual implementation itself.
Whether it succeeds now depends a lot on the rate of improvement of model architecture. They're betting on model design and capability improvements slowing down - and then wiping the floor with everyone else with their inference economics.
I think this is the future. When models start converging at "really good" (which I think is already happening) then burning them into ASIC silicon is the natural next step.
Harnesses can keep improving with a fixed model and the throughput opens up new possibilities like doing 10x more "thinking" or exploring parallel paths and picking the best.