Chrome also seems to use a custom inference runtime (in addition to Gemini Nano). It would be better if all of this were interoperable. WebGPU-based alternatives like WebLLM do not get the same access.

I've been trying these models out for the last year, and it seems to me that we want them to run within a 5-10 W "laptop" power envelope, but they really work best on a 50-500 W GPU, i.e. they eat batteries. For now at least, that means things work better on a plugged-in gaming laptop or desktop than on a typical web client.
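
For a rough sense of what "eat batteries" means, here is a back-of-the-envelope sketch. The 60 Wh capacity is an assumed figure for a laptop battery, not a measurement, and `battery_hours` is a hypothetical helper that models nothing except a constant power draw.

```python
def battery_hours(capacity_wh: float, draw_watts: float) -> float:
    """Idealized runtime in hours: battery capacity divided by a constant draw."""
    if draw_watts <= 0:
        raise ValueError("draw_watts must be positive")
    return capacity_wh / draw_watts


# Assumption: a 60 Wh laptop battery (illustrative value only).
CAPACITY_WH = 60.0

if __name__ == "__main__":
    # The ends of the two power ranges mentioned above.
    for watts in (5, 10, 50, 500):
        hours = battery_hours(CAPACITY_WH, watts)
        print(f"{watts:>3} W -> {hours:.2f} h")
```

Under that assumption, moving from the 5-10 W envelope to the bottom of the GPU range cuts runtime from several hours to roughly one, and the top of the GPU range is not something a laptop battery could sustain at all.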