Check chatjimmy.ai

lelandbatey · 2026-05-05T19:50:43 1778010643

https://chatjimmy.ai is a demo of the "burn the model to an ASIC" approach being sold by Taalas[0], which they use to run Llama 3.1 8B at ~17000 tokens per second.

[0] - https://taalas.com/products/

snek_case · 2026-05-06T03:19:36 1778037576

Not to downplay their accomplishment, but Llama 3.1 8B is a terrible model. It's really outdated at this point. It's cool that they were able to accelerate a model with silicon, but it also feels wasteful since Llama 3.1 8B is such a useless model?

puilp0502 · 2026-05-06T06:38:10 1778049490

I guess their point was to demonstrate that it's possible to bake a decently sized model into silicon? As with anything related to HW, I guess the lead time will be considerably longer than for the software counterparts, so in a 1-2 year timeframe we might see something like Gemma 4 baked into silicon.

leoedin · 2026-05-06T08:41:43 1778056903

Yeah, I think the important part is the process to convert the model to silicon, not the actual implementation itself.

Whether it succeeds now depends a lot on the rate of improvement of model architecture. They're betting on model design and capability improvements slowing down - and then wiping the floor with everyone else with their inference economics.

WASDx · 2026-05-06T18:20:39 1778091639

I think this is the future. When models start converging at "really good" (which I think is already happening) then burning them into ASIC silicon is the natural next step.

Harnesses can keep improving with a fixed model and the throughput opens up new possibilities like doing 10x more "thinking" or exploring parallel paths and picking the best.
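To make "exploring parallel paths and picking the best" concrete, here is a minimal best-of-N sketch. It assumes you have some `generate` call that samples from the fixed model and some `score` function that ranks candidates; both are hypothetical stand-ins here, not any real Taalas or Llama API, and the toy versions exist only so the example runs.

```python
from itertools import count
from typing import Callable


def best_of_n(generate: Callable[[str], str],
              score: Callable[[str], float],
              prompt: str,
              n: int = 10) -> str:
    """Sample n candidates for the prompt and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)


def make_toy_generator() -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: returns numbered drafts."""
    counter = count(1)

    def generate(prompt: str) -> str:
        return f"{prompt} -> draft {next(counter)}"

    return generate


def toy_score(candidate: str) -> float:
    """Hypothetical stand-in for a verifier: arbitrarily prefers draft 7."""
    draft_number = int(candidate.rsplit(" ", 1)[-1])
    return -abs(draft_number - 7)


if __name__ == "__main__":
    print(best_of_n(make_toy_generator(), toy_score, "question", n=10))
```

With very high token throughput, raising `n` costs little wall-clock time, so the quality of the `score` step becomes the limiting factor rather than generation speed.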

imtringued · 2026-05-06T08:37:55 1778056675

I agree, Gemma 3 12B is a very good model for its size and it was only obsoleted by Gemma 4.

Heck, I'm still a fan of Gemma 2 9B.

satellite2 · 2026-05-06T22:14:21 1778105661

is it still a useless model if, say, you can run it at (prompt+output)*24/s and use it to make executive function decisions?