Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I wonder what latency and tok/s this model on Groq or Cerebras would be capable of. I have a couple LLM driven games [1][2] where speed is really important to the experience. Currently the best performance I can get is the gpt-oss models on Groq or Cerebras but they need quite a bit of extra context and tools to correct for mistakes. I'm making a bet I'll be able to get the same performance much cheaper in the next few months.

[1] https://sleuththetruth.com [2] https://lextension.net/



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: