
You're welcome! Yes, we have a KV cache. Being able to implement it efficiently, in terms of both hardware requirements and compute time, is one of the benefits of our deterministic chip architecture (and deterministic system architecture).


Thanks again! Hope I'm not overwhelming you, but one more question: are you decoding with batch size 1, or more?


That's OK, feel free to keep asking!

I think it's currently 1. Unlike graphics processors, which really need data parallelism to get good throughput, our LPU architecture allows us to deliver good throughput even at batch size 1.
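
For anyone following along who hasn't met the terms, here is a minimal sketch of what "KV cache" and "batch size 1" mean during decoding. It is a generic single-head attention toy in plain Python, not a description of the LPU or of any particular implementation. The weight matrices `W_Q`, `W_K`, `W_V` and all function names are made up for illustration.

```python
import math

# Hypothetical 2x2 projection weights, chosen only for illustration.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, -0.5], [0.25, 1.0]]
W_V = [[1.0, 2.0], [0.0, 1.0]]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(dim)]


def decode_with_cache(xs):
    """Process one sequence (batch size 1) one token at a time.

    Each step projects only the new token and appends its key and value
    to the cache, so earlier keys and values are never recomputed.
    """
    cached_keys, cached_values, outputs = [], [], []
    for x in xs:
        cached_keys.append(matvec(W_K, x))
        cached_values.append(matvec(W_V, x))
        outputs.append(attend(matvec(W_Q, x), cached_keys, cached_values))
    return outputs


def decode_without_cache(xs):
    """Same result, but re-projects the whole prefix at every step."""
    outputs = []
    for t in range(len(xs)):
        prefix = xs[:t + 1]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        outputs.append(attend(matvec(W_Q, xs[t]), keys, values))
    return outputs
```

Both functions return the same outputs. The cached version does one key and one value projection per step, while the uncached version repeats them for the whole prefix. Batch size 1 means there is only one such sequence in flight, so there is no second sequence to share each weight load with.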



