You're welcome! Yes, we have KV cache. Being able to implement this efficiently ...

ppsreejith · on Feb 19, 2024

Thanks again! Hope I'm not overwhelming you, but one more question: are you decoding with batch size = 1, or more?

tome · on Feb 19, 2024

That's OK, feel free to keep asking!

I think currently 1. Unlike with graphics processors, which really need data parallelism to get good throughput, our LPU architecture allows us to deliver good throughput even at batch size 1.
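
To illustrate the batching point for bandwidth-bound hardware in general, here is a rough back-of-the-envelope sketch. It assumes each decode step has to stream all model weights from memory once, regardless of batch size, and it ignores compute limits and KV cache traffic. The function name and the example numbers (a 7B-parameter model at 2 bytes per weight, 1 TB/s of memory bandwidth) are hypothetical and are not figures for any particular GPU or for the LPU.

```python
def decode_tokens_per_second(weight_bytes, bandwidth_bytes_per_s, batch_size=1):
    """Rough upper bound on decode throughput when weight streaming dominates.

    Assumption: every decode step reads all weights once, and that read is
    shared by every sequence in the batch. Compute and KV cache reads are
    ignored, so this overestimates throughput at large batch sizes.
    """
    steps_per_second = bandwidth_bytes_per_s / weight_bytes
    # Each step yields one new token per sequence in the batch.
    return steps_per_second * batch_size


if __name__ == "__main__":
    weight_bytes = 7e9 * 2   # hypothetical: 7B parameters, 2 bytes each
    bandwidth = 1e12         # hypothetical: 1 TB/s memory bandwidth
    for batch_size in (1, 8, 64):
        rate = decode_tokens_per_second(weight_bytes, bandwidth, batch_size)
        print(f"batch={batch_size:>2}: ~{rate:.0f} tokens/s total")
```

Under these assumptions, total throughput scales linearly with batch size while per-sequence speed stays flat, which is why bandwidth-bound hardware leans on batching. It says nothing about how the LPU achieves its batch-size-1 numbers.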