
Thanks! I'm familiar with attention, linear attention, flash attention, etc.; I'm just not up to speed on how attention is scaled to 1M or 10M context windows.


Ah, got it. Yeah, then I'd focus on learning how RoPE works first. That will at least help you understand why retrieval in current long-context implementations is so limited.

A colleague from a discord I spend time in threw together this video a year or so ago; it might be helpful as a first watch before a deep dive: https://www.youtube.com/watch?v=IZYx2YFzVNc

Covers positional encoding as a general concept first, then goes into rotary embeddings.
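
If you want the gist before watching, here's a minimal NumPy sketch of what RoPE does. Names and shapes are just illustrative, not from any particular library:

    # Minimal sketch of rotary position embeddings (RoPE) in NumPy.
    # Function and variable names here are mine, not from any library.
    import numpy as np

    def rope(x, positions, base=10000.0):
        # x: (seq_len, dim) queries or keys, dim must be even
        # positions: (seq_len,) integer token positions
        dim = x.shape[-1]
        # one rotation frequency per feature pair, geometrically spaced
        freqs = base ** (-np.arange(0, dim, 2) / dim)     # (dim/2,)
        angles = positions[:, None] * freqs[None, :]      # (seq_len, dim/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[..., 0::2], x[..., 1::2]               # split dims into pairs
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin              # 2D rotation per pair
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    # apply to e.g. 8 tokens with 64-dim heads
    q = rope(np.random.randn(8, 64), np.arange(8))

The key property is that after the rotation, the q.k dot product that attention sees depends only on the relative offset between the two positions. Positions far beyond the training range hit rotation angles the model never saw, which is part of why naive long-context extension retrieves so poorly.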


Thanks! That was super helpful.



