Ah, got it. Yeah, then I'd focus on learning how RoPE works first. That will at least help you understand why retrieval in current long-context implementations is so limited.
A colleague from a Discord server I spend time in threw together this video a year or so ago. It might be helpful as a first watch before a deep dive: https://www.youtube.com/watch?v=IZYx2YFzVNc
It covers positional encoding as a general concept first, then goes into rotary embeddings.
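If it helps to see the core idea in code before watching, here's a minimal sketch of rotary embeddings in plain Python. It assumes the adjacent-pair formulation and the base of 10000 from the original RoFormer paper. Real implementations (e.g. LLaMA-style code) often pair dimensions differently and operate on batched tensors, so treat this as illustrative only. `rope` and `dot` are just hypothetical helper names.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate vector x to encode integer position pos (illustrative sketch).

    Assumes the adjacent-pair layout: each pair (x[2i], x[2i+1]) is rotated
    by angle pos * base**(-2i/d), so early pairs spin fast, later ones slowly.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)          # per-pair frequency
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation of the pair
        out.append(x0 * c - x1 * s)
        out.append(x0 * s + x1 * c)
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [0.3, -1.2, 0.5, 0.8]
k = [1.0, 0.4, -0.7, 0.2]

# Same relative offset (3), very different absolute positions
s1 = dot(rope(q, 5), rope(k, 2))
s2 = dot(rope(q, 105), rope(k, 102))
print(s1, s2)
```

The two printed scores should agree up to floating-point error: rotating q by position m and k by position n leaves a dot product that depends only on the offset m - n, which is the "relative position" property the video builds up to.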