And, regardless, speculative execution will happily figure out that, hey, the next loop iteration starts with
> add rsi, 4; mov rbx, [rsi].d
and rsi is 0xDEADBFFC, so rsi+4 is 0xDEADC000, so let's speculatively load that address....
Extra bonus - the compiler may be able to auto-vectorize that.
A linked list, on the other hand... hardware can't tell what to prefetch in
> mov rsi, [rsi].q; mov rax, [rsi+8].d
because it has no idea what rsi is going to be after the first instruction.
So it's not just cache locality (which is huge!), it's also the ability of the CPU to actually run things in parallel, instead of serializing everything (and, again, if you're really lucky, the compiler will auto-vectorize the array code)
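To make that concrete, here's a rough C sketch of the two loops. The `struct node` layout (next pointer at offset 0, 32-bit value at offset 8, which is what you'd get on a typical 64-bit target) and the names `sum_array` / `sum_list` are my own assumptions, picked to line up with the asm above. It's not anyone's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout, chosen to match the asm above.
   Offsets assume a 64-bit target with 8-byte pointers. */
struct node {
    struct node *next;  /* offset 0: the load in `mov rsi, [rsi]` */
    int32_t value;      /* offset 8: the load in `mov eax, [rsi+8]` */
};

/* Array walk: the address of a[i + 1] is just arithmetic on i, so the
   CPU can issue later loads before earlier ones finish, and the
   compiler is free to try vectorizing the loop. */
static int64_t sum_array(const int32_t *a, size_t n) {
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* List walk: the address of the next node is itself the result of a
   load, so each iteration has to wait for the previous p->next. */
static int64_t sum_list(const struct node *p) {
    int64_t total = 0;
    for (; p != NULL; p = p->next)
        total += p->value;
    return total;
}
```

Both functions compute the same sum. The only difference is whether the next address depends on a load that hasn't completed yet.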
I think Sandy Bridge or Skylake made this exact optimization, which in turn speeds up vtables and interpreters by reading through a pointer, so it might indeed be faster on some architectures.
Your point stands, though: linear array access will have similar performance everywhere, whereas a linked list will have a wider performance envelope due to speculative loads.
That works for vtables and interpreters because the vtable/interpreter jump table will be hot and in cache. It doesn't help for linked list iteration unless the next element in the list is in L1 cache.
Indeed, speculative loads from pages that you don't even own have been a big, big topic on this site for over a year. There's no respect for page boundaries in the hardware.