Essentially, instead of a direct addressing like V[i], you can do VV[log2(i)][i]. Where log2() is implemented with a single BSR instruction.
The addressing table for the first level is very small in size, and it can be assumed always in cache.
Here the .h/.c source: https://github.com/amadvance/tommyds/blob/master/tommyds/tom...
https://github.com/amadvance/tommyds/blob/master/tommyds/tom...
Did you used something different ?
https://cs.uwaterloo.ca/research/tr/1999/09/CS-99-09.pdf
Essentially, instead of a direct addressing like V[i], you can do VV[log2(i)][i]. Where log2() is implemented with a single BSR instruction.
The addressing table for the first level is very small in size, and it can be assumed always in cache.
Here the .h/.c source: https://github.com/amadvance/tommyds/blob/master/tommyds/tom...
https://github.com/amadvance/tommyds/blob/master/tommyds/tom...
Did you used something different ?