I think this was discussed in Jon Bentley's "Programming Pearls"?
Also in the same book it was mentioned that the disjoint cycles method (also mentioned in the article) was worse for paging/caching than the three reverses method.
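For anyone who hasn't seen the two methods side by side, here is a rough Python sketch of both, rotating a list left by k positions in place. The function names are my own, and this is only meant to illustrate the access patterns, not to reproduce the book's or the article's code.

```python
from math import gcd


def rotate_three_reverses(a, k):
    """Rotate list a left by k positions in place using three reversals."""
    n = len(a)
    if n == 0:
        return
    k %= n

    def reverse(lo, hi):
        # Reverse a[lo..hi] (inclusive) by swapping from both ends inward.
        while lo < hi:
            a[lo], a[hi] = a[hi], a[lo]
            lo += 1
            hi -= 1

    reverse(0, k - 1)   # reverse the first k elements
    reverse(k, n - 1)   # reverse the remaining n - k elements
    reverse(0, n - 1)   # reverse the whole list


def rotate_cycles(a, k):
    """Rotate list a left by k positions in place by following disjoint cycles."""
    n = len(a)
    if n == 0:
        return
    k %= n
    if k == 0:
        return
    # There are gcd(n, k) disjoint cycles; each element is written exactly once.
    for start in range(gcd(n, k)):
        tmp = a[start]
        i = start
        while True:
            j = (i + k) % n  # index of the element that belongs at position i
            if j == start:
                break
            a[i] = a[j]
            i = j
        a[i] = tmp
```

The reversal version walks memory sequentially from both ends, while the cycle version jumps through the array in strides of k, which is the access pattern the caching remark is about.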
That's probably true for small primitive types, but if your objects are expensive to move (like a large struct) it might be beneficial to minimize swaps.
Yeah, it might be interesting to profile both algorithms and see how they perform depending on the size of the blocks being swapped (which doesn't even have to be equal to the size of the object in the array).
It is discussed in that book. Very fun read, all told. Highly recommended if folks find this sort of thing fun. I think I should thumb through it again. :D