It's really not useful for gauging potential. There are tradeoffs in how deeply you pipeline an architecture: shorter pipeline stages tend to give higher clock rates, while longer stages tend to give higher IPC. It's pretty easy to make a design whose IPC blows everything else out of the water if it only needs to hit 100 MHz. For instance, the slower the clock, the larger you can make your caches and the fewer cycles it takes to read from them.
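To put rough numbers on that tradeoff (all figures invented for illustration), throughput is roughly IPC times frequency, so the high-IPC design is not automatically the faster one:

    # Toy, made-up numbers: performance ~ IPC * frequency,
    # so higher IPC alone doesn't win.
    designs = {
        "deep pipeline (short stages)":   {"ipc": 1.0, "ghz": 5.0},
        "shallow pipeline (long stages)": {"ipc": 1.6, "ghz": 3.0},
    }

    for name, d in designs.items():
        gips = d["ipc"] * d["ghz"]  # billions of instructions retired per second
        print(f"{name}: IPC {d['ipc']}, {d['ghz']} GHz -> {gips:.1f} GIPS")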
Also, on real-world benchmarks that don't fit neatly in cache, a given chip's IPC will tend to increase as you underclock it, because memory latency, measured in cycles, goes down.
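A crude stall model makes the mechanism visible (every number here is hypothetical): DRAM latency is roughly fixed in nanoseconds, so at a lower clock it costs fewer cycles and the measured IPC creeps up even though the chip got slower.

    # Hypothetical figures: DRAM latency is ~constant in ns, so underclocking
    # shrinks the stall penalty measured in cycles and inflates IPC.
    base_cpi = 0.5          # cycles per instruction if memory were free (assumed)
    miss_rate = 0.01        # cache misses per instruction (assumed)
    mem_latency_ns = 80.0   # DRAM latency, roughly clock-independent

    for ghz in (4.0, 3.0, 2.0):
        stall_cycles = mem_latency_ns * ghz              # ns converted to cycles
        cpi = base_cpi + miss_rate * stall_cycles
        print(f"{ghz} GHz: IPC = {1 / cpi:.2f}")
    # IPC rises as the clock drops, yet wall-clock time only gets worse.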
Note that the chip with the higher frequency in this particular test, and the higher maximum frequency across its product line (Skylake+), also gets the higher IPC here, so this kind of tradeoff isn't the obvious cause of these results.
IPC is _usually_ a good measure for the last phase of optimization. But only the local Δ is meaningful; comparing IPC across different vendors is useful only as a gross measure.
It's not even useful as a gross measure, unfortunately. Too many moving parts in the way.
Say, if you went by IPC alone, you'd probably pick the latest Apple ARM CPU. Except it can't clock any of its subunits as high as top AMD and Intel parts, its cache is slower, and its memory bandwidth is abysmal in comparison.
Performance in seconds, or performance per watt (unit: 1/(W*s), i.e. per joule), on the workload you actually want to run: that's what's useful.
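A minimal sketch of what that comparison looks like, with invented figures for two hypothetical parts:

    # Invented figures for two hypothetical chips on one workload.
    # "Performance per watt" here is 1 / (watts * seconds), i.e. runs per joule.
    parts = {
        "chip A": {"seconds": 100.0, "watts": 95.0},
        "chip B": {"seconds": 120.0, "watts": 65.0},
    }

    for name, p in parts.items():
        runs_per_joule = 1.0 / (p["watts"] * p["seconds"])
        print(f"{name}: {p['seconds']:.0f} s, {runs_per_joule * 1000:.3f} runs/kJ")
    # Chip A wins on time, chip B on energy; IPC predicts neither.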
You can't even easily estimate anything from microbenchmarks anymore, since per-unit local clocking expanded in x86... (AMD in Zen+, expanded in Zen 2; most mobile ARM CPUs; Intel since Broadwell-E, expanded in Skylake.)
You get traps such as going for AVX and locally overheating the CPU (pulling the clock down), where the SSE2 equivalent would be faster in real life. It's all funny business.
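A back-of-envelope version of that trap, assuming an Amdahl-style split and made-up clock offsets (real throttling depends on the specific part and license levels):

    # Made-up numbers: heavy AVX drops the clock for the *whole* program, so if
    # only a small fraction of the work vectorizes, the SSE2 path can win.
    vector_fraction = 0.10   # share of the work that vectorizes (assumed)
    scalar_work = 1.0 - vector_fraction

    configs = {
        "SSE2 (128-bit)":    {"ghz": 3.8, "vector_speedup": 4},
        "AVX-512 (512-bit)": {"ghz": 3.0, "vector_speedup": 16},  # throttled clock
    }

    for name, c in configs.items():
        work = scalar_work + vector_fraction / c["vector_speedup"]
        runtime = work / c["ghz"]   # relative wall-clock time
        print(f"{name}: relative runtime {runtime:.3f}")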
IPC also heavily favors RISC-style scalar code over SIMD, and it's likewise biased against multicore (though not as much).
What counts as an instruction anyway?
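For a concrete case of that counting problem (element count is arbitrary, instruction counts are rough, loop overhead ignored), the same array sum retires wildly different numbers of instructions depending on how it's vectorized:

    # Rough counts for summing 1,000,000 floats; loop overhead ignored.
    # The scalar build retires far more instructions and can post a higher
    # IPC while being much slower on the actual task.
    n = 1_000_000
    variants = {
        "scalar adds":             n,
        "AVX2 (8 floats/add)":     n // 8,
        "AVX-512 (16 floats/add)": n // 16,
    }

    for name, instructions in variants.items():
        print(f"{name}: ~{instructions:,} instructions for the same sum")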
There's no "gauging potential". Would you suddenly go with an OISC if it had extremely high IPC?
How about old Core instead of new Skylake? Oh shoot, there is no potential in Core if it's not being made!
Even different Zen 2 CPUs have varied performance characteristics, not just due to core count but due to CCX count.
There's exactly one use for such a microbenchmark, and that's optimizing compilers.
Even if there were multiple implementations?
Also remember that x86-64, unlike x86, is not closed, and unlike POWER, RISC-V, ARM, or MIPS, it is not actually well defined.
If AMD suddenly adds a new but useful instruction set, like they did with 3DNow! in ancient times, or accelerates something reasonably common that way (say, a special SIMD conditional), where do you even start with a comparison?
What if Intel actually does add a useful FPGA-programmable compute capability, as promised, or enhanced DMA?
It's useful for comparing architectures and their implementations, to gauge the potential of one line of processors over another.
I agree that for the customer it's not the right thing to be looking for.