public class TestJIT {
    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) {
            payload(i, i);
        }
    }

    public static int payload(int a, int b) {
        return a + b;
    }
}
I don't see why this shouldn't be optimized into a no-op. The end effect is the same.
Things like that often will be, which is why you generally need a harness like JMH to do microbenchmarking on the JVM. It uses internal APIs (its Blackhole sink, among others) to stop the compiler from treating results as dead and eliminating them.
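The "sink" idea JMH's Blackhole is built on can be sketched in plain Java: a volatile write is an observable effect, so the JIT can't treat the computation feeding it as dead. (This is an illustrative analogy of the technique, not JMH's actual implementation; the class and field names are made up.)

```java
public class SinkDemo {
    // A volatile write cannot be eliminated by the JIT, so anything
    // flowing into it is kept alive.
    static volatile int sink;

    static int payload(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) {
            sink = payload(i, i); // result is consumed, loop stays live
        }
        System.out.println(sink); // last write: 19999 + 19999
    }
}
```

JMH's `Blackhole.consume(...)` serves the same purpose without you having to hand-roll a field per benchmark.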
In this case it doesn't happen, because seeing that the entire operation is dead requires the compiler to inline payload into main, and he says he disabled inlining for that method specifically so it wouldn't happen. Recall that the goal is to see the assembly for a block of code in isolation, not to demo what the JVM can do when given free rein.
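For reference, these are the standard HotSpot flags for this kind of experiment (assuming the TestJIT class from above; the article's exact invocation may differ):

```shell
# keep payload out of inlining so the loop body stays opaque to the JIT
java -XX:CompileCommand=dontinline,TestJIT::payload TestJIT

# watch which methods the JIT compiles, and when
java -XX:+PrintCompilation -XX:CompileCommand=dontinline,TestJIT::payload TestJIT

# dump the generated assembly (requires the hsdis disassembler plugin)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly TestJIT
```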
Sure, but neither javac (the Java-to-bytecode compiler) nor HotSpot is doing that. The former tries to preserve as much as possible, and for the latter interprocedural analysis is too costly at run time.
It is done, but in this case the problem is partial compilation. For this you'd need methods to be tagged as pure, but that assumption needs to propagate, and it could be violated by a library being upgraded.
- If payload is not inlined, the loop can't be optimized away. The iteration itself may be a desirable side effect (a spin-wait/pause) that a stricter compiler can't make assumptions about, unlike GCC or Clang
- If payload is inlined, it should be a no-op. If it's not, and its result is consumed by an opaque "sink" method, there may be limitations.
On interprocedural analysis: don't forget you can dynamically load code and access payload through reflection, too. This limits certain optimizations that are otherwise legal in AOT compilation. .NET has similar restrictions and corresponding differences when publishing binaries with JIT vs AOT: the former gets to enjoy DynamicPGO (HotSpot-style optimizations), the latter gets a frozen world (exact devirtualization, faster reflection, auto-sealing, etc., but overall not as good as DynamicPGO with guarded devirt, branch reordering, etc.).
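The reflection point is easy to demonstrate: a method with no statically visible callers is still reachable by name, so a JIT can't simply prove it (or its side effects) unused. A minimal sketch (class and method names are made up for illustration):

```java
import java.lang.reflect.Method;

public class ReflectDemo {
    public static int payload(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) throws Exception {
        // Nothing in the static call graph invokes payload directly, yet
        // any code loaded at runtime can still reach it by name:
        Method m = ReflectDemo.class.getMethod("payload", int.class, int.class);
        int r = (int) m.invoke(null, 2, 3);
        System.out.println(r);
    }
}
```

An AOT compiler with a closed world can reject or pre-register such lookups; a JVM generally cannot rule them out.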
Early in the article, the author tells the compiler not to inline calls to payload(). When that isn't inlined, the compiler can't tell if the body of the loop has side effects or not, so it won't be able to eliminate the loop.
So according to that "specification", no optimization is allowed at all, since almost any optimization would change the "heating behavior" of code. Which is absurd.
None at all, even out of order execution. For that matter, executing the same code on different hardware is right out. Every program must be implemented on single-purpose hardware, and you can't even manufacture two of them.
Yet the compiler writers care only about the language spec, and you can bet that failing to optimize this as dead code would be considered a compiler bug.
This goes not only for the Java compiler, but for many other languages as well.
What would "leave alone" even be? There's no "default" state of performance of Java code; it would be ridiculously stupid for there to be something saying that, say, "a+b" for int type values has to take at least 1 nanosecond or something. And you can't use big O complexity here either - the int type has a maximum of 2 billion, and thus a loop over it is trivially O(1), just with a potentially-big constant factor. (or, alternatively, the loop was sped up by a constant factor of 2 billion, and optimizing compilers should extremely obviously be allowed to optimize code by a constant factor)
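The constant-factor point can be made concrete: a compiler that replaces the loop below with the closed-form sum turns O(n) work into O(1) without changing any observable result. (A sketch of the transformation's legality; HotSpot is not guaranteed to perform this particular rewrite.)

```java
public class ClosedForm {
    public static void main(String[] args) {
        int n = 20_000;

        // O(n) as written: sum of 0 .. n-1
        long loop = 0;
        for (int i = 0; i < n; i++) {
            loop += i;
        }

        // O(1) closed form: n * (n - 1) / 2, same value
        long closed = (long) n * (n - 1) / 2;

        System.out.println(loop + " " + closed);
    }
}
```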
No, that code has no side effects. The implementation is free to produce whatever side effects it wants or needs as part of execution, but that is absolutely none of the compiler's business.
Not sure if you're being facetious, but FWIW you can't really rely on these. Next year you'll get a much faster CPU and memory and the timing will be all different. Or tomorrow you run it while encoding video, which eats 99% of the CPU, and it's a hundred times slower.