public class TestJIT {
    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) {
            payload(i, i);
        }
    }

    public static int payload(int a, int b) {
        return a + b;
    }
}
I don't see why this shouldn't be optimized into a no-op. The end effect is the same.
Things like that often will be, which is why you generally need a harness like JMH to do microbenchmarking on the JVM. It uses internal APIs (its Blackhole sink, among others) to stop the compiler from treating results as dead and eliminating them.
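The "sink" idea JMH's Blackhole is built on can be sketched in plain Java: a volatile write is an observable effect, so the JIT can't treat the computation feeding it as dead. (This is an illustrative analogy of the technique, not JMH's actual implementation; the class and field names are made up.)

```java
public class SinkDemo {
    // A volatile write cannot be eliminated by the JIT, so anything
    // flowing into it is kept alive.
    static volatile int sink;

    static int payload(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) {
            sink = payload(i, i); // result is consumed, loop stays live
        }
        System.out.println(sink); // last write: 19999 + 19999
    }
}
```

JMH's `Blackhole.consume(...)` serves the same purpose without you having to hand-roll a field per benchmark.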
In this case it doesn't happen, because seeing that the entire operation is dead requires the compiler to inline payload into main, and he says he disabled inlining for that method specifically so it wouldn't happen. Recall that the goal is to see the assembly for a block of code in isolation, not to demo what the JVM can do when given free rein.
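For reference, these are the standard HotSpot flags for this kind of experiment (assuming the TestJIT class from above; the article's exact invocation may differ):

```shell
# keep payload out of inlining so the loop body stays opaque to the JIT
java -XX:CompileCommand=dontinline,TestJIT::payload TestJIT

# watch which methods the JIT compiles, and when
java -XX:+PrintCompilation -XX:CompileCommand=dontinline,TestJIT::payload TestJIT

# dump the generated assembly (requires the hsdis disassembler plugin)
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly TestJIT
```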
Sure, but neither javac (the Java-to-bytecode compiler) nor HotSpot is doing that. The former tries to preserve as much as possible, and for the latter interprocedural analysis is too costly at run time.
It is done, but in this case the problem is partial compilation. For this you'd need methods to be tagged as pure, but that assumption needs to propagate, and it could be violated by a library being upgraded.
- If payload is not inlined, the loop can't be optimized away. The iteration itself may be a desirable side effect (a spin-wait/pause) that a stricter compiler can't make assumptions about, unlike GCC or Clang
- If payload is inlined, it should be a no-op. If it's not, and its result is consumed by an opaque "sink" method, there may be limitations.
On interprocedural analysis: don't forget you can dynamically load code and access payload through reflection, too. This limits certain optimizations that are otherwise legal in AOT compilation. .NET has similar restrictions and corresponding differences when publishing binaries with JIT vs AOT: the former gets to enjoy DynamicPGO (HotSpot-style optimizations), the latter gets a frozen world (exact devirtualization, faster reflection, auto-sealing, etc., but overall not as good as DynamicPGO with guarded devirt, branch reordering, etc.).
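The reflection point is easy to demonstrate: a method with no statically visible callers is still reachable by name, so a JIT can't simply prove it (or its side effects) unused. A minimal sketch (class and method names are made up for illustration):

```java
import java.lang.reflect.Method;

public class ReflectDemo {
    public static int payload(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) throws Exception {
        // Nothing in the static call graph invokes payload directly, yet
        // any code loaded at runtime can still reach it by name:
        Method m = ReflectDemo.class.getMethod("payload", int.class, int.class);
        int r = (int) m.invoke(null, 2, 3);
        System.out.println(r);
    }
}
```

An AOT compiler with a closed world can reject or pre-register such lookups; a JVM generally cannot rule them out.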
Early in the article, the author tells the compiler not to inline calls to payload(). When that isn't inlined, the compiler can't tell if the body of the loop has side effects or not, so it won't be able to eliminate the loop.
So according to that "specification", no optimization is allowed at all, since almost any optimization would change the "heating behavior" of code. Which is absurd.
None at all, even out of order execution. For that matter, executing the same code on different hardware is right out. Every program must be implemented on single-purpose hardware, and you can't even manufacture two of them.
Yet the compiler writers care only about the language spec, and you can bet that failing to optimize this as dead code would be considered a compiler bug.
This goes not only for the Java compiler, but for many other languages as well.
What would "leave alone" even be? There's no "default" state of performance of Java code; it would be ridiculously stupid for there to be something saying that, say, "a+b" for int type values has to take at least 1 nanosecond or something. And you can't use big O complexity here either - the int type has a maximum of 2 billion, and thus a loop over it is trivially O(1), just with a potentially-big constant factor. (or, alternatively, the loop was sped up by a constant factor of 2 billion, and optimizing compilers should extremely obviously be allowed to optimize code by a constant factor)
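The constant-factor point can be made concrete: a compiler that replaces the loop below with the closed-form sum turns O(n) work into O(1) without changing any observable result. (A sketch of the transformation's legality; HotSpot is not guaranteed to perform this particular rewrite.)

```java
public class ClosedForm {
    public static void main(String[] args) {
        int n = 20_000;

        // O(n) as written: sum of 0 .. n-1
        long loop = 0;
        for (int i = 0; i < n; i++) {
            loop += i;
        }

        // O(1) closed form: n * (n - 1) / 2, same value
        long closed = (long) n * (n - 1) / 2;

        System.out.println(loop + " " + closed);
    }
}
```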
No, that code has no side effects. The implementation is free to produce whatever side effects it wants or needs as part of execution, but that is absolutely none of the compiler's business.
Not sure if you're being facetious, but FWIW you can't really rely on these. Next year you'll get a much faster CPU and memory and the timing will be all different. Or tomorrow you run it while encoding video, which eats 99% of the CPU, and it's a hundred times slower.