I'm stumbling onto a strange Java performance irregularity that is especially easy to hit with Docker on Linux. A very straightforward implementation of the sieve of Eratosthenes, using a Java BitSet for storage, sometimes drops in performance by 50%, and by even more with JDK 8. Are there any Java/JDK/JVM experts here who can shed some light on the mystery for me? I've been at it for a while, but it is quite a bit outside my field of knowledge, and I am out of ideas for how to proceed. The linked blog article plus the Git repository are an attempt to summarize/minimize things.
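For context, the hot part is essentially this kind of loop. This is a minimal sketch of the approach, not the exact code from the repository (names and loop bounds here are mine):

```java
import java.util.BitSet;

// Minimal sieve of Eratosthenes over odd numbers, backed by a BitSet.
// A set bit marks a composite; clear bits are prime candidates.
class Sieve {
    static BitSet sieve(int limit) {
        BitSet composite = new BitSet(limit + 1);
        for (int factor = 3; (long) factor * factor <= limit; factor += 2) {
            if (!composite.get(factor)) {
                // Start at factor^2; smaller multiples were marked by smaller factors.
                for (long multiple = (long) factor * factor; multiple <= limit; multiple += 2L * factor) {
                    composite.set((int) multiple);
                }
            }
        }
        return composite; // even numbers are never marked here; callers skip them
    }
}
```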
Do you have performance traces of good and bad runs to compare? Instead of trying to come up with more theories, you should start by looking at where the time is being spent.
I'm not experienced with micro-optimisation in Java, but I assume you can profile it with a sampling profiler to find out which individual operations are taking up the time.
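If I remember correctly, newer JDKs can do this out of the box with Flight Recorder, e.g. starting the program with `java -XX:StartFlightRecording=duration=60s,filename=sieve.jfr ...` and opening the recording in JDK Mission Control (the duration and filename here are just placeholders).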
Given that it's common on a particular OS setup, I wouldn't be surprised if it's in the System.currentTimeMillis() call that you're doing on every loop iteration.
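A cheap way to rule that out is to poll the clock only every so many passes. A minimal sketch, assuming the benchmark repeatedly runs the sieve under a 5-second budget (the `sieveOnce` hook is a stand-in for the code under test, not the repository's actual API):

```java
class TimedLoop {
    // Hypothetical benchmark loop: poll the wall clock only every 1024 passes
    // so that System.currentTimeMillis() itself can't dominate the measurement.
    static int runForFiveSeconds(Runnable sieveOnce) {
        long deadline = System.currentTimeMillis() + 5_000;
        int passes = 0;
        while (true) {
            sieveOnce.run(); // the code under test
            passes++;
            if ((passes & 1023) == 0 && System.currentTimeMillis() >= deadline) {
                return passes;
            }
        }
    }
}
```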
You could try running it with the Linux `perf` command:
`perf stat -d java ...`
If your CPU supports performance counters, it should give you L1 cache misses, branch misses, etc. That might give you some insight into what is different between the runs. Someone else mentioned it could be memory alignment: I think Java allocates with 8-byte alignment, and maybe something funny goes on with the L1 caching if the BitSet allocation is not 16- or 32-byte aligned.
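If alignment is the suspicion, the JOL (Java Object Layout) library can show how the BitSet instance is laid out and where it lands in memory. A rough sketch, assuming the org.openjdk.jol artifact is on the classpath (addresses are a point-in-time sample, since the GC can move objects):

```java
import java.util.BitSet;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.vm.VM;

class AlignmentCheck {
    public static void main(String[] args) {
        BitSet bits = new BitSet(1_000_000);
        // Prints field offsets and header size for the BitSet object itself.
        System.out.println(ClassLayout.parseInstance(bits).toPrintable());
        // Check how the object's address relates to a 64-byte cache line.
        System.out.println("address % 64 = " + (VM.current().addressOf(bits) % 64));
    }
}
```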
If you are running it within Docker, you might need to add: `--security-opt seccomp:unconfined --cap-add SYS_PTRACE --cap-add CAP_SYS_ADMIN`
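Putting it together, the full invocation would look something like `docker run --security-opt seccomp:unconfined --cap-add SYS_PTRACE --cap-add CAP_SYS_ADMIN <image> perf stat -d java ...` (the image name and command here are placeholders).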
This would be my first go-to as well. See if you are retiring fewer instructions, and if so, see whether the counters tell you what the CPU is stalling on compared with the faster runs.
Are you testing on an x64 laptop? Could it be CPU throttling? Maybe try running a version of the benchmark written in a non-JVM language to see if you get the same behaviour.
It's a mini PC, so throttling is possible. There are benchmarks written in tons of languages here: https://github.com/PlummersSoftwareLLC/Primes and I've so far only seen this behaviour with the JVM involved.
It would also be interesting to see whether this is a Java issue or a JVM implementation issue: i.e., does an OpenJ9-based JVM (available under the weird IBM Semeru name [1]) show similar behaviour?
Yes, 1X and 0.5X are a bit arbitrary. By 1X I mean the speed I know it can run at. So it could well be that with JDK 7 the optimizations could not happen, and the question then is why the optimizations do not always happen.
Making each run longer, 1 minute instead of 5 seconds, displays the same performance toggling:
Not certain. But there are a lot of implementations very similar to mine in tons of other languages and VMs, and none of them display this, as far as I know. Also, I get different behavior with JVM/JDK 7.