Hi Daniel,
Thanks for benchmarking this! In my environment (Zulu14.28+21-CA, macOS 11.0.1) the performance of the native byte-order buffer (buffer_crazy) is quite close to the int[] array. An off-heap version of the same (via allocateDirect), shown as buffer_direct below, fares much worse though:
Benchmark Mode Cnt Score Error Units
MyBenchmark.array avgt 15 3.801 ± 0.143 us/op
MyBenchmark.buffer avgt 15 18.480 ± 1.664 us/op
MyBenchmark.buffer_crazy avgt 15 4.087 ± 0.256 us/op
MyBenchmark.buffer_direct avgt 15 25.111 ± 0.536 us/op
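For reference, here is a minimal sketch of how the four variants might be constructed (class and method names are my own, not from the actual JMH benchmark; the heap ByteBuffer view forced to native byte order is the "buffer_crazy" trick, and allocateDirect gives the off-heap "buffer_direct" case):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class BufferVariants {
    static final int SIZE = 1024;

    // Plain int[] baseline.
    static int[] array() {
        int[] a = new int[SIZE];
        for (int i = 0; i < SIZE; i++) a[i] = i;
        return a;
    }

    // Heap IntBuffer via allocate (the slow "buffer" case).
    static IntBuffer buffer() {
        IntBuffer b = IntBuffer.allocate(SIZE);
        for (int i = 0; i < SIZE; i++) b.put(i, i);
        return b;
    }

    // "buffer_crazy": an IntBuffer view over a heap ByteBuffer
    // forced to native byte order.
    static IntBuffer bufferCrazy() {
        IntBuffer b = ByteBuffer.allocate(SIZE * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int i = 0; i < SIZE; i++) b.put(i, i);
        return b;
    }

    // "buffer_direct": the same view, but off-heap via allocateDirect.
    static IntBuffer bufferDirect() {
        IntBuffer b = ByteBuffer.allocateDirect(SIZE * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        for (int i = 0; i < SIZE; i++) b.put(i, i);
        return b;
    }

    public static void main(String[] args) {
        // All four variants hold the same data; only write performance differs.
        int[] a = array();
        IntBuffer b = buffer(), c = bufferCrazy(), d = bufferDirect();
        for (int i = 0; i < SIZE; i++) {
            if (a[i] != b.get(i) || a[i] != c.get(i) || a[i] != d.get(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println("all variants agree");
    }
}
```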
The differences seem to stem from writes, as modifying the benchmark to do reads only (summing array values to a long, consumed by Blackhole) yields the same performance for all IntBuffer variants:
Benchmark Mode Cnt Score Error Units
SumBenchmark.array avgt 15 16.227 ± 0.536 us/op
SumBenchmark.buffer avgt 15 16.879 ± 1.179 us/op
SumBenchmark.buffer_crazy avgt 15 16.659 ± 0.643 us/op
SumBenchmark.buffer_direct avgt 15 17.774 ± 1.067 us/op
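The read-only variant amounts to a summing loop like the sketch below (names are mine; in the JMH benchmark the result would be consumed by a Blackhole rather than printed):

```java
import java.nio.IntBuffer;

public class SumSketch {
    // Sum all values into a long so the loads cannot be eliminated.
    static long sum(IntBuffer buf) {
        long total = 0;
        for (int i = 0; i < buf.capacity(); i++) {
            total += buf.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        IntBuffer b = IntBuffer.allocate(100);
        for (int i = 0; i < 100; i++) b.put(i, i);
        System.out.println(sum(b)); // 0 + 1 + ... + 99 = 4950
    }
}
```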
Regards,
Viktor
Indeed, the “buffer_crazy” workaround was born from a discussion on Twitter, and is expected to be on par with the array: https://twitter.com/lemire/status/1333466140178313216
Mr. Lemire simply did not update the post.
There is now a bug filed on the IntArray version, too: https://bugs.openjdk.java.net/browse/JDK-8257531
That is correct, the post has not been updated.
Thanks for linking to the tweet, I should have done so in the updated code. I did link to the tweet in a github issue.
Markus Schaber says:

Is anyone here aware of a similar benchmark for C#, comparing the “traditional” methods of arrays, unsafe pointers, and the Read/Write methods of the System.Runtime.InteropServices.Marshal class with the new Memory and Span types?

Note that the issue has been picked up: https://bugs.openjdk.java.net/browse/JDK-8257531 and is being fixed: https://github.com/openjdk/jdk/pull/1618/files
If all goes well, Java 16 will correctly vectorize Buffer (and MemorySegment) accesses too, where applicable. The only missing piece for now is direct Buffers, but at least that’s something they’re aware of now, too.

Just to highlight, the results here are rather specific to one particular VM implementation. I decided to check how this performed on Azul Zing (which uses a very different top-tier compiler), and as I’d somewhat expected, saw a very different performance picture.
JDK 11.0.9.1, OpenJDK 64-Bit Server VM, 11.0.9.1+1-Ubuntu-0ubuntu1.20.04
Benchmark Mode Cnt Score Error Units
MyBenchmark.array avgt 15 3.151 ± 0.005 us/op
MyBenchmark.buffer avgt 15 18.463 ± 0.006 us/op
MyBenchmark.buffer_crazy avgt 15 3.161 ± 0.003 us/op
MyBenchmark.buffer_direct avgt 15 89.536 ± 0.049 us/op
JDK 11.0.9.1-internal, Zing 64-Bit Tiered VM, 11.0.9.1-zing_99.99.99.99.dev-b3323-product-linux-X86_64
Benchmark Mode Cnt Score Error Units
MyBenchmark.array avgt 15 3.683 ± 0.475 us/op
MyBenchmark.buffer avgt 15 3.871 ± 0.395 us/op
MyBenchmark.buffer_crazy avgt 15 3.856 ± 0.386 us/op
MyBenchmark.buffer_direct avgt 15 16.594 ± 0.002 us/op
These results were collected on an AWS EC2 c5.4xlarge instance (which is a Skylake server part). This was run on the “feature preview” (i.e., a weirdly named beta) for Zing, downloadable from here: https://docs.azul.com/zing/zing-quick-start-tar-fp.htm
Disclaimer: I used to work for Azul, and in particular on the Falcon JIT compiler that Zing uses. I’m definitely biased here.