Daniel Lemire's blog

, 16 min read

Optimizing polymorphic code in Java

23 thoughts on “Optimizing polymorphic code in Java”

  1. Franklin Chen says:

    I wonder if Java will get annotated miniboxing or specialization as currently available for Scala for efficiency of generic code: http://scala-miniboxing.org/

  2. Chris Nahr says:

    Did you check the generated machine code? I suspect these methods are in fact inlined, but another stage of optimization is missing: special-casing arrays to eliminate bounds checking on every element access. That’s what you get when you write those manually inlined loops over back.length.

  3. Dmitry Platonoff says:

    I realize this point is not directly relevant to your topic of optimizing polymorphism, but optimization is always a sum of several efforts. For example, changing the loops in your BasicSummer to

    for(int k = array.size() – 1; k >= 0; k–)

    cuts execution time by almost 40%. People seriously underestimate reverse loops and the savings they offer.

  4. @Chris

    That is a distinct possibility.

    Of course, there can be different ways to “inline” code in Java… but I had this in mind when I qualified “inline” with “fully”.

    I am not exactly sure how to access the machine code in java since it is JIT compiling… and javac will not, as far as I can tell, inline functions.

  5. @Franklin

    Thanks for pointing out miniboxing to me.

  6. BB says:

    In Java people can play tricks with the class loader and reflexion, the runtime cannot make sure that other implementations of Array don’t exist.
    As one more Array impl may be loaded later on without its class name being part of the code, the VM is unable to optimize the code.

    In this situation you could use a bytecode optimization library like Soot. Here as your object has a single effectively final member, it can be completely optimized away into static calls with that one object as one extra parameter.

  7. @BB

    Given the example of this blog post, can you share with us how you would optimize it using Soot?

  8. @Dmitry Platonoff great observation, we get 1.5x reduction for BasicSummer and 1.2x reduction for FastSummer (Java8, core i7).

    Do you have any idea why there is a much larger difference for BasicSummer?

    @Lemire This is what I hate in Java and some other languages. It is all great until you have to create atrocities like parallel arrays.

  9. @Leonid

    It is indeed quite surprising that simply looping in reverse, you can go 40% faster with BasicSummer. This makes little sense as very little of the computational burden has to do with the loop itself in this case.

    In C, reverse loops are not typically faster, they may even be slower at times. So I am not even sure why reverse loops are faster in Java.

  10. Chris Nahr says:

    Reverse loops only need to evaluate the method call once, for the starting index. The ending index must be evaluated for every cycle unless the optimizer can prove that the returned value won’t change. I’m guessing this is the cause.

    That’s another built-in optimization for arrays, by the way, so there should be no difference when looping over array.length.

  11. Dmitry Platonoff says:

    The gain with reverse loops is pretty trivial, it helps you eliminate an extra method call per iteration. As you demonstrated in the original post, function calls add a lot of overhead, and it really adds up. It’s common not to think twice about adding that size() or length() call in the middle of the “for” statement. In our example, removing it halves the number of method invocations per loop.

  12. @Chris

    Good point, but we do see a 20% gain with reverse loops on straight arrays.

    That is, this is faster:

    https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2014/12/17/ReverseFastSummer.java#L8-L15

    than this…

    https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2014/12/17/FastSummer.java#L8-L15

    I realize that at the assembly level, a reverse loop can save an instruction sometimes, by checking the overflow bit (or something of the sort), but recent Intel processors can go just as fast when iterating forward…

    I find this very mysterious.

  13. @Dmitry

    I understand how saving function calls can help, but it does not explain why reverse loops on straight arrays are significantly faster.

  14. @Daniel, perhaps, comparing against zero is faster or it might be related to branch misprediction.

  15. Dmitry Platonoff says:

    @Daniel Sorry, I was replying to Leonid. Should’ve refreshed the page before posting as Chris chimed in already…

    Oddly enough, I had also not observed any gains with reverse loops on straight arrays. They were in fact a little bit slower on my PC (Java 1.8.0_25-b18, Win8.1, AMD FX 8-core CPU).

  16. @Dmitry

    Can you post your results using the latest version of the code from GitHub?

    Here is what I get…

    $ java Benchmark
    refast fast basic smart resmart silly fixed rebasic
    0.8537767 0.933717 5.8813856 0.9737157 0.9292436 5.8855361 3.7857766 3.3170692 3.7141652
    
  17. @Dmitry,
    right but this has nothing to do with reverse loops per se. I am getting the same gain by memorizing the size in the variable and using this variable in the loop.

  18. PS: array.length call is inlined or something like this. There’s no point in memorizing it.

    Regarding doing things backwardly: I really wonder how this is supported by various CPU and memory “predictors”. For example, if you read long arrays, will prefetch understand that what you are going to read next?

  19. J. Andrew Rogers says:

    Daniel, I immediately see one reason why the reverse could be faster than forward. In the forward case, the call to array.size() has to be evaluated in each loop iteration, which might not be optimized away in Java in the way it would be for C.

    In C, both cases would compile to something like initializing a register with the number of iterations and loop the operation until the register hit zero. For the reverse case in Java, where the iteration termination test is an immediate literal, I would expect the Java to produce virtually identical code to the C. For the forward case, there is room for Java to do something less optimal, especially if the optimizer is shy about inlining as you demonstrate. It doesn’t make much sense but Java has a lot of odd edges in practice.

  20. Dmitry Platonoff says:

    @Daniel there’s no rebasic on github. Could you push the latest changes?

  21. Dmitry Platonoff says:

    This is what I get

    refast fast basic smart resmart silly fixed
    1.6447691 1.6076136 7.4859653 1.6031647 1.6433024 7.4883121 4.7948762 4.9081517

  22. Dmitry Platonoff says:

    The last value is refixed (it was missing a label). The odd part is that it’s not significantly slower or faster than fixed. It floats up and down a bit, but there’s no noticeable gain or loss. Here’s a few more runs:
    http://pastebin.com/raw.php?i=f1SGgDwx

  23. @Dmitry

    I have pushed the code, sorry.

    I was able to reproduce the negative results on an AMD processor. It seems that backward loops on standard arrays are faster on Haswell processors, but not necessarily on other processors.