One thing I just noticed is that add/poll can be replaced with swapping the root and percolating down, so that two separate heap adjustments aren’t needed.
// instead of b.add(x) followed by b.poll():
b.array[0] = x;          // overwrite the root with the new element
b._percolateDown(0);     // a single sift-down restores the heap invariant
This gives another 50% increase in performance.
Unfortunately I’m not seeing that increase on the most recent version of the test, but it still seems like the right way to do it.
This heap operation is known as a “replace top” in Postgres; they cite Knuth and heapsort as the source. The operation is needed internally for heapify anyway, but I agree with Yonik that it should be exposed as a method.
Looking at their code now I see that they also do the peek() check. It’s fairly standard.
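For concreteness, here is a minimal sketch of the combined pattern, assuming `b` is a FastPriorityQueue built with a reversed comparator so that it holds the k smallest values seen so far and peek() returns the largest of them (the helper name is illustrative, not from the blog's code):

function insertCompetitive(b, maxsize, x) {
  if (b.size < maxsize) {
    b.add(x);                 // queue not yet full: ordinary insertion
  } else if (x < b.peek()) {  // peek() check: only competitive values get in
    b.array[0] = x;           // replace top: overwrite the root...
    b._percolateDown(0);      // ...and restore the heap with one sift-down
  }
}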
The Quickselect method seems like it should be faster, since it uses the previous k-th value as the pivot on each new run (I checked), and that should yield the same pruning as the peek() check during the first pass through the data buffer. The overhead is probably in the array storage before the Quickselect call, and the fact that half the array is already pivoted.
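To make the idea concrete, here is a minimal quickselect sketch (illustrative only, not the blog's actual code) seeded with the previous run's k-th value: after it returns, the k smallest values of arr sit in arr[0..k-1], and arr[k-1] can be reused as the seed pivot for the next buffer.

function quickselect(arr, k, previousKth) {
  var t = k - 1;                            // index of the k-th smallest
  var left = 0, right = arr.length - 1;
  var pivot = (previousKth !== undefined) ? previousKth : arr[t];
  while (left < right) {
    var i = left, j = right;
    do {                                    // Hoare-style partition around pivot
      while (arr[i] < pivot) i++;
      while (pivot < arr[j]) j--;
      if (i <= j) {
        var tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
        i++; j--;
      }
    } while (i <= j);
    if (j < t) left = i;                    // target lies to the right of j
    if (t < i) right = j;                   // target lies to the left of i
    pivot = arr[t];                         // later passes pivot on an element
  }
  return arr[t];
}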
Yes, it improves the performance; I have updated the blog post.
https://queue.acm.org/detail.cfm?id=1814327
What about b-heap?
Some years ago I benchmarked a b-heap against a binary heap using this repository:
https://github.com/valyala/gheap. However, it did not turn out to be faster than the standard std::priority_queue, perhaps because of implementation details that I did not dig into.
I ran some rather comprehensive tests. It turns out that for moderate-size queues it’s quite hard to beat the binary heap. Perhaps caching was a problem some years ago, but it seems to be much less of one now. My tests are available online:
https://github.com/searchivarius/BlogCode/tree/master/2016/bench_prio
Lucene also uses peek to see if a new entry is competitive, but a further important optimization that we make is that we only do a single rebalance of the binary heap.
Rather than add() followed by poll() (each of which would re-establish the heap invariants), we update the root (the smallest element that peek saw) and fix up the heap once. This cuts the time for a new insert by a factor of 2.
Unfortunately, we don’t see this capability in standard heap or priority queue implementations and thus maintain our own.
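For readers who want the same trick outside Lucene or FastPriorityQueue, a hypothetical standalone helper over a plain array heap could look like this (a sketch, not Lucene's actual code); worse(a, b) should return true when a belongs above b, e.g. (a, b) => a > b for a heap whose root is the largest of the k smallest values kept:

function replaceTop(heap, x, worse) {
  heap[0] = x;                              // overwrite the root
  var i = 0, n = heap.length;
  while (true) {                            // single sift-down
    var top = i, l = 2 * i + 1, r = 2 * i + 2;
    if (l < n && worse(heap[l], heap[top])) top = l;
    if (r < n && worse(heap[r], heap[top])) top = r;
    if (top === i) return;
    var tmp = heap[i]; heap[i] = heap[top]; heap[top] = tmp;
    i = top;
  }
}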
I have updated the blog post with a merged add-poll call and the performance is improved.
It cuts by a factor of 2 only for small queues. Somewhat unsurprisingly, as you use larger queues, the processing time seems to be dominated by cache misses.
Have you tried using a wider range of random numbers? It would change priority queue performance: right now, after the first 5000 elements, the values in the priority queue are all between 0 and 5, which makes it hard to find a value smaller than 5.
Nice series of posts on priority queues. Why does sort have the largest throughput? Am I missing something?
Sort has the worst results…
Hello Daniel,
I believe that, to be fair, you should always go through the comparator function and write !reverseddefaultcomparator(x, b.peek()) instead of (x < b.peek()).
Also, I implemented a priority queue that remembers a max size and to which you always add(). It's on par with FastPriorityQueue-KWillets-replaceTop, maybe a bit faster.
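A minimal sketch of that idea (names are illustrative; the actual code is in the gist linked at the end of this comment): a bounded queue that remembers its maximum size and routes the competitiveness test through the same comparator the heap uses, so both sides pay the function-call cost.

function BoundedQueue(maxsize, comparator) {
  this.maxsize = maxsize;
  this.comparator = comparator;             // same function given to the heap
  this.heap = new FastPriorityQueue(comparator);
}

BoundedQueue.prototype.add = function (x) {
  if (this.heap.size < this.maxsize) {
    this.heap.add(x);                       // not full yet: plain insertion
  } else if (!this.comparator(x, this.heap.peek())) {
    // x does not belong above the current root, so it is competitive
    this.heap.array[0] = x;                 // replace top and fix the heap once
    this.heap._percolateDown(0);
  }
};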
After reading the Wikipedia article (https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap), which states that Floyd's construction is faster than Williams', I tried to implement add2(), which keeps the heap in a "degraded" state (a rough sketch follows the list below):
– array[0] is always the min element
– I heapify the array once this.size reaches this.maxsize
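Here is a rough sketch of how I read the degraded-state idea (the real code is in the gist linked below; the maxsize field and the comparator stored as this.compare are assumptions about the queue, in the spirit of FastPriorityQueue):

FastPriorityQueue.prototype.add2 = function (x) {
  if (this.size < this.maxsize) {
    // degraded state: plain append, but keep the top element at index 0
    if (this.size > 0 && this.compare(x, this.array[0])) {
      this.array[this.size] = this.array[0];
      this.array[0] = x;
    } else {
      this.array[this.size] = x;
    }
    this.size += 1;
    if (this.size === this.maxsize) {
      // Floyd's bottom-up construction: sift down every internal node once
      for (var i = (this.size >> 1) - 1; i >= 0; i--) {
        this._percolateDown(i);
      }
    }
  } else if (!this.compare(x, this.peek())) {
    this.array[0] = x;                      // full: same replace-top path
    this._percolateDown(0);
  }
};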
To my surprise, FastPriorityQueueGP-add2 is much slower than FastPriorityQueueGP-add.
$ node ./test.js
Platform: darwin 16.6.0 x64
Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
Node version 8.1.3, v8 version 5.8.283.41
starting dynamic queue/enqueue benchmark
FastPriorityQueue-KWillets-replaceTop x 6,522 ops/sec ±2.40% (87 runs sampled)
FastPriorityQueueGP-add x 7,234 ops/sec ±0.77% (91 runs sampled)
FastPriorityQueueGP-add2 x 4,435 ops/sec ±1.64% (86 runs sampled)
sort x 240 ops/sec ±1.11% (83 runs sampled)
https://gist.github.com/gpakosz/655b087fc4135580c5025870316c4c6e