21st June 2017, 9 min read

Top speed for top-k queries

12 thoughts on “Top speed for top-k queries”

Kendall Willets says:

June 21, 2017 at 10:03 pm

One thing I just noticed is that add/poll can be replaced with swapping the root and percolating down, so that two separate heap adjustments aren’t needed.

//b.add(x);
//b.poll();
b.array[0]=x;
b._percolateDown(0);

This gives another 50% increase in performance.
1. KWillets says:
  
  June 22, 2017 at 5:22 pm
  
  Unfortunately I’m not seeing that increase on the most recent version of the test, but it still seems like the right way to do it.
  
  And, this heap operation is known as a “replace top” in Postgres; they quote Knuth and Heapsort as the source. It’s required for heapify, but I agree with Yonik that it should be exposed as a method.
  
  Looking at their code now I see that they also do the peek() check. It’s fairly standard.
  
  The Quickselect method seems like it should be faster, since it uses the previous k-th value as the pivot on each new run (I checked), and that should yield the same pruning as the peek() check during the first pass through the data buffer. The overhead is probably in the array storage before the Quickselect call, and the fact that half the array is already pivoted.
2. Daniel Lemire says:
  
  June 22, 2017 at 5:49 pm
  
  Yes, it improves the performance, I have updated the blog post.
Yingfeng says:

June 22, 2017 at 12:56 am

https://queue.acm.org/detail.cfm?id=1814327
What about b-heap?

Years before I had performed a benchmark between b-heap and binary heap according to this repository
https://github.com/valyala/gheap, however it had not been proven to be faster than standard std::priority_queue, perhaps it’s because of the implementation details which I did not dive into.
1. Leonid Boytsov says:
  
  June 26, 2017 at 4:09 pm
  
  I ran some rather comprehensive tests. Turns out for moderate-size queues, it’s quite hard to beat the binary heap. Perhaps, some years ago caching was a problem, but now it seems to be less of a problem. My tests are available online:
  https://github.com/searchivarius/BlogCode/tree/master/2016/bench_prio
Yonik Seeley says:

June 22, 2017 at 1:13 pm

Lucene also uses peek to see if a new entry is competitive, but a further important optimization that we make is that we only do a single rebalance of the binary heap.

Rather than add() followed by poll() (each of which will reestablish heap variants), we update the root (the smallest element that peek saw) and fix up the heap once. This cuts the time for a new insert by a factor of 2.

Unfortunately, we don’t see this capability in standard heap or priority queue implementations and thus maintain our own.
1. Daniel Lemire says:
  
  June 22, 2017 at 5:49 pm
  
  I have updated the blog post with a merged add-poll call and the performance is improved.
2. Leonid Boytsov says:
  
  June 26, 2017 at 4:11 pm
  
  It cuts by a factor of 2 only for small queues. Somewhat unsurprisingly, as you use larger queues, the processing time seems to be dominated by cache misses.
Orhan Ozalp says:

June 22, 2017 at 8:57 pm

Have you tried to use a bigger random number margin? It’s going to change priority queue performance because right now, after the first 5000 elements, priority queue values are all between 0 and 5 and this makes it harder to find a smaller value than 5.
Leonid Boytsov says:

June 26, 2017 at 4:12 pm

Nice series of posts on priority queues. Why does sort has the largest throughput, do I miss something?
1. Daniel Lemire says:
  
  June 26, 2017 at 6:41 pm
  
  Sort has the worst results…
Gregory says:

July 18, 2017 at 1:37 pm

Hello Daniel,

I believe that to be fair, you should always go through the comparator function and should write !reverseddefaultcomparator(x, b.peek()) instead of (x < b.peek()).

Also, I implemented a priority queue that remembers a max size and to which you always add(). It's on par with FastPriorityQueue-KWillets-replaceTop, maybe a bit faster.

After having read the Wikipedia article (https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap) that states Floyd's faster than Williams', I tried to implement add2() that keeps the heap in a "degraded" state:
– array[0] is always the min element
– I heapify the array once this.size reaches this.maxsize

To my surprise, FastPriorityQueueGP-add2 is much slower than FastPriorityQueueGP-add.

$ node ./test.js
Platform: darwin 16.6.0 x64
Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
Node version 8.1.3, v8 version 5.8.283.41

starting dynamic queue/enqueue benchmark
FastPriorityQueue-KWillets-replaceTop x 6,522 ops/sec Â±2.40% (87 runs sampled)
FastPriorityQueueGP-add x 7,234 ops/sec Â±0.77% (91 runs sampled)
FastPriorityQueueGP-add2 x 4,435 ops/sec Â±1.64% (86 runs sampled)
sort x 240 ops/sec Â±1.11% (83 runs sampled)

https://gist.github.com/gpakosz/655b087fc4135580c5025870316c4c6e