Daniel Lemire's blog

, 9 min read

Top speed for top-k queries

12 thoughts on “Top speed for top-k queries”

  1. Kendall Willets says:

    One thing I just noticed is that add/poll can be replaced with swapping the root and percolating down, so that two separate heap adjustments aren’t needed.

    //b.add(x);
    //b.poll();
    b.array[0]=x;
    b._percolateDown(0);

    This gives another 50% increase in performance.

    1. KWillets says:

      Unfortunately I’m not seeing that increase on the most recent version of the test, but it still seems like the right way to do it.

      And, this heap operation is known as a “replace top” in Postgres; they quote Knuth and Heapsort as the source. It’s required for heapify, but I agree with Yonik that it should be exposed as a method.

      Looking at their code now I see that they also do the peek() check. It’s fairly standard.

      The Quickselect method seems like it should be faster, since it uses the previous k-th value as the pivot on each new run (I checked), and that should yield the same pruning as the peek() check during the first pass through the data buffer. The overhead is probably in the array storage before the Quickselect call, and the fact that half the array is already pivoted.

    2. Yes, it improves the performance, I have updated the blog post.

  2. Yingfeng says:

    https://queue.acm.org/detail.cfm?id=1814327
    What about b-heap?

    Years before I had performed a benchmark between b-heap and binary heap according to this repository
    https://github.com/valyala/gheap, however it had not been proven to be faster than standard std::priority_queue, perhaps it’s because of the implementation details which I did not dive into.

    1. I ran some rather comprehensive tests. Turns out for moderate-size queues, it’s quite hard to beat the binary heap. Perhaps, some years ago caching was a problem, but now it seems to be less of a problem. My tests are available online:
      https://github.com/searchivarius/BlogCode/tree/master/2016/bench_prio

  3. Yonik Seeley says:

    Lucene also uses peek to see if a new entry is competitive, but a further important optimization that we make is that we only do a single rebalance of the binary heap.

    Rather than add() followed by poll() (each of which will reestablish heap variants), we update the root (the smallest element that peek saw) and fix up the heap once. This cuts the time for a new insert by a factor of 2.

    Unfortunately, we don’t see this capability in standard heap or priority queue implementations and thus maintain our own.

    1. I have updated the blog post with a merged add-poll call and the performance is improved.

    2. It cuts by a factor of 2 only for small queues. Somewhat unsurprisingly, as you use larger queues, the processing time seems to be dominated by cache misses.

  4. Orhan Ozalp says:

    Have you tried to use a bigger random number margin? It’s going to change priority queue performance because right now, after the first 5000 elements, priority queue values are all between 0 and 5 and this makes it harder to find a smaller value than 5.

  5. Nice series of posts on priority queues. Why does sort has the largest throughput, do I miss something?

    1. Sort has the worst results…

  6. Gregory says:

    Hello Daniel,

    I believe that to be fair, you should always go through the comparator function and should write !reverseddefaultcomparator(x, b.peek()) instead of (x < b.peek()).

    Also, I implemented a priority queue that remembers a max size and to which you always add(). It's on par with FastPriorityQueue-KWillets-replaceTop, maybe a bit faster.

    After having read the Wikipedia article (https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap) that states Floyd's faster than Williams', I tried to implement add2() that keeps the heap in a "degraded" state:
    – array[0] is always the min element
    – I heapify the array once this.size reaches this.maxsize

    To my surprise, FastPriorityQueueGP-add2 is much slower than FastPriorityQueueGP-add.

    $ node ./test.js
    Platform: darwin 16.6.0 x64
    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
    Node version 8.1.3, v8 version 5.8.283.41

    starting dynamic queue/enqueue benchmark
    FastPriorityQueue-KWillets-replaceTop x 6,522 ops/sec ±2.40% (87 runs sampled)
    FastPriorityQueueGP-add x 7,234 ops/sec ±0.77% (91 runs sampled)
    FastPriorityQueueGP-add2 x 4,435 ops/sec ±1.64% (86 runs sampled)
    sort x 240 ops/sec ±1.11% (83 runs sampled)

    https://gist.github.com/gpakosz/655b087fc4135580c5025870316c4c6e