Daniel Lemire's blog

, 2 min read

The number of comparisons needed to sort a shuffled array: qsort versus std::sort

Given an array of N numbers of type double, the standard way to sort it in C is to invoke the qsort function

qsort(array, N, sizeof(double), compare);

where compare is a function which returns an integer less than, equal to, or greater than zero if the first argument is less than, equal to, or greater than the second. Because it is a C function, it takes in void pointers which we must convert back to actual values. The following is a reasonable implementation of a comparison function:

int compare(const void *a, const void *b) {
  double x = *(double *)a;
  double y = *(double *)b;
  counter++;
  if (x < y) {
    return -1;
  }
  if (x == y) {
    return 0;
  }
  return 1;
}

Though the function appears to have branches, optimizing compilers can generate binary code without any jumps in this case.

Though the name suggests that qsort might be implemented using the textbook algorithm Quicksort, the actual implementation depends on the standard library.

The standard approach in C++ is similar. The code might look as follows:

    std::sort(array, array + N, compare);

Again, we have a compare function. In C++, the compare function can refer directly to the type, since the std::sort is actually a template function, and not a mere function. It makes that the C++ compiler effectively generates a function for each comparison function you provide. It trades an increase in binary size for a potential increase in performance. A reasonable implementation of the comparison function is as follows:

bool compare(const double x, const double y) {
  return x < y;
}

The signature of the C++ comparison function is different: we return a Boolean value as opposed to a three-class integer value.

An interesting question is how many comparisons each function makes. Typically, the comparisons are inexpensive and a bad predictor of the performance, but you can imagine cases where comparing your values could be expensive.

The exact number of comparisons depends on the underlying implementation provided by your system. As inputs, I use random arrays.

I choose to count the average number of times that the comparison function is called. Experimentally, I find that the C++ function makes many more calls to the comparison function than the C function (qsort). The C library that comes with GCC (glibc) uses k – 1.2645 comparisons per element on average to sort arrays of size 2k, matching the theoretical average case performance of merge sort.

LLVM 13 (Apple):

N calls to the comparison function (qsort) per input value calls to the comparison function (std::sort) per input value
210 10.04 12.26
211 11.13 13.41
212 12.21 14.54

GCC 9:

N calls to the comparison function (qsort) per input value calls to the comparison function (std::sort) per input value
210 8.74 11.98
211 9.74 13.22
212 10.74 14.40

My code is available.

Further reading.