Daniel Lemire's blog


Sorting already sorted arrays is much faster?

If you are reading a random textbook on computer science, it is probably going to tell you all about how good sorting algorithms take linearithmic time. To arrive at this result, they count the number of operations. That’s a good model to teach computer science, but working programmers need more sophisticated models of software performance.

On modern superscalar processors, we expect in-memory sorting to be limited by how far ahead the processor can predict where the data will go. Moving the data in memory is not free, but it is a small cost if it can be done predictably.
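
To make this concrete, here is a sketch of a quicksort-style partition loop. It is not how std::sort is actually implemented, but it shows where the data-dependent branch lives: the comparison against the pivot decides where each element goes, and that branch is easy to predict on sorted or reverse-sorted input and hard to predict on shuffled input.

#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a Lomuto-style partition step. The branch on data[i] < pivot
// decides where each element goes: on shuffled input it is essentially a
// coin flip, while on sorted (or reverse-sorted) input it almost always
// goes the same way, so the processor can run far ahead of it.
std::size_t partition_sketch(std::vector<int>& data, std::size_t lo,
                             std::size_t hi, int pivot) {
    std::size_t mid = lo;
    for (std::size_t i = lo; i < hi; i++) {
        if (data[i] < pivot) {                // data-dependent branch
            std::swap(data[i], data[mid]);    // moving data: cheap when predictable
            ++mid;
        }
    }
    return mid;
}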

We know that sorting “already sorted data” can be done in an easy-to-predict manner (just do nothing). So it should be fast. But how much faster is it than sorting randomly shuffled data?

I decided to run an experiment. I use arrays containing one million distinct 32-bit integers, and I report the time in CPU cycles per value on a Haswell processor. I wrote my code in C++.
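
Here is a minimal sketch of such a benchmark. It reports nanoseconds per value using std::chrono rather than the CPU cycles per value given below, and a serious benchmark would repeat each measurement several times, but the structure of the experiment is the same:

#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Time std::sort on a copy of the input and report nanoseconds per value.
double time_sort_ns_per_value(const std::vector<uint32_t>& input) {
    std::vector<uint32_t> data = input; // sort a copy so the input stays intact
    auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / data.size();
}

int main() {
    const size_t n = 1000000;
    std::vector<uint32_t> sorted(n);
    std::iota(sorted.begin(), sorted.end(), 0); // one million distinct values
    std::vector<uint32_t> shuffled = sorted;
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937(1234));
    std::vector<uint32_t> reversed(sorted.rbegin(), sorted.rend());

    std::cout << "sorted:   " << time_sort_ns_per_value(sorted)   << " ns/value\n";
    std::cout << "shuffled: " << time_sort_ns_per_value(shuffled) << " ns/value\n";
    std::cout << "reversed: " << time_sort_ns_per_value(reversed) << " ns/value\n";
}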

function     sorted data   shuffled data   sorted in reverse
std::sort    38            200             30

For comparison, it takes roughly n log(n) comparisons to sort an array of size n in the worst case with a good algorithm. In my experiment, log(n) is about 20.

The numbers bear out our analysis. Sorting an already-sorted array takes a fraction of the time needed to sort a shuffled array. One could object that sorting already-sorted arrays is fast simply because we do not have to move the data much. That is why I also included arrays sorted in reverse order: nearly every element has to move, yet std::sort is even faster on reversed arrays than on already-sorted ones! When the data movement is maximal but the comparisons remain predictable, sorting is still cheap, which is clear evidence for our thesis.

(The C++ source code is available. My software includes timsort results if you are interested.)