Daniel Lemire's blog

, 1 min read

Are vectorized random number generators actually useful?

Our processors benefit from “SIMD” instructions. These instructions can operate on several values at once, thus greatly accelerating some algorithms. Earlier, I reported that you can multiply the speed of common (fast) random number generators such as PCG and xorshift128+ by a factor of three or four by vectorizing them using SIMD instructions.

A reader challenged me: is this actually useful in practice?

There are problems where the generation of random number is critical to the performance. That is the case in many simulations, for example. The simplest and best known one is probably the random shuffling of arrays. The standard algorithm is quite simple:

for (i = size; i > 1; i--) {
  var j = random_number(i);
  switch_values(array, i, j);
}

If you are interested, O’Neill wrote a whole blog post of this specific problem.

So can I accelerate the shuffling of an array using SIMD instructions?

So I threw together a vectorized shuffle and a regular (scalar) shuffle, both of them using O’Neill’s PCG32. The net result? I almost double the speed using SIMD instructions when the array is in cache:

SIMD shuffle 3.5 cycles per entry
scalar shuffle 6.6 cycles per entry

My code is available. I do not do anything sophisticated so I expect it is possible to do a lot better. My sole goal was to demonstrate that SIMD random number generators are practical.