Daniel Lemire's blog

, 3 min read

Visiting all values in an array exactly once in “random order”

Suppose that you want to visit all values in an array exactly once in “random order”. You could do it by shuffling your array but it requires some extra storage. You want your code to use just a tiny bit of memory, and you want the code to be super fast. You do not want to assume that your array size is a power of two. One way to do it is to use the fact that (a x + b) modulo n will visit all integer values in [0,n) exactly once as x iterates through the integers in [0, n), as long as a is coprime with n. Being coprime just means that the greatest common divisor between a and n is 1. There are fast functions to compute the greatest common divisor between a and n.

A trivial coprime number would be a = 1, but that’s bad for obvious reasons. So we pick a coprime number in [n/2,n) instead. There is always at least one no matter what n is.

Enumerating all coprime numbers in [n/2,n) could get tiresome when n is very large, so maybe we just look at up to 100,000 of them. There is no need to actually store them in memory, we can just select one at random, so it requires very little memory.

To see why the mathematics work, suppose that ( a x + b ) modulo n = ( a x' + b ) modulo n, then a (x - x') modulo n = 0 which only happens when (x - x') is a multiple of n because a and n are coprime. Thus if you map a consecutive range of n values x, you will get n distinct values ( a x + b ) modulo n. The choice of the parameter a is critical however: if you set a to 1 or 2, even if it is coprime with n, the result will not look random.

The running code is ridiculously simple:


    public int getCurrentValue() {
      return ( (long) index * prime + offset ) % ( maxrange);
    }

    public boolean hasNext() {
      return index < maxrange;
    }

    public int next() {
      int answer = getCurrentValue();
      index ++;
      return answer;
    }

You can optimize this code by avoiding multiplications and remainder computations:

public int next() {
      runningvalue += prime;
      if(runningvalue >= maxrange) runningvalue -= maxrange;
      index ++;
      // runningvalue == getCurrentValue()) 
      return runningvalue;
}

Of course, it is not really random in the sense that no (good) statistician should accept the result as a fair shuffle of the indexes. Still, it might be “good enough” to fool your colleagues into thinking that it is random.

While my implementation assumes that you are visiting the values in order, you can go back in time, or jump forward and backward arbitrarily.

I make my Java code available. It can be made more elegant, but it should work just fine in your projects.

(As pointed out by Leonid Boytsov, this approach is reminiscent of the Linear congruential generators that are used to produce random numbers.)

If you can find ways to make the result “look” more random without significantly making it slower and without increasing memory usage, please let us know.

You can find ready-made solutions to visit all values in an array with a power of two number of elements. And by restricting your traversal to the subset of elements in [0,n) from a larger virtual array having a power of two size, you will have an alternative to the approach I describe, with the caveat that your main code will require branching. The computational complexity of a call to “next” becomes O(n) whereas I use a small, finite, number of instructions.

Follow-up: Benchmarking algorithms to visit all values in an array in random order