Daniel Lemire's blog

, 4 min read

Accelerating Conway’s Game of Life with SIMD instructions

6 thoughts on “Accelerating Conway’s Game of Life with SIMD instructions”

  1. Peter McNeeley says:

    I wonder how lookup tables and block compression would perform in comparison to AVX. (compression as all states are probably not equally likely)

  2. foobar says:

    I wonder how complicated the logic of computing new cell liveness value would be using binary logic – essentially reading cell and its neighborhood in nine (interleaved) AVX registers compromising of 2304 cell values and computing 256 cell liveness updates per step using essentially circuit logic of ANDs, ORs, XORs and such.

    1. foobar says:

      Unsurprisingly somebody has had a look at this already, and apparently 35 logic gates are enough for the task:

      https://www.moria.us/old/3/programs/life/

      Considering consecutive cells have four shared neighbors, this could be reduced to maybe 30 ANDs/ORs per round of 256 cells. (Further reductions are possible, but not really practical on circuit level.) Modern CPUs can do three such operations per clock cycle, and if register pressure doesn’t turn out to be a problem and overhead related to edges of vectors could be handled, this ineffective-sounding approach might be almost tenfold times faster than vectorized variant in the blog post!

      But this is a back-of-an-envelope estimate without any code written for the task.

  3. Peter Turney says:

    Golly uses a hashlife algorithm. I’ve been using it lately and finding it very useful. It can be astoundingly fast for some patterns.

    https://en.wikipedia.org/wiki/Golly_(program)
    http://golly.sourceforge.net/
    http://conwaylife.com/w/index.php?title=Golly

  4. I would guess that representing each row as bits in a SIMD register would yield good results, especially because we can (with a little adjustment) reuse the 3×1 sum 3 times (for the row above, below and the row itself).

    1. Charly C says:

      I did just that in this gist: https://gist.github.com/CharCoding/52fb584fab2d3632fe2225880890463e
      It’s a 256×256 board that wraps around, and the board is printed as braille pixels.