2nd May 2020, 5 min read

Encoding binary in ASCII very fast

8 thoughts on “Encoding binary in ASCII very fast”

Jeff Epler says:

May 2, 2020 at 8:12 pm

Is the input range actually up to 2⁵⁶, rather than 2⁴⁸ as the comment currently suggests?
1. Daniel Lemire says:
  
  May 2, 2020 at 8:24 pm
  
  Yes. You are correct.
Gil Pedersen says:

May 2, 2020 at 9:07 pm

To encode in four instructions:

uint64_t convert_to_ascii(uint64_t x) { return ((0x2040810204081 * (x & 0x80808080808080)) & 0xff80808080808080) ^ x; }
Travis Downs says:

May 15, 2020 at 11:44 pm

If you are willing to use a bigger chunk size, you can probably do it faster with something like:

uint64_t encode(uint64_t bits, uint64_t& high) { high = (high >> 1) | (bits & 0x8080808080808080) ; return bits & 0x7F7F7F7F7F7F7F7F; }
// encode a 64-byte input chunk into 72 output bytes uint64_t high = 0; for (size_t i = 0; i < 8; i++) result[0] = encode(input[0], high); result[8] = high;

By my count this is fewer operations and all basic ops: no potentially expensive multiply. Also, it processes 8 bytes at a time, not 7, which keeps all accesses aligned (this matters much more on non-x86).

On x86, you could take advantage of LEA, which can do (bits << 1) + x in a single instruction, but because you’re shifting left, not right, I think you can only process 7 bytes at a time (like your suggested algorithm), losing some benefit. Still, even at 7 bytes i calculate this as fewer total ops.
1. randomPoster says:
  
  May 18, 2020 at 7:06 am
  
  You must stop your loop one step before, and encode 56bytes in 64bytes, otherwise result[8] will not fit in ascii.
  1. Travis Downs says:
    
    May 19, 2020 at 12:02 am
    
    Yes, that’s right: it should only be 7 iterations of the inner loop before storing the upper bits.
Travis Downs says:

May 15, 2020 at 11:58 pm

The formatter wrecks the indentation in the code samples, but that should be:

for (size_t i = 0; i < 8; i++) result[i] = encode(input[i], high); result[8] = high;
Steven Stewart-Gallus says:

May 23, 2020 at 1:01 am

Less instructions is not always faster.
Not sure what the best way to exploit ILP is though.