27th July 2023, 3 min read

Decoding base16 sequences quickly

aqrit says:

July 27, 2023 at 7:43 pm

Geoff Langdale’s implementation was likely meant to be SSE2 compatible, whereas vectorized table lookups require SSSE3.
1. Daniel Lemire says:
  
  July 27, 2023 at 9:28 pm
  
  You can find the implementation here:
  https://github.com/WojciechMula/toys/blob/master/simd-parse-hex/geoff_algorithm.cpp
sasuke420 says:

August 5, 2023 at 6:10 pm

For my current solution to this sort of problem at https://highload.fun/, I am using this sequence:

const u8x32 pack_odd = _mm256_setr_epi8(
    15, 13, 11, 9, 7, 5, 3, 1, 15, 13, 11, 9, 7, 5, 3, 1,
    15, 13, 11, 9, 7, 5, 3, 1, 15, 13, 11, 9, 7, 5, 3, 1);
....
const u8x32 f_0 = _mm256_slli_epi16(e_0, 12);
const u8x32 g_0 = _mm256_or_si256(f_0, e_0);
const u8x32 h_0 = _mm256_shuffle_epi8(g_0, pack_odd);

rather than something like

__m128i t3 = _mm_maddubs_epi16(v, _mm_set1_epi16(0x0110));
__m128i t5 = _mm_packus_epi16(t3, t3);

I’ll have to try that out. The docs say I’ll suffer some latency loss, but it could still be a win.
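To make the arithmetic of the second sequence concrete, here is a rough scalar model of what `_mm_maddubs_epi16` followed by `_mm_packus_epi16` computes on 16 already-decoded nibbles. It is only a sketch: the function name `pack_maddubs_model` is made up for illustration, and it assumes every input byte is in the range 0..15 and models only the low 8 output bytes.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical scalar model (not the SIMD code itself) of:
 *   t3 = _mm_maddubs_epi16(v, _mm_set1_epi16(0x0110));
 *   t5 = _mm_packus_epi16(t3, t3);
 * `nibbles` plays the role of v (16 bytes, each assumed to be 0..15);
 * `out` receives the low 8 bytes of t5. */
static void pack_maddubs_model(const uint8_t nibbles[16], uint8_t out[8]) {
    for (size_t lane = 0; lane < 8; lane++) {
        /* The 16-bit constant 0x0110 is stored little-endian as the bytes
         * 0x10, 0x01, so the first byte of each pair is multiplied by 16
         * and the second by 1; maddubs then adds the two products. */
        int16_t t3 = (int16_t)(nibbles[2 * lane] * 0x10 +
                               nibbles[2 * lane + 1] * 0x01);
        /* packus narrows each 16-bit lane to a byte with unsigned
         * saturation; for nibble inputs the sum never exceeds 0xFF. */
        out[lane] = (uint8_t)(t3 < 0 ? 0 : (t3 > 255 ? 255 : t3));
    }
}
```

Feeding it the nibbles 0, 1, 2, ..., 15 (the digits of "0123456789ABCDEF") gives the bytes 01 23 45 67 89 AB CD EF, in the same order as the input text.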
1. sasuke420 says:
  
  August 5, 2023 at 6:12 pm
  
  Well, now that I look at what I’ve posted, it looks like I am packing and byte-swapping (bswap) at the same time, so I would need the shuffle anyway.
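The packing-plus-byte-swap effect can be seen in a scalar model of one 128-bit lane of the shift, OR and shuffle sequence. This is a sketch under assumptions: the name `pack_and_reverse_model` is hypothetical, the inputs are assumed to be nibbles in 0..15, and only the first 8 shuffle indices (15, 13, ..., 1) are modeled.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical scalar model of one 128-bit lane of:
 *   f = _mm256_slli_epi16(e, 12);
 *   g = _mm256_or_si256(f, e);
 *   h = _mm256_shuffle_epi8(g, pack_odd);
 * `nibbles` plays the role of e (each byte assumed to be 0..15);
 * `out` receives the first 8 bytes of h for that lane. */
static void pack_and_reverse_model(const uint8_t nibbles[16], uint8_t out[8]) {
    static const uint8_t pack_odd[8] = {15, 13, 11, 9, 7, 5, 3, 1};
    uint8_t g[16];
    for (size_t lane = 0; lane < 8; lane++) {
        /* Each 16-bit lane is little-endian: first byte low, second high. */
        uint16_t e = (uint16_t)(nibbles[2 * lane] | (nibbles[2 * lane + 1] << 8));
        /* Shifting left by 12 moves the first nibble into the top 4 bits,
         * so after the OR the high (odd) byte holds first * 16 + second. */
        uint16_t v = (uint16_t)((uint16_t)(e << 12) | e);
        g[2 * lane] = (uint8_t)v;
        g[2 * lane + 1] = (uint8_t)(v >> 8);
    }
    /* The shuffle keeps only the odd bytes, walking from the last to the first. */
    for (size_t i = 0; i < 8; i++) {
        out[i] = g[pack_odd[i]];
    }
}
```

With the nibbles 0, 1, 2, ..., 15 as input, the output bytes are EF CD AB 89 67 45 23 01, that is, the packed bytes in reverse order. Read as a little-endian 64-bit integer, that is 0x0123456789ABCDEF, which is why the shuffle cannot simply be dropped in favor of an order-preserving pack.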