18th November 2021, 8 min read

Converting integers to fix-digit representations quickly

11 thoughts on “Converting integers to fix-digit representations quickly”

Alex says:

November 18, 2021 at 8:11 pm

uint64_t bottombottom = top % 10000;
it’s supposed to be
uint64_t bottombottom = bottom % 10000;
1. Daniel Lemire says:
  
  November 18, 2021 at 8:19 pm
  
  Fixed.
George Spelvin says:

November 19, 2021 at 6:32 pm

Another nifty technique which allows 64-bit conversion without 64-bit arithmetic is Douglas W. Jones’ technique described at http://homepage.divms.uiowa.edu/~jones/bcd/decimal.html and implemented in e.g. https://elixir.bootlin.com/linux/latest/source/lib/vsprintf.c#L325.

The Linux code also implements division-by-constant using multiplication manually. While most compilers these days know how to optimize divide by constant to a multiply and shift, they usually can’t infer the limited ranges of the inputs which allows smaller multipliers and no fixups.
aqrit says:

November 19, 2021 at 9:33 pm

16-bit numbers need only one multiply per digit?
void lulz_atoi(char* str, uint16_t val) { uint64_t lo = val; uint64_t hi;
__uint128_t x = (__uint128_t)lo * ((0xFFFFFFFFFFFFFFFFULL / 10000) + 1); hi = x >> 64; lo = (uint64_t)x; str[0] = hi + 0x30; for (int i = 1; i > 64; lo = (uint64_t)x;
str[i] = hi + 0x30; } str[5] = 0; }
1. aqrit says:
  
  November 19, 2021 at 9:39 pm
  
  https://gist.github.com/aqrit/2997713a00ad043b4bac42e342294259
  1. aqrit says:
    
    November 21, 2021 at 3:32 pm
    
    Updated gist to 64-bits. I’ve not checked the generated assembly. Not benchmarked against the other implementations because a uint64_t should have 20 decimal digits…
Lance Richardson says:

November 23, 2021 at 9:59 pm

ICX AVX2 numbers look pretty nice, although “-march=native” was needed to get all three SIMD versions.

# make
c++ -O3 -march=native -Wall -Wextra -std=c++17 -o convert convert.cpp
# ./convert
khuong 7.15067
backlinear 30.8381
linear 21.6466
tree 14.2078
treetst 10.0964
treest 6.32523
treebt 2.30575
sse2 4.81692
sse2(2) 4.70681
avx2 2.00375

khuong 7.15235
backlinear 30.8286
linear 21.6496
tree 14.2603
treetst 10.0969
treest 6.32516
treebt 2.30584
sse2 4.81617
sse2(2) 4.70639
avx2 2.00354

khuong 7.15085
backlinear 30.8319
linear 21.6403
tree 14.2665
treetst 10.0935
treest 6.32525
treebt 2.30579
sse2 4.81627
sse2(2) 4.70653
avx2 2.00359
Lance Richardson says:

November 23, 2021 at 10:08 pm

ICX AVX2 numbers look pretty nice, although “-march=native” was needed to get all three SIMD versions.

# make
c++ -O3 -march=native -Wall -Wextra -std=c++17 -o convert convert.cpp
# ./convert
khuong 7.15067
backlinear 30.8381
linear 21.6466
tree 14.2078
treetst 10.0964
treest 6.32523
treebt 2.30575
sse2 4.81692
sse2(2) 4.70681
avx2 2.00375

khuong 7.15235
backlinear 30.8286
linear 21.6496
tree 14.2603
treetst 10.0969
treest 6.32516
treebt 2.30584
sse2 4.81617
sse2(2) 4.70639
avx2 2.00354

khuong 7.15085
backlinear 30.8319
linear 21.6403
tree 14.2665
treetst 10.0935
treest 6.32525
treebt 2.30579
sse2 4.81627
sse2(2) 4.70653
avx2 2.00359
Ankit Dixit says:

November 25, 2021 at 7:16 am

Can we run this type of coding syntax on an online compiler like this https://www.interviewbit.com/online-java-compiler/
sasuke420 says:

July 23, 2022 at 3:45 pm

Are approaches that only work for streams of many integers of interest? I’m calculating a base-10 checksum of all 10 digits of a u32 in <1ns per u32 with AVX2. This includes an intermediate step in which the length of the encoded string (without leading zeros) is calculated, but of course it does not include outputting the string.

The layout of the digits within the vectors is pretty bad for conversion-to-string purposes: the digits for a single integer are spread out between 2 adjacent bytes of 5 different vectors. It seems like it would be not so expensive to fix the layout to be friendlier for outputting, especially if the task was just to output u32 of at most 8 digits or u64 of at most 16 digits. I don't know a clever way to output u64 of 20 digits though, I would just have to add another scalar modmul at the beginning to split the u64 into groups of (4,8,8) digit.
1. sasuke420 says:
  
  July 23, 2022 at 4:16 pm
  
  To compute the length, if you imagine you have the groups of 4 digits in 4 vectors, you can take max(andnot(digits == 0, digitcount))+1 to get the length where digitcount is a constant:
  [3,2,1,0]
  [7,6,5,4]
  [11,10,9,8]
  [15,14,13,12]
  and max() is actually a sequence of 3 max, shuffle, max, shuffle, max. After that you can use arithmetic to calculate the proper shuffle to move the digits into the front of the vector so you can output a single formatted integer. You end up with a bunch of data dependencies but no branches. The data dependencies will surely rain on your parade though.