Daniel Lemire's blog

, 8 min read

Converting integers to fix-digit representations quickly

11 thoughts on “Converting integers to fix-digit representations quickly”

  1. Alex says:

    uint64_t bottombottom = top % 10000;
    it’s supposed to be
    uint64_t bottombottom = bottom % 10000;

    1. Fixed.

  2. George Spelvin says:

    Another nifty technique which allows 64-bit conversion without 64-bit arithmetic is Douglas W. Jones’ technique described at http://homepage.divms.uiowa.edu/~jones/bcd/decimal.html and implemented in e.g. https://elixir.bootlin.com/linux/latest/source/lib/vsprintf.c#L325.

    The Linux code also implements division-by-constant using multiplication manually. While most compilers these days know how to optimize divide by constant to a multiply and shift, they usually can’t infer the limited ranges of the inputs which allows smaller multipliers and no fixups.

  3. aqrit says:

    16-bit numbers need only one multiply per digit?

    void lulz_atoi(char* str, uint16_t val) {
    uint64_t lo = val;
    uint64_t hi;

    __uint128_t x = (__uint128_t)lo * ((0xFFFFFFFFFFFFFFFFULL / 10000) + 1);
    hi = x >> 64;
    lo = (uint64_t)x;

    str[0] = hi + 0x30;
    for (int i = 1; i > 64;
    lo = (uint64_t)x;

    str[i] = hi + 0x30;
    }
    str[5] = 0;
    }

    1. aqrit says:
      1. aqrit says:

        Updated gist to 64-bits. I’ve not checked the generated assembly. Not benchmarked against the other implementations because a uint64_t should have 20 decimal digits…

  4. Lance Richardson says:

    ICX AVX2 numbers look pretty nice, although “-march=native” was needed to get all three SIMD versions.

    # make
    c++ -O3 -march=native -Wall -Wextra -std=c++17 -o convert convert.cpp
    # ./convert
    khuong 7.15067
    backlinear 30.8381
    linear 21.6466
    tree 14.2078
    treetst 10.0964
    treest 6.32523
    treebt 2.30575
    sse2 4.81692
    sse2(2) 4.70681
    avx2 2.00375

    khuong 7.15235
    backlinear 30.8286
    linear 21.6496
    tree 14.2603
    treetst 10.0969
    treest 6.32516
    treebt 2.30584
    sse2 4.81617
    sse2(2) 4.70639
    avx2 2.00354

    khuong 7.15085
    backlinear 30.8319
    linear 21.6403
    tree 14.2665
    treetst 10.0935
    treest 6.32525
    treebt 2.30579
    sse2 4.81627
    sse2(2) 4.70653
    avx2 2.00359

  5. Lance Richardson says:

    ICX AVX2 numbers look pretty nice, although “-march=native” was needed to get all three SIMD versions.

    # make
    c++ -O3 -march=native -Wall -Wextra -std=c++17 -o convert convert.cpp
    # ./convert
    khuong 7.15067
    backlinear 30.8381
    linear 21.6466
    tree 14.2078
    treetst 10.0964
    treest 6.32523
    treebt 2.30575
    sse2 4.81692
    sse2(2) 4.70681
    avx2 2.00375

    khuong 7.15235
    backlinear 30.8286
    linear 21.6496
    tree 14.2603
    treetst 10.0969
    treest 6.32516
    treebt 2.30584
    sse2 4.81617
    sse2(2) 4.70639
    avx2 2.00354

    khuong 7.15085
    backlinear 30.8319
    linear 21.6403
    tree 14.2665
    treetst 10.0935
    treest 6.32525
    treebt 2.30579
    sse2 4.81627
    sse2(2) 4.70653
    avx2 2.00359

  6. Ankit Dixit says:

    Can we run this type of coding syntax on an online compiler like this https://www.interviewbit.com/online-java-compiler/

  7. sasuke420 says:

    Are approaches that only work for streams of many integers of interest? I’m calculating a base-10 checksum of all 10 digits of a u32 in <1ns per u32 with AVX2. This includes an intermediate step in which the length of the encoded string (without leading zeros) is calculated, but of course it does not include outputting the string.

    The layout of the digits within the vectors is pretty bad for conversion-to-string purposes: the digits for a single integer are spread out between 2 adjacent bytes of 5 different vectors. It seems like it would be not so expensive to fix the layout to be friendlier for outputting, especially if the task was just to output u32 of at most 8 digits or u64 of at most 16 digits. I don't know a clever way to output u64 of 20 digits though, I would just have to add another scalar modmul at the beginning to split the u64 into groups of (4,8,8) digit.

    1. sasuke420 says:

      To compute the length, if you imagine you have the groups of 4 digits in 4 vectors, you can take max(andnot(digits == 0, digitcount))+1 to get the length where digitcount is a constant:
      [3,2,1,0]
      [7,6,5,4]
      [11,10,9,8]
      [15,14,13,12]
      and max() is actually a sequence of 3 max, shuffle, max, shuffle, max. After that you can use arithmetic to calculate the proper shuffle to move the digits into the front of the vector so you can output a single formatted integer. You end up with a bunch of data dependencies but no branches. The data dependencies will surely rain on your parade though.