Daniel Lemire's blog


Number Parsing at a Gigabyte per Second

15 thoughts on “Number Parsing at a Gigabyte per Second”

  1. Idiot says:
    1. @Idiot

      My statement is…

      JavaScript represents all its numbers, by default, with a 64-bit binary floating-point number type.

      The link you offer supports this statement because it says that we can represent integers exactly up to 2^53, which is what happens under the IEEE binary64 type.

      1. @Idiot

        The link you offer does confirm my statement, please check.
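        As a quick check of the 2^53 claim, here is a minimal sketch in Python (whose `float` is also an IEEE binary64, like JavaScript's default number type); this is an illustration, not code from the paper:

        ```python
        # Under IEEE binary64, every integer up to 2**53 is representable exactly;
        # beyond that, conversion to float must round.
        exact = 2**53
        assert float(exact) == exact          # 2**53 survives the conversion exactly
        assert float(exact + 1) != exact + 1  # 2**53 + 1 cannot be represented...
        assert float(exact + 1) == float(exact)  # ...it rounds back down to 2**53
        ```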

  2. Alice Ryhl says:

    Is there any work on replacing the functions for this in the language’s respective standard libraries? The Rust implementation seems like it would be a good fit for the Rust standard library, with it having no dependencies and no_std support.

    1. Is there any work on replacing the functions for this in the language’s respective standard libraries?

      It is part of Go as of the latest version.

  3. Suminda Sirinath Salpitikorala Dharmasena says:

    I would like to help out on the Java port, but my requirement is that I can go round trip without having different cache tables for float/double to string and string to float/double.

    At the moment I am trying to port a DragonBox version (https://github.com/jk-jeon/dragonbox/, https://github.com/jk-jeon/fp/, https://github.com/abolz/Drachennest/) but I am interested in this if it can outperform DragonBox and can go round trip (float/double to string, string to float/double).

    1. We provide exact parsing with round-to-even so “round trip” is not a concern. I have not worked on serialization.
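      To illustrate what a round trip means here, a hedged Python sketch (not the Java port under discussion): a correctly rounded parser paired with a shortest-representation serializer reproduces the same bits, with no shared cache table required.

      ```python
      import struct

      def bits(x: float) -> int:
          """Return the raw IEEE binary64 bit pattern of x."""
          return struct.unpack("<Q", struct.pack("<d", x))[0]

      x = float("0.1")          # string -> double (correctly rounded)
      s = repr(x)               # double -> shortest decimal string
      y = float(s)              # string -> double again
      assert bits(x) == bits(y) # bit-for-bit identical: the round trip is exact
      ```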

  4. Frank Astier says:

    But, when I have to store e.g. a big matrix of floating point numbers, I would do a copy of that contiguous chunk of memory to disk, and vice-versa, possibly throwing in mmap – precisely to avoid parsing from text?

    1. Right. If you serialize your numbers in binary form, you obviously have no parsing difficulty. In the paper, I also allude to another possibility: you can use hexadecimal floating-point numbers.
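      Hexadecimal floating-point notation spells out the significand and binary exponent directly, so reading it back involves no decimal rounding. A small Python sketch of the idea (C's `printf("%a", x)` produces the same notation); this is an illustration, not the paper's code:

      ```python
      # float.hex() emits a hexadecimal floating-point literal,
      # e.g. 0.1 -> '0x1.999999999999ap-4' (significand and power of two).
      x = 0.1
      h = x.hex()
      assert float.fromhex(h) == x  # exact round trip, no decimal conversion
      ```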

  5. What do you feel accounts for the great differences in bandwidth used by each approach? It would be interesting to test these same computations on multiple different CPUs.

    1. The paper does cover different CPUs.

  6. I made my own floating-point input/output method (https://reddit.com/r/fpio). How does the performance of r/fpio compare to your benchmark?

  7. My name when spreading silly questions says:

    hello, nice work, but I’m unsure about one point:
    the video title reads ‘w/ Perfect Accuracy’, but in the end you state:
    ‘can do exact computation 99.99% of the time’. Does that mean:
    ‘in 0.01% of the time (cases) you see an error and can fall back to another algorithm’, or
    ‘in 0.01% of cases you get a slightly wrong result, learn to live with it’?

    1. It is the former: we fall back if needed.

      1. My name when spreading silly questions says:

        🙂 thank you,