Daniel Lemire's blog


Number Parsing at a Gigabyte per Second

15 thoughts on “Number Parsing at a Gigabyte per Second”

  1. Idiot says:
    1. @Idiot

      My statement is…

      JavaScript represents all its numbers, by default, with a 64-bit binary floating-point number type.

      The link you offer supports this statement because it says that we can represent integers exactly up to 2^53, which is what happens under the IEEE binary64 type.

      1. @Idiot

        The link you offer does confirm my statement, please check.
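        As a quick check of the 2^53 claim, here is a minimal sketch in Python (whose `float` is also an IEEE binary64, like JavaScript's default number type); this is an illustration, not code from the paper:

        ```python
        # Under IEEE binary64, every integer up to 2**53 is representable exactly;
        # beyond that, conversion to float must round.
        exact = 2**53
        assert float(exact) == exact          # 2**53 survives the conversion exactly
        assert float(exact + 1) != exact + 1  # 2**53 + 1 cannot be represented...
        assert float(exact + 1) == float(exact)  # ...it rounds back down to 2**53
        ```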

  2. Alice Ryhl says:

    Is there any work on replacing the functions for this in the language’s respective standard libraries? The Rust implementation seems like it would be a good fit for the Rust standard library, with it having no dependencies and no_std support.

    1. Is there any work on replacing the functions for this in the language’s respective standard libraries?

      It is part of Go as of the latest version.

  3. Suminda Sirinath Salpitikorala Dharmasena says:

    I would like to help out on the Java port, but my requirement is that I can go round trip without having different cache tables for float/double to string and string to float/double.

    At the moment I am trying to port a DragonBox version (https://github.com/jk-jeon/dragonbox/, https://github.com/jk-jeon/fp/, https://github.com/abolz/Drachennest/) but I am interested in this if it can outperform DragonBox and can go round trip (float/double to string, string to float/double).

    1. We provide exact parsing with round-to-even so “round trip” is not a concern. I have not worked on serialization.
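      To illustrate what a round trip means here, a hedged Python sketch (not the Java port under discussion): a correctly rounded parser paired with a shortest-representation serializer reproduces the same bits, with no shared cache table required.

      ```python
      import struct

      def bits(x: float) -> int:
          """Return the raw IEEE binary64 bit pattern of x."""
          return struct.unpack("<Q", struct.pack("<d", x))[0]

      x = float("0.1")          # string -> double (correctly rounded)
      s = repr(x)               # double -> shortest decimal string
      y = float(s)              # string -> double again
      assert bits(x) == bits(y) # bit-for-bit identical: the round trip is exact
      ```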

  4. Frank Astier says:

    But, when I have to store e.g. a big matrix of floating point numbers, I would do a copy of that contiguous chunk of memory to disk, and vice-versa, possibly throwing in mmap – precisely to avoid parsing from text?

    1. Right. If you serialize your numbers in binary form, you obviously have no parsing difficulty. In the paper, I also allude to another possibility: you can use hexadecimal floating-point numbers.
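      Hexadecimal floating-point notation spells out the significand and binary exponent directly, so reading it back involves no decimal rounding. A small Python sketch of the idea (C's `printf("%a", x)` produces the same notation); this is an illustration, not the paper's code:

      ```python
      # float.hex() emits a hexadecimal floating-point literal,
      # e.g. 0.1 -> '0x1.999999999999ap-4' (significand and power of two).
      x = 0.1
      h = x.hex()
      assert float.fromhex(h) == x  # exact round trip, no decimal conversion
      ```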

  5. What do you feel accounts for the great differences in bandwidth used by each approach? It would be interesting to test these same computations on multiple different CPUs.

    1. The paper does cover different CPUs.

  6. I made my own floating-point input/output method (https://reddit.com/r/fpio). How does the performance of r/fpio compare to your benchmark?

  7. My name when spreading silly questions says:

    hello, nice work, but I’m unsure about one point:
    the video title reads ‘w/ Perfect Accuracy’, but in the end you state:
    ‘can do exact computation 99.99% of the time’. Does that mean:
    ‘in 0.01% of the time (cases) you see an error and can fall back to another algorithm’, or
    ‘in 0.01% of cases you get a slightly wrong result, learn to live with it’?

    1. It is the former: we fall back if needed.

      1. My name when spreading silly questions says:

        🙂 thank you,