22nd February 2021, 4 min read

Parsing floating-point numbers really fast in C#

4 thoughts on “Parsing floating-point numbers really fast in C#”

soywiz says:

February 28, 2021 at 3:14 pm

Hey Daniel, as always, really nice work!
Since C# is open source and your approach is accurate, have you considered making a PR to the C# runtime so everyone can benefit from it?

https://github.com/dotnet/runtime/blob/308ae6ad833089199b8afbf30a7b402f35190fc8/src/libraries/System.Private.CoreLib/src/System/Double.cs#L284
1. Daniel Lemire says:
  
  March 1, 2021 at 4:26 pm
  
  An issue has been opened. Meanwhile, we are working hard to make the library as usable as possible.
Joe Duarte says:

April 10, 2021 at 1:02 am

Hi Daniel, a couple of questions. I was just about to ask what data format you were using for some of the integer libraries when I realized that many of them parse text files.

So when you say “parsing” floats or integers, should I understand that this means parsing a text representation of these values? Is that implied in the term “parse”, such that we wouldn’t say we were parsing if the data was binary?

And then with these floats, I noticed the data files have a lot of content before the floats themselves. In many of the files, there are lots of leading zeros before each number (more than eight). What are those about? And then some files have a bunch of hex before each number, like this file:

https://github.com/CarlVerret/csFastFloat/blob/master/TestcsFastFloat/data_files/tencent-rapidjson.txt

What is that hex data? Are those supposed to be floats also? It seems like the floats come at the end of each row, after a lot of hex. An easier example is this one:

58A8 43150000 4062A00000000000 149

The float is 149, and the two longer strings in the middle are different hex representations of 149 as a float. But I don’t know what 58A8 is. Is csFastFloat doing anything with those hex strings? Which representation is it actually parsing?
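For illustration, the three hex columns in that example line can be reproduced with Python's standard `struct` module. This sketch assumes the columns are the big-endian IEEE 754 binary16, binary32 and binary64 encodings of the decimal string in the last column. Under that assumption, `58A8` is the half-precision encoding of 149. The helper name `ieee_hex` is hypothetical, and this is not csFastFloat code (the library is written in C#).

```python
import struct

def ieee_hex(value: float) -> tuple:
    """Return the binary16, binary32 and binary64 bit patterns of value as upper-case hex.

    Assumption: the test-file columns use big-endian IEEE 754 encodings.
    """
    half = struct.pack(">e", value).hex().upper()    # IEEE 754 binary16 (half precision)
    single = struct.pack(">f", value).hex().upper()  # IEEE 754 binary32 (single precision)
    double = struct.pack(">d", value).hex().upper()  # IEEE 754 binary64 (double precision)
    return half, single, double

print(*ieee_hex(149.0))  # 58A8 43150000 4062A00000000000
```

149 is 1.0010101 (binary) × 2^7, so only the exponent bias (15, 127 or 1023) and the width of the significand field differ between the three encodings.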
1. Daniel Lemire says:
  
  April 10, 2021 at 1:46 am
  
  We parse strings representing numbers in decimal form.
  
  The files you are looking at are for internal testing only; they are not part of the library.
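As a sketch of how a line from such a test file could be used (an assumption, not csFastFloat's actual test harness): only the decimal string in the last field is parsed, and a hex field serves as the expected result. Here Python's built-in `float()` stands in for the parser under test. The function name `check_line` is hypothetical, and the four-field layout is assumed from the example line quoted above.

```python
import struct

def check_line(line: str) -> bool:
    """Parse the decimal string at the end of a test line and compare its
    binary64 bit pattern with the expected hex in the third field.

    Assumed layout: f16-hex f32-hex f64-hex decimal-string
    """
    _f16, _f32, f64_hex, text = line.split()
    expected_bits = int(f64_hex, 16)  # expected binary64 pattern
    parsed = float(text)              # stand-in for the parser under test
    actual_bits = struct.unpack(">Q", struct.pack(">d", parsed))[0]
    return actual_bits == expected_bits

print(check_line("58A8 43150000 4062A00000000000 149"))  # True
```

The sketch checks only the binary64 column. Checking the binary32 column properly would require parsing the decimal string directly to single precision, because rounding to a double first and then to a float can round twice and give a different result.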