Daniel Lemire's blog


How expensive is it to parse numbers from a string in C++?

8 thoughts on “How expensive is it to parse numbers from a string in C++?”

  1. me says:

    Can you also compute cycles/char?

    As you can see, ints take about 3x as many cycles, but the throughput is 20x worse, likely because they also use many more chars.

  2. Christopher Chang says:

    Yes, this can be a major bottleneck.

    It’s not for every application (since it doesn’t guarantee the 16th decimal place is preserved), but I’ve found ScanadvDouble() in https://github.com/chrchang/plink-ng/blob/master/2.0/plink2_string.cc to be very useful.

  3. Jan Marquardt says:

    This talk from CppCon 2019 says that C++17’s std::from_chars is much faster:

    https://youtu.be/4P_kbF0EbZM

    Indeed, cppreference has this sentence: “This is intended to allow the fastest possible implementation that is useful in common high-throughput contexts such as text-based interchange (JSON or XML).”

    Maybe you’d be interested in benchmarking this against your current implementation?

  4. Maxim Egorushkin says:
  5. foobar says:

    I wonder if there would be a measurable benefit from simply breaking the dependency chain on the sum variable, that is, using multiple variables instead of one. After all, with integers the results would be the same.

    There are certainly faster ways to parse integers than one that takes 18 cycles per byte, at least if you can vectorise! Wojciech Muła’s “Parsing series of integers with SIMD” finds that 3 to 6 cycles per byte is entirely plausible for smaller (shorter) integers.

  6. Virgo says:

    But what about the standard library’s stoi and atoi thingies?

    I use them in my Value::deserialize() method.

  7. Andrew Nelless says:

    Look at Boost Spirit. Spirit X3 has essentially optimal (scalar) parsing of numeric strings at -O2 and will actually check for things like overflow, underflow, and narrowing.

    Facebook’s Andrei Alexandrescu also gave a talk on speeding this up further (it’s probably implemented in Folly) by looking at the CPU pipeline and breaking dependencies.

  8. Andrew Nelless says:

    Additionally, it’s worth mentioning that there will always be a divide between functions that respect the user’s locale, and therefore parse strings like “3,786”, and ones that don’t. Iostreams very much do handle this, which generally makes them inappropriate for parsing file formats with a known grammar.
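Jan Marquardt's std::from_chars suggestion might look roughly like the sketch below for integers. The helper name parse_int64 is hypothetical and is not taken from the post. It parses the whole string as a 64-bit integer and rejects overflow, trailing characters, leading whitespace, and a leading '+'. Unlike std::stoi, it does not throw. Unlike std::atoi, it reports failures instead of silently returning 0. It has not been benchmarked against the code in the post.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse the *entire* input as a signed 64-bit integer.
// std::from_chars (C++17) ignores the locale, never throws, and never allocates.
// It does not skip leading whitespace and, for integers, does not accept a leading '+'.
std::optional<std::int64_t> parse_int64(std::string_view s) {
    std::int64_t value = 0;
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(s.data(), last, value);
    if (ec != std::errc{}) {
        return std::nullopt;  // std::errc::invalid_argument or std::errc::result_out_of_range
    }
    if (ptr != last) {
        return std::nullopt;  // reject trailing characters such as "12a"
    }
    return value;
}
```

For example, parse_int64("-42") yields -42, while "12a", " 42", and "+42" all yield std::nullopt. The floating-point overloads of std::from_chars were added to some standard libraries later than the integer ones, so their availability depends on the toolchain.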
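Here is a minimal sketch of foobar's multiple-accumulator idea. It assumes the benchmark adds each parsed value into one running sum, which is an assumption because the benchmark loop is not reproduced here. The function names are hypothetical. Unsigned integer addition is associative and commutative even when it wraps around, so both versions return the same result. With floating-point values, reordering the additions could change the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline (hypothetical name): a single accumulator, so each addition
// depends on the result of the previous one.
std::uint64_t sum_single(const std::vector<std::uint64_t>& values) {
    std::uint64_t sum = 0;
    for (std::uint64_t v : values) sum += v;
    return sum;
}

// Two independent accumulators (hypothetical name): even- and odd-indexed
// elements form two separate dependency chains that the CPU can overlap.
std::uint64_t sum_two_accumulators(const std::vector<std::uint64_t>& values) {
    std::uint64_t sum0 = 0, sum1 = 0;
    std::size_t i = 0;
    for (; i + 1 < values.size(); i += 2) {
        sum0 += values[i];
        sum1 += values[i + 1];
    }
    if (i < values.size()) sum0 += values[i];  // leftover element when the size is odd
    return sum0 + sum1;
}
```

Whether this helps in practice has to be measured. An optimizing compiler may already unroll or vectorize the single-accumulator loop, and when parsing dominates the running time, the sum is unlikely to be the bottleneck.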
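Andrew Nelless's point about locales can be illustrated with a small sketch. Iostreams honour the locale, while std::from_chars does not. The sketch uses a hand-rolled std::numpunct facet called comma_grouping (a hypothetical name) instead of a named system locale, so it does not depend on which locales are installed. The function names are also hypothetical.

```cpp
#include <charconv>
#include <locale>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

// Hypothetical facet: ',' as the thousands separator, digits grouped in threes.
// Stands in for a named locale such as "en_US.UTF-8", which may not be installed.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Locale-aware: operator>> consults the stream's locale, so "3,786" becomes 3786.
std::optional<long> parse_with_iostream(const std::string& s) {
    std::istringstream in(s);
    // std::locale takes ownership of the facet and deletes it when no longer used.
    in.imbue(std::locale(std::locale::classic(), new comma_grouping));
    long value = 0;
    if (!(in >> value)) return std::nullopt;
    return value;
}

// Locale-independent: std::from_chars stops at the first character it cannot use,
// so "3,786" yields 3 (ptr is left pointing at the comma).
std::optional<long> parse_with_from_chars(const std::string& s) {
    long value = 0;
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
    (void)ptr;  // trailing characters are deliberately not checked in this sketch
    if (ec != std::errc{}) return std::nullopt;
    return value;
}
```

For a file format with a fixed grammar, the second behaviour is usually what you want. The same input always parses the same way, whatever locale the user has configured.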