Daniel Lemire's blog


Computer scientists need to learn about significant digits

16 thoughts on “Computer scientists need to learn about significant digits”

  1. Nick Barnes says:

    Yes, but: much of computer science is not a natural science, or anything much like one; it’s a branch of mathematics. How many significant digits does π have? If some algorithm, on some input, takes (say) 78234 tree-rebalancing operations, then that’s the number it takes. Not plus or minus anything. There’s no measurement error, there’s no experimental error. Should Vassilevska Williams state that her matrix multiplication algorithm has an asymptotic cost of O(n^(two and a bit))?

    Where there are sources of error or variation, for instance in time and space measurements of running systems, particularly of multi-processing environments, particularly of systems connected to instruments or other external interfaces such as UI devices, then I quite agree, the error and variation should be quantified and numbers given to appropriate numbers of significant digits, and often compsci papers fail to do this well enough.

    Having said that, the sources of variation may often be controllable, and with care the resulting precision may be greater than many physical scientists could normally achieve. I have personally worked on real running systems which have space measurements reproducible to six or more decimal places, and time measurements reproducible to five or more. If I have that many significant digits, should I state them? My habit was generally to give the full precision in tables but to truncate in running text, for rhetorical purposes.

  2. A. Non says:

    Providing additional precision can help convince the reader that you didn’t just make the number up.

  3. @A. Non

    Because it is a lot harder to make up the number 304.03 than the number 300?

  4. Federico says:

    Of course, it is 56.137% harder to make up.

    Ok, now back to the serious things…

    Going one step further, I would suggest replacing the numbers with charts whenever possible.

  5. lylebot says:

    My students routinely turn in work with numbers reported to 15(!) significant digits. Drives me nuts.

  6. Neil Conway says:

    Saying the program’s runtime is 300 seconds rather than 304.03 seconds is only a slight improvement. Much better would be to say “the mean of k runs was x seconds with variance y”, for example.

  7. Bob says:

    How does this request to express experimental results using fewer digits go hand in hand with making papers longer?

  8. @Nick

    There are good reasons sometimes to go beyond 2 significant digits. But doing so without a good reason makes your article and your tables harder to parse.

  9. Peter Norvig says:

    Sometimes when we say “33.14 MB”, the purpose is not to answer “is this significantly different from 30 MB?” but rather (or also) “is this identical to the other file over there?” To test identity, all digits are significant.

  10. Alan says:

    33.14 might be significant because it can help determine whether a source of error was the result of overflow or other oddities. Also, there is no error; it’s not like chemistry, where we don’t know. It’s more like math, where we objectively know. It would be like deriding mathematicians for not following the rules on significant digits.

  11. Anonymous says:

    I totally agree…

    Sometimes one can keep digits just out of laziness, since they are the output of a program, copied and pasted into the paper.

  12. Wells says:

    I don’t think you know what you are talking about.

    A significant digit is a digit that you actually measured. If you have in fact measured every byte of a file (and you should be able to), then you can report the size of the file to the nearest byte, regardless of whether it is a kilobyte file, a megabyte file, a gigabyte file, or a terabyte file.

    Significant figures come into play when you report precision that your instruments cannot actually measure. Say you are timing a process and your clock is accurate to the nearest second (like with a UNIX timestamp or something). If this is true, giving a mean with any digits after the decimal point is inaccurate.

  13. J. Pooh says:

    I was taught about significant figures in Grade 10, circa mid-1970s. The rules are about 90% well defined and objective, and about 10% less well defined and subjective.

    Context also matters. For example, if the project requires that a software module must execute start to finish in not more than 304.00 seconds, then 304.03 seconds is probably a fail. But ideally one would measure it to a precision at least six to ten times finer than the margin. Most of the time, excess significant figures are nonsense, not context.

    A “100 km/h” speed limit might actually be ±10 km/h. The sign should read “100.0 km/h”. LOL.

  14. @Wells

    The number of significant digits you can report is bounded by what you actually measured, but scientists typically report fewer digits, for reasons such as the ones I have given.

  15. Billy Ethridge says:

    Logically, using an insignificant digit as if it were significant is “the fallacy of misplaced precision”.

    Context is everything. If I ask you how many miles per gallon you get with your new hybrid car and you say “50.52841926 mpg”, every digit after “50.” is likely to be insignificant. The insignificant digits are not meaningfully informative.

  16. I made a long-term enemy by publicly telling a Caltech PhD that he could not publish a measured performance number to eight significant digits (when two digits were dubious). Also, the zero intercept on the graph could not be at (0,0). First-year physics students are meant to learn this basic stuff, but apparently not a Caltech PhD.
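
To illustrate Neil Conway’s suggestion in comment 6, here is a minimal sketch of what such reporting could look like, assuming Python 3 and only the standard library; the workload work() and the choice of k = 10 runs are hypothetical placeholders, not anything from the post or the comments, and it reports a standard deviation rather than a variance.

    import statistics
    import time

    def work():
        # Hypothetical stand-in workload; replace with the code being measured.
        sum(i * i for i in range(1_000_000))

    k = 10  # number of repeated runs (arbitrary choice for illustration)
    runs = []
    for _ in range(k):
        start = time.perf_counter()
        work()
        runs.append(time.perf_counter() - start)

    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)  # sample standard deviation across the k runs

    # Report a spread alongside the mean, keeping only a few significant digits,
    # e.g. "0.0523 s ± 0.002 s (k = 10)".
    print(f"{mean:.3g} s ± {stdev:.1g} s (k = {k})")

Reporting the spread alongside the mean also makes it obvious how many digits of the mean are worth printing.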
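A companion sketch for the point Wells makes in comment 12: trimming a reported value to the precision the instrument actually supports. The helper round_sig and the sample values below are hypothetical, again assuming Python 3.

    import math

    def round_sig(x: float, sig: int) -> float:
        # Round x to `sig` significant figures; zero is returned unchanged.
        if x == 0:
            return 0.0
        return round(x, sig - 1 - int(math.floor(math.log10(abs(x)))))

    # With a clock that is only accurate to whole seconds, the digits after
    # the decimal point in 304.03 are not significant:
    print(round_sig(304.03, 3))     # 304.0 -> report "304 s"
    print(round_sig(304.03, 2))     # 300.0 -> report "300 s" (two significant digits)
    print(round_sig(0.0123456, 2))  # 0.012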