Daniel Lemire's blog


Does software performance still matter?

27 thoughts on “Does software performance still matter?”

  1. I would agree that performance probably only matters in a fraction of all the code written today. Much code is (still) written for batch processes that are run episodically. One can consider a web application to essentially be a batch process.

    Additionally, managed and interpreted languages such as Java, JavaScript and Python have been designed in part to reduce programming errors (such as memory leaks) and to make correctness more likely, by a number of routes including trivialising certain tasks with libraries and so on. Even C# takes this route.

    The prevalence of such languages suggests to me that the industry considers the improvements in correctness to be worth the trade-off in performance. In many cases, the performance at the back end will make no difference – the bottleneck will be the transmission line.

    In short – performance only really matters where it matters – which is probably in about 1% of the code being written. I have the good fortune to work in such an area – developing embedded code that must do a lot of number crunching on large datasets (geophysical data, mathematical transformations) on relatively slow (battery-powered) systems. I write in C++. I interface with folks who write code where performance does not matter very much – they write the UI (in Python using Pygame).

    1. In short – performance only really matters where it matters – which is probably in about 1% of the code being written.

      Possibly even less than 1%. But a lot of code gets written. And this tiny fraction is still considerable, especially since it may require orders of magnitude more time per line of code to write optimized code than to write generic code.

  2. Paul Jurczak says:

    There is another aspect of software performance – power efficiency. Computing devices in the USA use more than 2% of total electrical power generated (http://www.datacenterknowledge.com/archives/2016/06/27/heres-how-much-energy-all-us-data-centers-consume/). We are cooking the planet in order to allow hugely wasteful programming techniques, e.g. virtual machines everywhere (dynamic languages, server virtualization), bloated data formats (HTML, JSON), etc.

    Maybe this is a reasonable tradeoff, but it is far from clear to a member of an older generation who still remembers delivering perfectly useful software fitting in 64 KB of RAM and running on a 4 MHz CPU. I’m not advocating going back to such drastic resource constraints, but I suggest that the increase of four orders of magnitude or more in the computational resources typically available to a person over the last couple of decades did not bring an equivalent increase in productivity or in any other metric of societal value. We are at a point of diminishing returns.

    1. @Paul

      I suggest that the increase of four orders of magnitude or more in the computational resources typically available to a person over the last couple of decades did not bring an equivalent increase in productivity or in any other metric of societal value

      The original PC XT ran on a 130-Watt power supply. You can recharge your iPhone using about 10 Watt-hours, and it will then be good for a day of use. So it seems quite clear that, even though an iPhone does vastly more, the PC XT used a lot more power.
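
      A back-of-envelope sketch of that comparison (the eight-hour duty cycle is my assumption, and the 130 W figure is the supply’s rating rather than the measured draw):

      ```python
      # Back-of-envelope comparison; the 8-hour duty cycle is an assumption.
      pc_xt_watts = 130        # nominal power-supply rating mentioned above
      hours_of_use = 8         # assumed hours of use per day
      iphone_wh_per_day = 10   # roughly one full charge per day, as above
      print(pc_xt_watts * hours_of_use, iphone_wh_per_day)  # ~1040 Wh vs ~10 Wh, about 100x
      ```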

      We have made a lot of progress. With very little power, we give people access to a world of information. It is easy to believe that yesterday’s software was more efficient, but I doubt it. The COBOL and BASIC software of the 1980s was not exactly efficient.

      1. Stefano says:

        With very little power, we give people access to a world of information.

        True: present-day computers are orders of magnitude more power-efficient than the old ones.

        However, mobile applications are backed by huge data centres and internet infrastructure, so when measuring the “power footprint” it is a little misleading to consider only the mobile devices and not the whole ecosystem.

        Nevertheless, modern data centres are very power-efficient: the November 2016 Green500 list reports a top efficiency of 9.4 GFlops/W! (And the excellent document linked by Paul Jurczak confirms this point.)

        Modern trends are towards very high energy efficiency: on the hand-held side because we like long battery run times, on the data-center side because nobody wants to pay huge energy bills.

        How will this pattern (diffused dark silicon with very bright hot-spots at data-centres) evolve in the near future? I really have no idea (I’m not good at making predictions.)

  3. Mike Yearwood says:

    I’ve been programming for over 35 years. Users are always grumbling about slowness. Many times I get amazing performance improvements by rewriting others’ code. Recently I took a 10-hour process and cut it to 30 minutes. The original code was written by a guy well versed in his business and language. All I did was apply my own best practices. The point is that people don’t seem to know how to write really fast code the first time. Why teach the insertion sort algorithm at all, since no one ever uses it? Ask anyone how to count the weekdays between two dates. They use a loop and an if, which is what even an 8-year-old would do. I take 5/7ths of the number of days and correct for the start and end. The right way of thinking, along with experience, helps in picking good practices.
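
    A minimal sketch of that kind of date arithmetic, in Python for brevity (the function name and the half-open date range are my own choices, not the commenter’s original code):

    ```python
    from datetime import date

    def count_weekdays(start: date, end: date) -> int:
        """Count Monday-to-Friday days in the half-open range [start, end) without looping over every day."""
        days = max((end - start).days, 0)
        full_weeks, remainder = divmod(days, 7)  # 5/7ths rule: 5 weekdays per full week
        count = 5 * full_weeks
        w = start.weekday()                      # Monday == 0 ... Sunday == 6
        # correct for the partial week left over at the end (at most 6 days)
        count += sum(1 for i in range(remainder) if (w + i) % 7 < 5)
        return count

    print(count_weekdays(date(2017, 1, 2), date(2017, 1, 9)))  # one full week -> 5
    ```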

  4. Michael Kellett says:

    The article is written from a very large systems point of view. Most of my work is concerned with single-processor systems where low power is a key performance requirement. The cost of the processor is frequently important (if you make a lot of widgets then there is a cost attached to 16 KB of code rather than 8 KB).
    So I spend a fair amount of time carefully tuning for small code size and fast execution (i.e., minimizing processor cycles).
    So it comes down to what counts as “performance” – it varies according to the application, but there are plenty of opportunities where traditional speed and smallness are needed.

    1. The article is written from a very large systems point of view.

      I do write: “computers are asked to do more with less (…) there is pressure to run the same software on smaller and cheaper machines (such as watches)”.

  5. Alasdair Scott says:

    When working with minicomputers in the 1970s prior to the PC, boot times were typically less than one second. Now computers and phones etc. often take up to a minute to boot. One of the problems is the current over reliance on textual configuration data in formats such as XML or JSON, and the heavy associated overheads to parse it all. Most of this data never changes, so it would be good if there could be a switch back to appropriate binary formats – enabling host devices to be ready for use immediately after power-up.

    1. Now computers and phones etc. often take up to a minute to boot.

      As a user, I am unconcerned by how long computers take to boot because I never boot them. My computers are always powered, though they may be asleep.

      If you have a macOS, Android or iOS computer, you get ‘instant on’. When I lift the lid of my MacBook, the MacBook is already ‘on’. It resumes from sleep almost instantly. All modern smartphones and tablets do this also.

      One of the problems is the current over reliance on textual configuration data in formats such as XML or JSON, and the heavy associated overheads to parse it all.

      I mostly have a problem with loading times on Windows PCs. I am not sure why resuming a Windows laptop from sleep should take seconds, but I strongly suspect it has nothing to do with parsing text files.

    2. Bob Kerns says:

      I don’t recall anything except extremely minimal OSes booting in less than a second! Disks then were extremely slow. Loading the OS image, checking the filesystem integrity, and initializing and verifying the slow hardware generally took a small number of minutes. A full check of a large filesystem could take hours.

      Sure, an RTOS and similar systems could boot quickly, but that is a poor comparison.

      Parsing text configuration files is not where any significant part of boot time went then, or goes today. Even with fast SSDs you will spend more time reading programs and data from disk than you will spend parsing.

      If it were an actual problem, we would simply cache the result and only parse when the data changes. But the benefit would be negated by the need to have configuration tools to inspect and modify the binary files.

      Most configuration files fit in one filesystem cluster, so they involve a single disk read and parse in microseconds.
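
      A rough micro-benchmark of the parsing cost alone (the configuration below is hypothetical and the exact numbers will vary by machine, but they land in the microsecond range):

      ```python
      import json, time

      # Hypothetical small configuration, already in memory (no disk I/O measured here).
      config_text = json.dumps({"host": "localhost", "port": 8080, "debug": False, "paths": ["/srv", "/tmp"]})

      iterations = 10_000
      start = time.perf_counter()
      for _ in range(iterations):
          json.loads(config_text)
      elapsed = time.perf_counter() - start
      print(f"{elapsed / iterations * 1e6:.1f} microseconds per parse")
      ```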

      Windows has the slowest boot times around. It uses binary configuration in the form of the registry.

      I don’t know why Windows is so slow to boot. It is better than it used to be. It is not so much slower that any one thing stands out. And the answer for my configuration may be different than for yours.

      But the real mystery for me is why it is so slow to shut down!

      1. I don’t think that Windows 10 boots slowly, though Windows 7 definitely did.

  6. Eric Texley says:

    “Does software performance still matter?” I ponder, distracting myself from the swirling Eclipse icon that just appears for no apparent reason. “As long as living creatures are mortal, performance matters”….I say to myself as I call Keil.. “You guys still sell a non-eclipse based IDE for ARM, right?”

  7. Alasdair Scott says:

    You clearly are more disciplined about keeping your phone charged than I sometimes am, Daniel!

    Sleep modes tend to exacerbate the halting problem and often lead to memory fragmentation, so are not always appropriate.

    Also, consider the case of a server virtual host reboot following a lock-up – not uncommon, regardless of the service provider. For a critical busy website, it can be a tense time waiting to see what damage might have been done.

    A typical Linux boot involves reading and parsing many process configuration files and the time is significant.

    http://serverfault.com/questions/580047/how-much-data-does-linux-read-on-average-boot

    The Windows Registry was a well-meaning attempt to avoid some of this overhead. However, it has not worked out as well as one might have hoped. It is interesting to speculate as to why there has been a preference for text configuration files over a database approach. This returns us to the question: does software performance still matter? As both a (somewhat long in the tooth now) programmer and user, I believe it does and consider lengthy boot-up times one area where it does.

    1. Also, consider the case of a server virtual host reboot following a lock-up – not uncommon, regardless of the service provider. For a critical busy website, it can be a tense time waiting to see what damage might have been done.

      Granted. Boot time is critically important in some cases. You wouldn’t want your alarm system to take two minutes to reboot after a failure.

      You clearly are more disciplined about keeping your phone charged than I sometimes am, Daniel!

      If I pick up a device and its battery is empty, the time required to reboot is typically the last of my concerns.

      Sleep modes tend to exacerbate the halting problem and often lead to memory fragmentation, so are not always appropriate.

      My impression is that modern operating systems can deal with memory fragmentation without a reboot.

      It is interesting to speculate as to why there has been a preference for text configuration files over a database approach.

      The reason is quite clear, I think, and closely related to the reason the web is still text-driven. It is convenient to have human-readable files, and the performance hit is small in most cases. This last point is only true up to a point, of course.


      This returns us to the question: does software performance still matter? As both a (somewhat long in the tooth now) programmer and user, I believe it does and consider lengthy boot-up times one area where it does.

      It seems that 10 years ago, people were able to boot up Linux in 5 seconds… https://lwn.net/Articles/299483/ This old article is amusing…

      • It spends a full second starting the loopback device—checking to see if all the network interfaces on the system are loopback.
      • Then there’s two seconds to start “sendmail.” “Everybody pays because someone else wants to run a mail server,”
      • Another time-consuming process on Fedora was “setroubleshootd,” a useful tool for finding problems with Security Enhanced Linux (SELinux) configuration. It took five seconds.
      • The X Window System runs the C preprocessor and compiler on startup, in order to build its keyboard mappings.
      • It spends 12 seconds running modprobe running a shell running modprobe, which ends up loading a single module.
      • The tool for adding license-restricted drivers takes 2.5 seconds—on a system with no restricted drivers needed. “Everybody else pays for the binary driver,”
      • And Ubuntu’s GDM takes another 2.5 seconds of pure CPU time, to display the background image.

      My impression is that with the right configuration, in 2017, it is easy to boot Linux in under 5 seconds.

  8. Performance matters any time there is a constraint: processor speed, power consumption, run time, response latency, etc.

    Performance matters any time your code runs simultaneously on multiple servers, as it becomes the difference between spending on 100 servers or cloud instances, or 1,000.

    Performance matters any time you are in a frame-rate or transaction processing rate shootout with a competitor.

    Performance (of the other programs) matters any time your code shares the system with other programs.

    It doesn’t matter how fast a processor is: if 30% of its instructions go to unnecessary code, the program could be made to run in roughly 30% less time. It doesn’t matter how high the aggregate performance of a multi-core processor is if your code is single-threaded and has a performance issue.
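
    The arithmetic behind that first claim, as a sketch (treating “unnecessary code” as a fraction of executed work that can simply be removed):

    ```python
    # Removing a fraction f of the executed work cuts run time to (1 - f) of the original.
    f = 0.30
    remaining_time = 1 - f        # 0.70 of the original run time
    speedup = 1 / remaining_time  # about 1.43x in throughput terms
    print(remaining_time, round(speedup, 2))
    ```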

    My book on performance https://www.amazon.com/dp/1491922060/Optimized-C++

  9. Henri de Feraudy says:

    In biometrics, efficiency is extremely important.
    Suppose you have a fingerprint and you must search through hundreds of millions of recorded fingerprints to see which one matches most closely; then you really need good processing, and perhaps clever indexing.
    In image processing it is very important as well.

    1. Bob Kerns says:

      I will argue that for your biometrics example, it is indexing, not coding efficiency, that matters. Algorithms and data representation pay off far more than processing speed per se – often by many orders of magnitude.

      By contrast, image processing can generally only be sped up by making the code run faster, exploiting parallelism, and making good use of any available GPU.

      But producing images from a model puts us largely back into algorithmic territory at the first level, yet with a need to ship much of the work to the GPU.

      It is always about the bottlenecks.

    2. Bob Kerns says:

      The benefit of improving performance must always be weighed against the cost of achieving it and maintaining it.

      We have all seen bugs introduced in the quest for performance. Optimizations that, over their life, won’t pay back the CPU time spent compiling them. Optimizations that make the code hard to understand and deter future improvements.

      The challenge is always finding the right <1% of the code to optimize.

      Compilers and runtimes already do a great job of speeding up the small stuff – the constant-factor speedups. The payoff there comes because the benefit is so ubiquitous. Libraries often provide a similarly broad benefit when key routines are optimized.

      Individual application optimizations have to pay off on their own merits, on their own impact on the bottlenecks faced by that application. That can be response time, or scaling, or the number of servers to be run…

  10. TS says:

    There is an exception to the “>90% of [possible] optimizations are irrelevant” rule: if you’re optimizing for code size, it doesn’t matter where the optimization takes place – as long as it reduces the size of the containing code block, it is as useful as any other optimization. This is mainly interesting for small embedded systems, but might also be useful in situations where fetching code over and over has a relevant performance hit.

    Something to note is that fast code is way more important with faster machines: in the past, no matter how well your software was optimized, it would quickly reach the memory and computation limits of the hardware, so only rather simple problems were practical. Today computing power is many orders of magnitude higher, so the difference between well-performing and slow software matters more than ever before: maybe not for a simple word processor, but definitely for things like video encoding, any kind of detailed simulation (whether for research or gaming), image processing, and so on. Not to mention that today’s multi-core hardware, with parallel processing, several layers of memory and lots of specialized add-ons, is more complex than older hardware, which behaved more uniformly.

    However, today the most relevant performance difference is the perceived one: an application which always reacts in a fraction of a second will be perceived as quicker than a benchmark-winning piece of software which takes the occasional multi-second break now and then.

  11. Peter Boothe says:

    The “datacenter tax” is pretty severe – http://www.eecs.harvard.edu/~skanev/papers/isca15wsc.pdf – and represents 30% of the load in most datacenters. Every percentage improvement can represent real money if there are enough datacenters involved.

  12. Steffen Guhlemann says:

    Some remarks:
    1. One has to distinguish between three impacts of how performant something is:
    a) immediate response to GUI actions (not just in games): if I click a checkbox, I want to see immediately what that means for my data;
    b) throughput: how much data I can process in a given timeframe (usually on a server, but we do this kind of thing on the client as well);
    c) general resource usage: even if everything happens fast enough by using all my CPU cores, I might be unable to browse the web in the meantime because my whole computer is unresponsive.

    2. Nobody has mentioned the power of time complexity. At work I deal a lot with problems that are more than linear in time. Chess is a good example: it is an exponential problem, which means that if my computer is now a million times faster than in the ’80s, I can only look about four half-moves further ahead. That alone will probably not help me beat a median-level chess player; I need other techniques – not bit-bashing the linear program flow, but reducing the exponent in the exponential time complexity.
    For problems like this, pure speed does not mean much. One really needs efficient algorithms to beat a human chess master.
    => That means efficient algorithms not only reduce user waiting time a bit; they allow us to tackle problems we could not solve otherwise (and still could not solve if we only relied on computers getting faster).
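
    A back-of-envelope check of that “four half-moves” figure (the branching factor of about 35 is an assumption; real engines prune far more aggressively):

    ```python
    import math

    # With branching factor b, searching d plies visits roughly b**d positions,
    # so a speedup S buys about log_b(S) extra plies.
    b = 35           # assumed average branching factor of chess
    S = 1_000_000    # millionfold hardware speedup
    print(round(math.log(S, b), 1))  # ~3.9, i.e. about four half-moves deeper
    ```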

    1. efficient algorithms not only reduce user waiting time a bit; they allow us to tackle problems we could not solve otherwise

      I would argue that this is true of software performance in general, not just of algorithms. It is common to take one fixed algorithm and improve its implementation to get a 10x gain. Well, a 10x gain can take something that is not financially viable and make it practical.

    2. TS says:

      Time (and also space) complexity is actually what I was referring to with “fast code is more important with faster machines”:

      In general, optimized algorithms have a rather high per-step or setup cost, non-linear access patterns, reduced flexibility, complicated write/parallel access, etc., compared to simple flat array or list structures. Thus the benefit isn’t visible until a certain data size is reached – which is more likely with increasing processing power, as it usually comes with increased storage capacity.

      Another important factor is hardware latency: an O(n) algorithm running in main memory can crunch through gigabytes while an O(log n) algorithm working on disk storage is still waiting for each of the few bytes it accesses. It is even worse with parallel algorithms, which are often quickly limited by contention.
      Thus proper data layout and partitioning are also an important part of getting the full benefit of a given algorithm.
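
      A back-of-envelope illustration of that latency point (the bandwidth and seek figures are rough assumptions, not measurements):

      ```python
      # How much memory can a linear scan cover in the time of one random disk read?
      ram_bandwidth_bytes_per_s = 10e9  # assumed ~10 GB/s sequential scan from DRAM
      hdd_seek_s = 10e-3                # assumed ~10 ms per random read on a spinning disk
      bytes_per_seek = ram_bandwidth_bytes_per_s * hdd_seek_s
      print(f"{bytes_per_seek / 1e6:.0f} MB scanned per disk seek")  # ~100 MB
      ```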

      In reality the data and the requirements often change over time, so an implementation which served the original demand perfectly might perform rather poorly after a while. Regularly adapting and rewriting parts is therefore an important part of software performance as well – most likely the hardest part, since the business side is often unwilling to spend effort on something which still works somewhat well, and the degradation usually happens slowly and is masked by technology advancements to a certain degree.

  13. Joe Duarte says:

    Daniel, this reminds me that your integer libraries are outstanding.

    Separately, I think performance could be made to matter much more to users if an enterprising company offered a clean-sheet, profoundly faster OS. Legacy OSes – Windows 10, macOS, Linux – are all unreasonably slow (though as you said Windows is slowest, probably because Macs are SSD-only these days, at least the laptops).

    None of these OSes is architected for instant response to trivial user actions, and at this point I think it’s reasonable to expect computers to complete trivial tasks instantly, where “instant” means a half-second or less. An InstantOS would fully open any and all applications instantly, and the apps would be completely ready for whatever subsequent actions a user could take at that stage. It would also close apps instantly, both visibly and with respect to background tasks and memory usage. We’re still waiting far too long for apps to open and to do things, and a consistently half-second-or-faster response would change the experience of using a computer.

    Even Macs are much slower than this most of the time. I don’t think users will understand the advantage of an InstantOS until they actually use it and are made to comprehend how long they’ve been waiting on computers to do stuff, and how it feels to use an InstantOS. Until then, lots of people will mouth nonsense about performance not mattering, because they don’t know what a performant computer feels like.

    So I think it’s a great time for a team to build a new OS. Everyone’s asleep and thinks that Windows and Mac are good enough. They’ll keep thinking that until someone shows them what’s possible. (A new OS could also take the opportunity to have fundamentally different security properties, more secure by design – e.g., it’s completely absurd that data can leave our computers without our awareness or any easy insight into, or control over, the process. That our computers allow massive outbound traffic with little control or transparency is a core dependency of many, many hacks.)

    A reasonable OS should also boot instantly (you referred to wake-up, but booting still matters to some people).

    1. though as you said Windows is slowest

      To clarify: I wrote that macOS had “instant on” recovery from sleep mode whereas Windows did not.

  14. Software performance only matters in a minute fraction of cases. Companies that genuinely innovate in software understand that the value of the actual code behind the software is what matters most, not its size. Quite a blog. Keep writing.