Daniel Lemire's blog


The dual-shotgun theorem of software engineering

18 thoughts on “The dual-shotgun theorem of software engineering”

  1. jmora says:

    Have you ever tried to translate a pseudocode algorithm to actual code? I bet you tried and succeeded. I failed at first:
    https://stackoverflow.com/questions/6575058/tarjans-strongly-connected-components-algorithm-in-python-not-working

    Many algorithms in papers (formally proven to be correct), usually related to computationally complex problems [>O(n log n)], use clever tricks to be fast. As a result of both circumstances, they are usually very hard to modify.

    My guess is that partitioning a problem into smaller problems that are as independent (decoupled) as possible is the best approach for software that is easy to modify. For example, ideally, in the functional programming paradigm, we would have a number of pure functions that are composed to create the program. It should be straightforward to modify one of these functions or the function composition (dropping, adding, permuting, or replacing functions). A small sketch of this idea appears at the end of this comment.

    Finally, for evaluation, we could consider programming katas, where a problem needs to be solved and then the problem is modified. It may be untenable for teams of programmers, but it may be very instructive for groups of students. As students need to learn different paradigms (object-oriented programming, functional programming, and what not), testing and comparing paradigms and approaches should be feasible and could be interesting.
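
    Purely as an illustration (the pipeline and function names below are invented), a minimal Python sketch of what I mean by composing pure functions:

        def strip_comments(lines):
            """Pure function: returns a new list, never mutates its input."""
            return [line for line in lines if not line.startswith("#")]

        def normalize(lines):
            return [line.strip().lower() for line in lines]

        def deduplicate(lines):
            return list(dict.fromkeys(lines))  # keeps first occurrences, preserves order

        def compose(*steps):
            """Build a program as a pipeline of independent steps."""
            def pipeline(data):
                for step in steps:
                    data = step(data)
                return data
            return pipeline

        # Modifying the program means dropping, adding, permuting, or replacing steps:
        clean = compose(strip_comments, normalize, deduplicate)
        print(clean(["# header", "  Foo ", "foo", "Bar"]))  # ['foo', 'bar']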

    1. > Many algorithms in papers (formally proven to be correct), usually related to computationally complex problems [>O(n log n)], use clever tricks to be fast. As a result of both circumstances, they are usually very hard to modify.

      I should clarify that I meant “fast software”, not “a fast algorithm on paper”.

      > For example, ideally, in the functional programming paradigm, we would have a number of pure functions that are composed to create the program. It should be straightforward to modify one of these functions or the function composition (dropping, adding, permuting, or replacing functions).

      Our entire civilization runs on code written in C using no fancy paradigm… no object-orientation, no functional programming… The Linux kernel is beautiful and powerful code that lots of people, me included, have modified for fun or profit.

      You can write slow, unreadable, buggy code in Haskell, Scala, and so forth. It is great that people enjoy functional programming with pure functions and no side effects and all that… and sure, it can work, but there are many good ways to produce high-quality code. Fortran, Go, C++, Lisp, JavaScript… all of those can serve to write good code.

      > Finally, for evaluation, we could consider programming katas, where a problem needs to be solved and then the problem is modified. It may be untenable for teams of programmers, but it may be very instructive for groups of students. As students need to learn different paradigms (object-oriented programming, functional programming, and what not), testing and comparing paradigms and approaches should be feasible and could be interesting.

      For training, that could be great.

      1. jmora says:

        Crappy code (and, analogously, good code) can be written in every language and every paradigm, but in some it is easier than in others.

        “Computer languages differ not so much in what they make possible, but in what they make easy.” – Larry Wall

        Most paradigms (if not all) have been created to make better software more easily, taking different approaches. I am not saying that functional programming is “the right paradigm” (disclaimer: I certainly like it), but I think it is an easy context in which to explain the point of simple, composable, and modifiable pieces of code that preserve ease of modification in large codebases.

        For example, we can consider microservices and software modules (e.g. Java packages). Both allow for the decoupling of the software, but microservices make it harder to violate that decoupling. (There are many other, greater pros and cons to microservices.) With pressing deadlines, and Alice on holiday, Bob may be tempted to subclass one of Alice’s classes in his own package, “as a quick fix to be patched later” (which may be never).

        Wrt OOP, abstraction and encapsulation in classes and objects was a nice try, and it has been a successful one, but it is troublesome because it hides state, and hidden state is a problem for parallel and concurrent execution. Changing a program to make it parallel is normally not equally difficult in OOP and in FP; a small sketch at the end of this comment illustrates the point.

        Wrt the Linux kernel, after so many years it is either wonderful or Hell on Earth. Making it wonderful takes expertise and discipline (probably in different amounts depending on the language and paradigm, if there were a choice). A different approach was Hurd, supposedly more modular, but it has not been as successful (possibly for non-technical reasons). Would it be easier to change? We may never know.

        Wrt fast software, it can be made faster by removing nonsense or by using clever tricks. I have seen lots of both. While removing nonsense makes code cleaner, clever tricks usually make it more fragile, as they often depend on particular context conditions that may not generalize or may not hold in the future. Therefore, I see the correlation between fast and maintainable as weak.
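
        To illustrate the point about hidden state, a minimal Python sketch (the example is invented and not a benchmark): the object hides a counter that must be protected with a lock before it can be used from several threads, while the pure function can simply be mapped over a thread pool.

            from concurrent.futures import ThreadPoolExecutor
            from threading import Lock

            # OOP style: the object encapsulates mutable state, so concurrent
            # callers must coordinate (here, with a lock) or risk lost updates.
            class Accumulator:
                def __init__(self):
                    self._total = 0
                    self._lock = Lock()

                def add(self, value):
                    with self._lock:  # forget this and the class silently breaks under threads
                        self._total += value

                def total(self):
                    return self._total

            # FP style: a pure function has no hidden state, so it can be mapped
            # over a pool of workers and the results combined afterwards.
            def square(x):
                return x * x

            if __name__ == "__main__":
                acc = Accumulator()
                with ThreadPoolExecutor() as pool:
                    list(pool.map(acc.add, range(1000)))           # correct only thanks to the lock
                    squares = list(pool.map(square, range(1000)))  # needs no coordination at all
                print(acc.total(), sum(squares))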

        1. It is hard to build really fast systems by adding up clever tricks on top of spaghetti. First step in making code fast is to have a great design.

          > Most paradigms (if not all) have been created to make better software more easily, taking different approaches.

          In my view, programming is now a lot better than it was 20 years ago, but very little of that has to do with programming paradigms. For every Scala programmer who swears by functional programming, you have a Go programmer who swears by the minimalistic nature of the language… and then you have tons of C++ programmers who swear by its amazing powers… and then you have the Python folks and so forth.

          They are all correct.

          1. jmora says:

            Nobody said it would be easy. Sometimes the spaghetti is created by adding clever tricks, until the software is unmaintainable and needs to be restarted from scratch.

            Good software is robust. Bad software is fragile. Really bad software is unmaintainable. Good software processes result in anti-fragile software: modifying the code means refactoring it, producing better results. Software that is fast because it has been optimized for years may fall into the anti-fragile group, but what we would have in such a case is survivorship bias. Software that has been optimized in ways that make it more fragile is simply not around for very long.

            In short, I agree when you say: “First step in making code fast is to have a great design”, but I would reword it as: “First step in making code fast _should be_ to have a great design”. Then, making it fast could serve as a validation method of the quality of the design, but focusing on making the code fast is not going to be sufficient (or necessary) to make a good design, or to make a design good.

            In contrast, I like this quote (found on the web page of Falcon, a Python web framework):

            “Perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away.” – Antoine de Saint-Exupéry

            The classic KISS, YAGNI, etc. are likely to produce code that is easy for humans to understand and maintain and, to some extent, easy for computers to execute; the problem is that computers and humans do not “think” in the same ways.

            Now, different languages and paradigms forget about KISS in different ways. There are talks, books, collections of examples (e.g. the DailyWTF), and jokes [1] about that. But in a “too late” attempt at KISS, I will stop digressing now, because there is just too much to discuss about it.

            Finally, wrt everybody being correct: they probably are, more than I will ever be; this is not my field of expertise (if I have any). Nevertheless, I think that using katas to evaluate paradigms and languages could be an interesting and acceptably scientific approach to some aspects of language design and software engineering that, to date, are more akin to a craft or an art than to engineering as usually understood. Cognitive ergonomics is hard, IMHO.

            [1] http://i.imgur.com/mh3P7CD.png

  2. llogiq says:

    Correct and fast may be a good enough approximation for many cases. Alas, current systems require bespoke micro-optimizations to eke out the last tens of percent of performance, as you are surely keenly aware.

    Those optimizations make the code more specialized, less readable and thus less malleable. This directly contradicts your theory.

    All is not lost, however. Current plain Rust code will often perform within an order of magnitude of hand-optimized C code. The strong type system and borrow checker help with writing correct code. Together, this will likely lead to code that is easy to read and change, while being fast enough for many applications.

    We don’t have enough experience yet to tell whether this is also *good* code, but current results are encouraging.

  3. Neither correct nor fast code implies good code. Why should the combination of both imply good quality?

    Also, you imply that optimizations can only be added to well-designed code. That is false. Optimizations can be added easily to good code and can be added painfully to bad code. The result is fast code, irrespective of quality. A similar statement can be made about correctness.

    According to the theory of computation, the same software (fast and correct) can be expressed equivalently in several languages and in very different ways; that accounts for the existence of bad code and good code solving the same problem in equivalent forms.

    1. I think that there is some confusion between “applying optimizations” and having “fast code”.

      In my experience, it is very hard (in general) to make non-trivial software run faster. There is a reason why we have substantial performance differences between browsers, for example. It is not that the other guys, the team that is leading the slower browser, can just apply “optimizations” to make their browser faster. Optimizations don’t come in the form of a magical powder you can spray on existing code.

      It is very hard work to make things faster, just as it is very hard work to make them correct. Both are very demanding.

  4. > “The best software is the software that is easy to change.”
    Depends on the quality metric (whatever the marketing guys tell us, not all software needs to be changed) – but as a default, it will do.

    > Code that is both correct and fast can be modified easily.

    It is soooo wrong that I don’t even see where to start. Being “fast” has absolutely nothing to do with being maintainable. Moreover, the-absolutely-fastest code tends to be rather unmaintainable. As for the examples – they’re abundant; take, for example, OpenSSL – the fastest versions are still asm-based, and believe me, you really REALLY don’t want to read this kind of asm, let alone modify it… The same thing follows from Knuth’s “premature optimization is the root of all evil”, and so on and so forth.

    1. I would probably rank OpenSSL as one of the greatest code bases of all time.

  5. Tksfz says:

    This is demonstrably false. Correct code can be arbitrarily complicated while still being correct and this is true in a non-trivial sense. As for fast code being more maintainable, on the contrary, everyone knows to avoid premature optimization precisely because it’s understood that optimization of code typically leads to more complicated, harder-to-maintain code.

    For your argument that there’s some kind of pressure that causes streamlined code to become more maintainable: on the contrary, it’s fixing the corner cases and bugs that tends to cause large code bases to become even more complicated.

    As for counterexamples: take two implementations of the same functionality, but one is implemented in C (or assembly, if you want an extreme example) and the other is implemented in, say, Ruby, Python, or Java. The C program will undoubtedly be faster. But it will also undoubtedly be many times longer, and harder to maintain. I think you can easily imagine this.

    As for the notion you seem to assume, that there is a trade-off between abstraction and performance: there’s a widely pursued, long-held goal in both C++ and Rust to enable abstraction with zero (runtime) cost. As for the notion that abstraction complicates things, I’d agree – abstraction needs to be coupled with actual reuse to pay for itself. But I don’t think abstraction implies a performance cost.

    Am I misunderstanding your argument?

  6. Programming is about people, not code.

    The ease with which something can be modified depends upon the person doing the modification and the knowledge that they have: knowledge of the system, its architecture, the requirements that it is designed to fulfil, the processes and tools used to develop, build, deploy, and maintain it, how knowledge of different components is distributed around the organisation, and the social context within which it is developed and used. Even the notions of correctness and speed are a function of the expectations that have been set, and how those expectations have been communicated.

    Having said that, the OP’s instincts are right. Every time we add a new abstraction, we add new vocabulary, new concepts that need to be communicated and understood. Highly abstract code is only easier to modify if you already understand what it does.

    This is why we as a community are so keen on idioms, standards, and the principle of least surprise.

    As a final point, I would like to emphasise the fact that in order to perform well at our chosen discipline, we need to capture, organise, and communicate a huge amount of information, only some of which is code. Architecture, vocabulary, the mental model which arises from our analysis of the problem, and the abstractions and concepts which arise in the design of its solution – all of these need to be captured and communicated.

    The teaching of other developers is the central problem that the programming discipline aims to solve. Recognise the truth in this, and we may finally make some progress.

  7. Mikhail Glushenkov says:

    > What is nice about my conjecture is that it is falsifiable. That is, if you can come up with code that is fast and correct, but hard to modify, then you have falsified my conjecture.

    Judy arrays come to mind.

    1. qznc says:

      I would say TeX is a counter example.

      It is correct because Don Knuth says so. Proof by authority. Also, in use by thousands of people all over the world.

      It is fast because it was usable on the computers of 30 years ago.

      It is not easy to change. Just look at XeTeX and LuaTeX. Adding a feature like Unicode support is not easy.

      1. Interesting point.

  8. Jake says:

    For any compiled language, in order to even correctly measure its speed and correctness, you must first compile it into a less readable, less changeable instruction set. By definition, the compiled instruction set is equally fast and equally correct (as it is the artifact actually being measured). But it is unquestionably harder to modify.

  9. Code that is both correct and _small_ can be modified easily.
    . smallness justifies merciless refactoring;
    . smallness justifies removal of extraneous information.
    . correctness, of course, justifies test-first development (a small sketch follows).
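
    Purely as an illustration of “test-first” (the function below is invented for the example), a minimal Python sketch in which the test is written before the implementation, and the implementation stays only as large as the test demands:

        # Step 1: the test, written first, before any implementation existed.
        def test_slug():
            assert slug("Hello, World!") == "hello-world"
            assert slug("  spaces  and   more ") == "spaces-and-more"

        # Step 2: just enough code to make the test pass, and no more.
        def slug(text):
            words = "".join(c if c.isalnum() else " " for c in text.lower()).split()
            return "-".join(words)

        if __name__ == "__main__":
            test_slug()
            print("ok")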

  10. Andreas Abel says:

    > Code that is both correct and fast can be modified easily.
    > What is nice about my conjecture is that it is falsifiable. … So it is scientific!

    I’d be a bit careful here. A precondition is that there is a “conjecture”. That presupposes well-defined terms.

    “Code” : not well-defined, but ok, we can think of a well-defined instance, like Haskell or Java code.
    “correct” : you cannot be correct unless you have a specification.
    “fast” : not well-defined.
    “modified easily” : not well-defined, little chance to come up with a definition.

    First, before speaking about verification or falsification of a hypothesis, you need to have one, meaning a statement whose meaning cannot be debated.

    No science here (yet), I am afraid.