Daniel Lemire's blog


Evil abbreviations in programming languages

20 thoughts on “Evil abbreviations in programming languages”

  1. Charles Wells says:

    @JohnCook Mathematica has Arrowheads and TableHeadings. But yes, its naming is usually pretty good.

  2. John Cook says:

    One of the nice things about Mathematica is that naming is very consistent and seldom uses abbreviations. I’ve been able to come back to Mathematica after not using it for years and quickly remember or guess what things are named.

  3. Derek R says:

    The “memcpy” style of naming has a reason. In the old days of C (1989 or thereabouts), linkers were only guaranteed to recognize the first six characters of an external variable or function name.

  4. This is what turns me off about the C programming language. Reading even well-structured programs on GitHub feels like reading minified JavaScript.

  5. Marco See says:

    Without providing any benefit to anyone? Perhaps you mean without any benefit to you. In many cases they do benefit me, because they result in cleaner, easier-to-read code.

    It’s a bit like the difference between an introductory tutorial and a reference manual. When you are starting out, everything needs to be spelt out slowly and longhand, but once you are familiar, you start to appreciate conciseness and brevity.

    Perhaps it’s similar to using letters in algebra.

    If we don’t see the benefit of the short forms, perhaps we should be spelling out . as “fullstop”.

  6. Chris Nahr says:

    Regarding C memcpy, that’s likely a result of DEC PDP-11 file name limitations. You could only use 6 characters for the main part of the name, plus 3 characters for the (one, single) extension. So the file implementing memcpy could be saved as the eponymous memcpy.c or memcpy.asm. No space for an extra “o”!

    (Not sure if there was ever a good implementation reason for giving the function and its file exactly the same name, or if it was just more pleasing to the designers…)


  7. I have taught coding to non-native English speakers, and I can testify that many people have trouble remembering whether it’s `length` or `lenght` (and similarly for `height`). Abbreviating it to `len` fixes this problem.

    (Of course, I realize that the ideal situation is an editor where you can type `leng` and have it autocompleted. That way, one no longer cares whether a function is called `len` or `lengthInCharactersOfThisMostPreciousStringComposedOfUtf8Codepoints`.)

  8. Denommus says:

    Ruby and Python use elsif and elif not to save you from typing else if, but to avoid piling up indentation levels.

  9. When memcpy() was coined, linkers on some systems limited the length of external symbols; eight and six characters were common limits. A quick look at 7th Ed. Unix suggests external symbols longer than six characters came along with stdio.h and ctype.h, later additions to Unix, perhaps when the earlier six-character (DEC) limit was removed.

    I disagree with your other points. len() is so common that the shorter, less noisy, quicker-to-pronounce len is preferable to length. Ditto str and bool. Why do you not complain about int? Should it not be integer? s/float/floatingpoint/ s/func/function/ s/def/define/ s/var/variable/

    Ruby takes its elsif from Perl, and Python its elif from the Bourne shell. These languages have special else-if keywords so that if-else chains stay at the same parse level. Unlike in C, what follows else must be a block, not a single statement such as another if. Without elsif you would write if () {} else { if () {} else { if () {} } }, with lots of closing braces at the end, à la Lisp. A keyword spelled elseif looks harder to pronounce and would probably leave novices wondering why it exists and why they can’t just add a space.

    You say source code is the ultimate documentation of ideas. But notation is needed to express those ideas succinctly. Even outside programming, a limited vocabulary is useful compared to free-form English. Isn’t that why mathematics has so much notation: to represent well-understood concepts succinctly from a limited set of symbols?

    Using full English for programming leads to very wordy code that takes a human longer to parse, fits little content into a given space (e.g. a screen), and looks like COBOL: http://www.csis.ul.ie/cobol/examples/SeqIns/SEQINSERT.htm

  10. Florian Wilhelm says:

    Oh, I agree so much. This also bugs me with golang and rust. Why make the same mistakes over and over again?

  11. Mark Dominus says:

    memcpy is six letters long because, at the time it was invented, it was still common for linkers to truncate external identifiers to six characters. Support for these old linkers remained in the C standard until at least 1999.

    Also, saving keystrokes is not as ridiculous as you seem to think. Early terminal keyboards were very difficult to type on, and data transmission was limited to ten *characters* per second.

    The elif / elsif abbreviation eliminates the bug, very common in C programs, where an “else” clause is silently associated with the wrong condition (the “dangling else”). The designers of the shell, Perl, Python, Ruby, etc., were well aware of this.

    I suggest that when you see decisions in the past that you don’t understand, it would be more productive to try to understand them _before_ you label them “evil”.

  12. Mark Dominus says:

    Also, your implication that “Boolean” is somehow easier to understand and less jargony than “bool” seems patently absurd.

  13. KWillets says:

    I like mangled names because they’re specific. There’s only one memcpy, but there might be dozens of MemoryCopy’s.

    Likewise, a single strange algorithm may not be better described by a long name. Is “QuickSort” better than “qsort”? Neither means anything by itself.

  14. lasizoillo says:

    Python’s elif is needed to avoid nesting, that is, to avoid piling up nested indentation. But I can’t say anything about def instead of definition.

  15. Haha, I don’t mind short names. What is clearly irritating is that different languages have different conventions. Boolean in Java vs bool in C++ may be irritating, but the really infuriating thing is the lack of indexing conventions. Sometimes indexes start at zero, sometimes at one. It is even worse with end indexes, because they often point to the element after the last one. This is apparently the convention for most Java libraries and standard functions. However, it is very poorly described; I would say it is barely mentioned in the docs.

  16. Max Lybbert says:

    I’ve loosened up quite a bit on names recently. I’m no longer convinced that longer names provide much benefit. And I have a hard time calling any of these abbreviations evil.

    But I do have to add the story that “Ken Thompson was once asked what he would do differently if he were redesigning the UNIX system. His reply: ‘I’d spell creat with an e.'” ( http://en.wikiquote.org/wiki/Ken_Thompson#Quotes )

  17. Darek says:

    I think those kinds of names are “ok”; we just have to memorize them. Worse are names that are similar but do completely different things, for example in Picat: “chr” and “char”. “char” checks whether a variable is a character. “chr” converts a number (e.g. 97) to its UTF-8 representation (“a” in this case).
    Also confusing are names that are different but do similar things. For example, “/” divides two numbers and returns an integer, while “div” divides two numbers and returns a float. I like how Factor deals with this: “/” returns an integer, “/f” returns a float.

  18. Aaron Meurer says:

    I think you picked the tamest example for C, which is the worst at this, especially if you get to some of the POSIX stuff like unistd.h. Python does abbreviate some stuff, but at least it has a culture of spelling things out with long variable names rather than using short, confusing names.

  19. Andy Edwards says:

    In other words, old languages like C sort of have an excuse, but more recent languages like Ruby and Python do not 🙂

  20. Todd Lehman says:

    Hasn’t mathematics been doing this (abbreviating keywords) for centuries?

    limit => lim
    sine => sin
    cosine => cos
    tangent => tan

    Maybe the jargon is shorter because it becomes easier to read with practice?
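Several commenters (Derek R, Mark Dominus, and the ninth comment) attribute names like memcpy to linkers that kept only the first six characters of an external identifier. The sketch below is a hypothetical illustration, not a model of any specific historical linker: the helper truncation_collisions (a name invented here) groups identifiers that would become indistinguishable if only the first `limit` characters counted, optionally ignoring case, as C89 allowed for external identifiers.

```python
from collections import defaultdict


def truncation_collisions(names, limit=6, fold_case=True):
    """Hypothetical helper: group names that a linker keeping only
    `limit` significant characters (optionally case-insensitive)
    could no longer tell apart."""
    groups = defaultdict(list)
    for name in names:
        key = name[:limit]          # only the first `limit` characters survive
        if fold_case:
            key = key.lower()       # monocase linkers ignore case differences
        groups[key].append(name)
    # Keep only the truncated keys shared by two or more names.
    return {key: group for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    names = ["memcpy", "memcmp", "memory_copy", "memory_compare"]
    print(truncation_collisions(names))
    # {'memory': ['memory_copy', 'memory_compare']}
```

Under these assumptions, memcpy and memcmp stay distinct, while the longer memory_copy and memory_compare both collapse to memory, which is one plausible reason short, dense names were favored.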
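Denommus, lasizoillo, and the ninth comment all argue that elif exists to keep if-else chains flat rather than to save typing. A minimal Python sketch of that point, using two hypothetical functions: both return the same labels, but the version without elif must open a new nested block after each else, since Python does not accept `else if` on a single line.

```python
def classify_nested(n):
    """Sign of n written without elif: each else opens a deeper block."""
    if n < 0:
        return "negative"
    else:
        if n == 0:
            return "zero"
        else:
            return "positive"


def classify_flat(n):
    """Same logic with elif: every branch stays at one indentation level."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


if __name__ == "__main__":
    for value in (-3, 0, 7):
        print(value, classify_nested(value), classify_flat(value))
```

With only three branches the difference is small, but each extra condition adds one more indentation level to the nested version while the elif version stays flat.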
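The fifteenth comment complains about end indexes that point one past the last element. That convention is the half-open interval [start, end): the start index is included and the end index is excluded. Python slices and range() follow it, as does Java's String.substring. A short sketch with a hypothetical helper, split_at, suggests why libraries favor it: the length of a piece is simply end - start, and adjacent pieces cover the sequence with no gaps or overlaps.

```python
def split_at(seq, start, end):
    """Hypothetical helper: split seq into prefix, middle, and suffix
    using the half-open range [start, end)."""
    # seq[start:end] includes index `start` but excludes index `end`.
    return seq[:start], seq[start:end], seq[end:]


if __name__ == "__main__":
    text = "abcdef"
    prefix, middle, suffix = split_at(text, 1, 4)
    print(prefix, middle, suffix)             # a bcd ef
    # The length of a half-open range is end - start.
    print(len(middle) == 4 - 1)               # True
    # The three pieces reassemble the original with no overlap.
    print(prefix + middle + suffix == text)   # True
    # range() follows the same convention: 4 is not included.
    print(list(range(1, 4)))                  # [1, 2, 3]
```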