Daniel Lemire's blog


Some useful regular expressions for programmers

9 thoughts on “Some useful regular expressions for programmers”

  1. dsernst says:

    You might like prettier: https://prettier.io.

    It handles all this and more automatically for you, for almost every language. It’s like magic.

    1. Jennifer says:

      As I understand it, the idea of the article is to teach useful regular expressions, using code formatting as an example.

  2. Kai says:

    Assuming that most of your research is done in C or C++, I wonder why you’re not considering clang-format for these tasks, as regular expressions will only get you so far?

    1. I program using a wide range of programming languages, including C and C++. I do use clang-format and other code reformatters.

  3. Shiv says:

    Nice tips. You can use \S instead of [^\s] to shorten some of these.

    To delete the extra space, I can select it with look-ahead and look-behind expressions such as <(?<=^(\s\s)*)\s(?=[^\s]).

    I think you’ve got an extra < in there, unless that’s some sort of new metacharacter.

    I do not want a space after the opening parenthesis nor before the closing parenthesis. I can check for such a case with (\(\s|\s\)). If I want to remove the spaces, I can detect them with a look-behind expression such as (?<=\()\s.

    This is probably the desired behaviour, but just to note, \s will also match newlines. If you wanted to preserve those, you could use [ \t] instead.

    Your use of lookbehind & lookahead is interesting. When I’m doing search-and-replace on code, I always include the prefix and suffix in capturing groups and account for those in the replacement.

    …on further consideration, that’s probably because I’m doing it in Emacs, whose native regex engine is rather primitive.
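The two points in this comment (capture groups as a substitute for lookarounds, and `[ \t]` as a newline-preserving alternative to `\s`) can be illustrated with a small sketch. It assumes Python's `re` module rather than any particular editor, and the sample string `code` is made up for the illustration.

```python
import re

# Hypothetical sample: spaces inside parentheses, plus a line break after "g(".
code = "f( x, y )\ng(\n  z )"

# Lookaround version: match only the blanks and replace them with nothing.
lookaround = re.sub(r"(?<=\()[ \t]+|[ \t]+(?=\))", "", code)

# Capture-group version: match the parenthesis as well and put it back.
captured = re.sub(r"(\()[ \t]+", r"\1", code)
captured = re.sub(r"[ \t]+(\))", r"\1", captured)

# Using \s instead of [ \t] also swallows the newline after "g(".
with_s = re.sub(r"(?<=\()\s+", "", code)

print(repr(lookaround))
print(repr(captured))
print(repr(with_s))
```

Both the lookaround and the capture-group versions leave the line break after `g(` alone, whereas the `\s` version joins the two lines.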

    1. This is probably the desired behaviour, but just to note, \s will also match newlines.

      It depends on whether the regular expression is applied to the whole document or to individual lines. Many editors match regular expressions on a line-by-line basis by default.
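A rough way to see the difference is to apply the same pattern once to the whole text and once to each line separately. This is only a sketch in Python, which matches against the whole string; the per-line loop imitates a line-oriented editor, and `text` is an invented sample.

```python
import re

text = "int x;  \n\nint y;"
pattern = re.compile(r";\s+")

# Whole-document matching: \s+ runs across the line breaks.
whole = pattern.sub(";", text)

# Line-by-line matching: the pattern never sees a newline.
per_line = "\n".join(pattern.sub(";", line) for line in text.split("\n"))

print(repr(whole))
print(repr(per_line))
```

In the whole-document case the blank line disappears and the two statements are joined; in the per-line case only the trailing spaces are removed.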

      1. Shiv says:

        Oh, my mistake! I don’t think I’ve ever used one like that. Thanks, learnt something new. 😊

    2. Greg says:

      Your habit of using capture groups instead of lookarounds is a good one. A lot of regex engines don’t support variable-length lookarounds.
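Python's built-in `re` module is one such engine: it rejects look-behind patterns that are not fixed-width. The sketch below checks this and shows a capture-group rewrite of the odd-indentation pattern quoted earlier in the thread. The helper names `compiles` and `fix_odd_indent` are hypothetical, and the rewrite assumes indentation made of plain spaces.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-length look-behind (the indentation pattern from the article).
variable = r"(?<=^(\s\s)*)\s(?=\S)"
# Fixed-width look-behind (a single opening parenthesis).
fixed = r"(?<=\()\s"

def fix_odd_indent(text: str) -> str:
    # Capture the even part of the indentation and put it back,
    # dropping the one extra space that follows it.
    return re.sub(r"^((?:  )*) (?=\S)", r"\1", text, flags=re.MULTILINE)

print(compiles(variable), compiles(fixed))
print(repr(fix_odd_indent("   a\n    b\nc")))
```

The capture-group form works in engines without variable-length look-behind because the prefix is matched normally and restored through `\1` in the replacement.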

  4. These are some really useful regular expressions. I use some all the time, but I used your blog as an opportunity to delete all the annoying trailing spaces in our code base (like really, who does that?).
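For readers who want to do the same clean-up in a script, here is a minimal sketch using Python's `re` module. The function name `strip_trailing_whitespace` is made up for this example, and the pattern only targets spaces and tabs so that line breaks are left untouched.

```python
import re

def strip_trailing_whitespace(text: str) -> str:
    # With re.MULTILINE, $ matches at the end of every line,
    # so blanks before each newline (and at the very end) are removed.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

sample = "a  \n\tb\t \nc"
print(repr(strip_trailing_whitespace(sample)))
```

Leading indentation is preserved because the match has to end at the end of a line. Files with Windows-style `\r\n` line endings would need the pattern adjusted, since the `\r` sits between the blanks and the end of the line.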