What the heck is the value of “-n % n” in programming languages?
When coding efficient algorithms having to do with hashing, random number generation or even cryptography, a common construction is the expression “-n % n”. My experience has been that it confuses many programmers, so let us examine it further.
To illustrate, let us look at the implementation of std::uniform_int_distribution found in the GNU C++ library (Linux) and clean up the line in question:
threshold = -range % range;
The percent sign (%) in this expression refers to the modulo operation. It returns the remainder of the integer division. To simplify the discussion, let us assume that range is strictly positive since dividing by zero causes problems.
We should pay attention to the leading minus sign (-). It is the unary operator that negates a value, not the subtraction sign. The expressions “-range % range” and “0-range % range” are not at all equivalent: they generally give different values, and the latter is always zero. The reason is operator precedence. Negation takes precedence over the modulo operation, which in turn takes precedence over subtraction. Thus you can rewrite “-range % range” as “(-range) % range”, and “0-range % range” as “0 - (range % range)”.
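To make the precedence rule concrete, here is a minimal sketch in C++. The function names negate_then_mod and mod_then_subtract are made up for this illustration, and I assume a 32-bit unsigned type (std::uint32_t) on a platform where it is not promoted to a signed int, which is the usual case.

```cpp
#include <cstdint>

// Unary minus binds tighter than %, so this parses as (-range) % range.
// With a 32-bit unsigned range, -range wraps around to 2^32 - range.
constexpr std::uint32_t negate_then_mod(std::uint32_t range) {
    return -range % range;
}

// Binary minus binds looser than %, so this parses as 0 - (range % range).
// Since range % range is always 0, the result is always 0.
constexpr std::uint32_t mod_then_subtract(std::uint32_t range) {
    return 0 - range % range;
}
```

With range equal to 10, the first function returns 6 (because 4294967286 % 10 is 6) while the second returns 0.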
When the variable range is a signed integer, the expression -range % range is zero. In a programming language with only signed integers, like Java, this expression is always zero.
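The signed case is easy to check. The following sketch uses a hypothetical helper, signed_negate_mod, and assumes that range is strictly positive, as stated earlier.

```cpp
// Signed version: -range is an ordinary negative number that is an exact
// multiple of range, so the remainder is zero for any strictly positive range.
constexpr int signed_negate_mod(int range) {
    return -range % range;
}
```

Whatever positive value you pass in, you get zero back, which is why the expression is only interesting with unsigned types.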
So let us assume that the variable range is an unsigned type, as it is meant to be. In such cases, the expression is generally non-zero.
We need to compute -range. What does it mean to negate an unsigned value?
When the variable range is an unsigned type, Visual Studio is likely to be unhappy with the expression -range. A recent version of Visual Studio issues the following warning:
warning C4146: unary minus operator applied to unsigned type, result still unsigned
Nevertheless, I believe that it is a well-defined operation in C++, Go and many other programming languages. Jonathan Adamczewski has a whole blog post on the topic which suggests that the Visual Studio warning is best explained by a historical deviation from the C++ standard by the Microsoft Visual Studio team. (Note that the current Visual Studio team seems committed to the standards going forward.)
My favorite definition is that -range is defined by range + (-range) = 0. That is, it is the value such that when you add it to range, you get zero. Mathematicians would say that it is the “additive inverse”. In programming languages (like Go and C++) where unsigned integers wrap around, there is always one, and only one, additive inverse to every integer value.
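Here is a small sketch of this definition in C++, again assuming a 32-bit unsigned type. All three function names are invented for the illustration. Because -range is then 2^32 - range, the expression -range % range should be the same as 2^32 modulo range, which the second function computes using 64-bit arithmetic for comparison.

```cpp
#include <cstdint>

// range + (-range) wraps around to zero: -range is the additive inverse.
constexpr std::uint32_t sum_with_negation(std::uint32_t range) {
    return range + (-range);
}

// 2^32 modulo range, computed in 64 bits so that 2^32 is representable.
// Assumes range is non-zero.
constexpr std::uint32_t two_pow_32_mod(std::uint32_t range) {
    return static_cast<std::uint32_t>((std::uint64_t{1} << 32) % range);
}

// The expression under discussion, using only 32-bit arithmetic.
constexpr std::uint32_t threshold(std::uint32_t range) {
    return -range % range;
}
```

For example, threshold(10) and two_pow_32_mod(10) both give 6. When range is a power of two, such as 8, both give zero, which is why the expression is only "generally" non-zero.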
You can define this additive inverse without the unary negation: if max is the maximum value that you can represent, then you can replace -range by max - range + 1. Or, maybe more simply, by (0-range). And indeed, in the Swift programming language, this particular line was written as follows:
let threshold = (0 &- range) % range
The Swift language has two subtraction operations, one that is not allowed to overflow (the usual ‘-’), and one that is allowed to overflow (‘&-’). It is somewhat inconvenient that Swift forces us to write so much code, but we must admit that the result is probably less likely to confuse a good programmer.
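For comparison, the two alternatives to unary negation described above (max - range + 1 and 0 - range) can be sketched in C++ in a way that works for any unsigned type. The template names negate_via_max and negate_via_zero are hypothetical. The explicit casts back to T are there because C++ promotes small unsigned types (such as 16-bit ones) to a signed int before doing arithmetic.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Additive inverse written as max - range + 1, for any unsigned type T.
template <typename T>
constexpr T negate_via_max(T range) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    return static_cast<T>(std::numeric_limits<T>::max() - range + 1);
}

// Additive inverse written as 0 - range, for any unsigned type T.
template <typename T>
constexpr T negate_via_zero(T range) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned integer type");
    return static_cast<T>(T{0} - range);
}
```

Both functions should agree with unary negation: for a 32-bit range of 10, each returns 4294967286, and taking that value modulo 10 gives 6 as before.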
In C#, negating an unsigned integer does not give you an unsigned result: the value is instead converted to a signed integer type. So you have to go the long way around if you want to remain in unsigned mode, like so:
threshold = (uint.MaxValue - scale + 1) % scale
This expression is unfortunately type specific (here uint).
To conclude: you can learn a lot just by examining one line of code. To put it another way, programming is a much deeper and more complex practice than it seems at first. As I was telling a student of mine yesterday: you are not supposed to read new code and understand it right away all of the time.