Increase in computing performance explain up to 94% of the performance improvements in field such as weather prediction, protein folding, and oil exploration: information technology is a a driver of long-term performance improvement across society. If we stop improving our computing, the…
A reader (Richard Ebeling) invited me to revisit an older blog post: Parsing floats in C++: benchmarking strtod vs. from_chars. Back then I reported that switching from strtod to from_chars in C++ to parse numbers could lead to a speed increase (by 20%). The code is much the same, we go…
Modern game controllers can point in a wide range of directions. Game designers sometimes want to convert the joystick direction to get 8-directional movement. A typical solution offered is to compute the angle, round it up and then compute back the direction vector.
double angle = atan2(y, x);
…
In the first half of the XXth century, there were relatively few scientists, and these scientists were generally not lavishly funded. Yet it has been convincingly argued that these scientists were massively more productive. We face a major replication crisis where important results in fields such…
Compared to 1800, we eat less saturated fat and much more processed food and vegetable oils and it does not seem to be good for us:
Saturated fats from animal sources declined while polyunsaturated fats from vegetable oils rose. Non-communicable diseases (NCDs) rose over the twentieth century…
Many programming languages have two binary floating-point types: float (32-bit) and double (64-bit). It reflects the fact that most general-purpose processors supports both data types natively.
Often we need to convert between the two types. Both ARM and x64 processors can do in one inexpensive…
Processors come, roughly, in two large families x64 processors from Intel and AMD, and ARM processors from Apple, Samsung, and many other vendors. For a long time, ARM processors occupied mostly the market of embedded processors (the computer running your fridge at home) with the ‘big…
When programming, we often need to write ‘generic’ functions where the exact data type is not important. For example, you might want to write a simple function that sums up numbers.
Go lacked this notion until recently, but it was recently added (as of version 1.18). So I took it out for a…
Most of us write code using higher level languages (Go, C++), but if you want to understand the code that matters to your processor, you need to look at the ‘assembly’ version of your code. Assembly is a just a series of instructions.
At first, assembly code looks daunting, and I discourage you…
I have had access to Amazon’s latest ARM processors (graviton 3) for a few weeks. To my knowledge, these are the first widely available processors supporting Scalable Vector Extension (SVE).
SVE is part of the Single Instruction/Multiple Data paradigm: a single instruction can operate on many…
One of the most expensive operation in a processor and memory system is a random memory access. If you try to read a value in memory, it can take tens of nanosecond on average or more. If you are waiting on the memory content for further action, your processor is effectively stalled. While our…
On many systems, memory is accessed in fixed blocks called “cache lines”. On Intel systems, the cache line spans 64 bytes. That is, if you access memory at byte address 64, 65… up to 127… it is all on the same cache line. The next cache line starts at address 128, and so forth.
In turn,…
Many recent Intel processors benefit from a new family of instructions called AVX-512. These instructions operate over wide registers (up to 512 bits) and follow the Single instruction, multiple data (SIMD) paradigm. These new AVX-512 instructions allow you to break some speed records, such as…
There are various ways in software to handle error conditions. In C or Go, one returns error code. Other programming languages like C++ or Java prefer to throw exceptions. One benefit of using exceptions is that it keeps your code mostly clean since the error-handling code is often separate.
It is…
I refer to “bitset decoding” as the action of finding the positions of the 1s in a stream of bits. For example, given the integer value 0b11011 (or 27 in decimal), I want to find 0,1,3,4.
In my previous post, Fast bitset decoding using Intel AVX-512, I explained how you can use Intel’s new…
In software, we often use ‘bitsets’: you work with arrays of bits to represent sets of small integers. It is a concise and fast data structure. Sometimes you want to go from the bitset (e.g., 0b110011) to the integers (e.g., 0, 1, 5, 6 in this instance). We consider with ‘average’ density…
In software, it is a common problem to want to remove specific characters from a string. To make the problem precise, let us consider the removal of all ASCII control characters and spaces. In practice, it means the removal of all byte values smaller or equal than 32.
I covered a related problem…
In practice, computer code is constantly being transformed. At the beginning of a project, the computer code often takes the form of sketches that are gradually refined. Later, the code can be optimized or corrected, sometimes for many years.
Soon enough, programmers realized that they needed to…
In programming languages like JavaScript or Python, numbers are typically represented using 64-bit IEEE number types (binary64). For these numbers, we have 15 digits of accuracy. It means that you can pick a 15-digit number, such as 1.23456789012345e100 and it can be represented exactly: there…