Daniel Lemire's blog

Science and Technology links (August 7 2022)

3 min read

Increases in computing performance explain up to 94% of the performance improvements in fields such as weather prediction, protein folding, and oil exploration: information technology is a driver of long-term performance improvement across society. If we stop improving our computing, the…

Comparing strtod with from_chars (GCC 12)

1 min read

A reader (Richard Ebeling) invited me to revisit an older blog post: Parsing floats in C++: benchmarking strtod vs. from_chars. Back then I reported that switching from strtod to from_chars in C++ to parse numbers could increase speed by 20%. The code is much the same; we go…
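
For readers who have not used both functions, here is a minimal sketch of the two calling conventions, assuming a recent standard library (such as the GCC 12 one discussed) that provides the floating-point overloads of std::from_chars. The helper names parse_strtod and parse_from_chars are hypothetical, and neither helper checks errno for overflow.

```cpp
#include <charconv>
#include <cstdlib>
#include <string>
#include <system_error>

// Hypothetical helper: strtod needs a null-terminated string, follows the
// current C locale, and skips leading whitespace. We accept the input only
// if the whole string was consumed. (errno/ERANGE is not checked here.)
bool parse_strtod(const std::string& s, double& out) {
  const char* begin = s.c_str();
  char* end = nullptr;
  out = std::strtod(begin, &end);
  return end != begin && *end == '\0';
}

// Hypothetical helper: std::from_chars parses a [first, last) range, is
// locale-independent, and does not skip whitespace. We accept the input
// only if the whole range was consumed.
bool parse_from_chars(const std::string& s, double& out) {
  const char* first = s.data();
  const char* last = first + s.size();
  auto [ptr, ec] = std::from_chars(first, last, out);
  return ec == std::errc() && ptr == last;
}
```

The whitespace difference matters when porting code: a string such as " 1.5" is accepted by the strtod helper but rejected by the from_chars helper.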

Round a direction vector to an 8-way compass

2 min read

Modern game controllers can point in a wide range of directions. Game designers sometimes want to convert the joystick direction to get 8-directional movement. A typical solution is to compute the angle, round it to the nearest of the eight directions (a multiple of 45 degrees), and then compute the direction vector back from the rounded angle. double angle = atan2(y, x); …
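
The typical atan2-based solution can be sketched in C++ as follows. This is a minimal illustration of that approach, not necessarily the one the post ends up recommending; the function name round_to_8way is hypothetical, and we snap the angle to the nearest multiple of π/4 radians (45 degrees).

```cpp
#include <cmath>
#include <utility>

// Hypothetical helper: snap the direction (x, y) to the nearest of the eight
// compass directions and return it as a unit vector.
// Note: atan2(0, 0) returns 0, so a centered stick maps to (1, 0) here;
// callers may want to handle a zero vector (or a dead zone) separately.
std::pair<double, double> round_to_8way(double x, double y) {
  const double pi = std::acos(-1.0);
  const double step = pi / 4;                  // 45 degrees
  double angle = std::atan2(y, x);             // in [-pi, pi]
  double snapped = std::round(angle / step) * step;
  return {std::cos(snapped), std::sin(snapped)};
}
```

Because atan2, cos, and sin are relatively expensive, this version favors simplicity over speed.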

Negative incentives in academic research

3 min read

In the first half of the twentieth century, there were relatively few scientists, and these scientists were generally not lavishly funded. Yet it has been convincingly argued that these scientists were massively more productive. We face a major replication crisis where important results in fields such…

Science and Technology links (July 23 2022)

6 min read

Compared to 1800, we eat less saturated fat and much more processed food and vegetable oils, and it does not seem to be good for us: Saturated fats from animal sources declined while polyunsaturated fats from vegetable oils rose. Non-communicable diseases (NCDs) rose over the twentieth century…

How quickly can you convert floats to doubles (and back)?

1 min read

Many programming languages have two binary floating-point types: float (32-bit) and double (64-bit). This reflects the fact that most general-purpose processors support both data types natively. Often we need to convert between the two types. Both ARM and x64 processors can do it in one inexpensive…
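
A minimal C++ sketch of the two conversions over arrays; the helper names widen and narrow are hypothetical. With optimizations enabled, compilers typically turn such loops into the processor's native conversion instructions (for example, cvtps2pd and cvtpd2ps on x64), but it is worth checking the generated assembly for your own compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: every float value is exactly representable as a
// double, so widening never loses information.
std::vector<double> widen(const std::vector<float>& in) {
  std::vector<double> out(in.size());
  for (std::size_t i = 0; i < in.size(); i++) {
    out[i] = static_cast<double>(in[i]);
  }
  return out;
}

// Hypothetical helper: narrowing rounds each double to a nearby float
// (to nearest under the default rounding mode), so precision can be lost.
// Values outside the float range are not handled in this sketch.
std::vector<float> narrow(const std::vector<double>& in) {
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); i++) {
    out[i] = static_cast<float>(in[i]);
  }
  return out;
}
```

A float survives the round trip from float to double and back unchanged, whereas a double such as 0.1 does not survive the round trip from double to float and back.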

Filtering numbers faster with SVE on Graviton 3 processors

4 min read

Processors come, roughly, in two large families: x64 processors from Intel and AMD, and ARM processors from Apple, Samsung, and many other vendors. For a long time, ARM processors mostly occupied the market for embedded processors (the computer running your fridge at home) with the ‘big…

Go generics are not bad

1 min read

When programming, we often need to write ‘generic’ functions where the exact data type is not important. For example, you might want to write a simple function that sums up numbers. Go lacked this notion until it was added in version 1.18. So I took it out for a…

Looking at assembly code with gdb

8 min read

Most of us write code using higher-level languages (Go, C++), but if you want to understand the code that matters to your processor, you need to look at the ‘assembly’ version of your code. Assembly is just a series of instructions. At first, assembly code looks daunting, and I discourage you…

Filtering numbers quickly with SVE on Amazon Graviton 3 processors

3 min read

I have had access to Amazon’s latest ARM processors (Graviton 3) for a few weeks. To my knowledge, these are the first widely available processors supporting the Scalable Vector Extension (SVE). SVE is part of the Single Instruction/Multiple Data paradigm: a single instruction can operate on many…
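
To make the filtering problem concrete, here is a portable scalar baseline in C++, assuming for illustration that the task is to drop negative 32-bit integers; the function name is hypothetical. Writing every value but advancing the output position only for kept values avoids a hard-to-predict branch. SVE can process many such lanes per instruction, for instance with its COMPACT instruction, but that vector version is not shown here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar baseline: copy the non-negative values of `input`
// to `output` (which must have room for `count` values) and return how
// many were kept. Branchless: every value is written, but the output
// position only advances when the value is kept.
std::size_t remove_negatives(const std::int32_t* input, std::size_t count,
                             std::int32_t* output) {
  std::size_t kept = 0;
  for (std::size_t i = 0; i < count; i++) {
    std::int32_t v = input[i];
    output[kept] = v;
    kept += (v >= 0);
  }
  return kept;
}
```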

Memory-level parallelism: Intel Ice Lake versus Amazon Graviton 3

3 min read

One of the most expensive operations in a processor and memory system is a random memory access. If you try to read a value in memory, it can take tens of nanoseconds or more on average. If you are waiting on the memory content for further action, your processor is effectively stalled. While our…
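
Memory-level parallelism is commonly measured by pointer chasing: each load depends on the previous one, so a single chain exposes the full latency, while several independent chains let the processor keep multiple misses in flight. The following C++ sketch shows the idea under our own assumptions (a random single-cycle permutation built with Sattolo's algorithm, hypothetical function names); it is not the benchmark code from the post, and timing is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build a random permutation forming a single cycle (Sattolo's algorithm),
// so that following next[i] visits every slot before returning to the start.
std::vector<std::uint32_t> make_cycle(std::size_t n, std::uint32_t seed) {
  std::vector<std::uint32_t> next(n);
  if (n == 0) return next;
  std::iota(next.begin(), next.end(), 0u);
  std::mt19937 gen(seed);
  for (std::size_t i = n - 1; i > 0; i--) {
    std::uniform_int_distribution<std::size_t> pick(0, i - 1);
    std::swap(next[i], next[pick(gen)]);
  }
  return next;
}

// Advance `lanes` independent chains by `steps` hops each. Loads within one
// lane are dependent; loads from different lanes can overlap in time.
// Returns the sum of final positions so the work cannot be optimized away.
std::uint64_t chase(const std::vector<std::uint32_t>& next, std::size_t lanes,
                    std::size_t steps) {
  std::vector<std::uint32_t> pos(lanes);
  for (std::size_t l = 0; l < lanes; l++) {
    pos[l] = static_cast<std::uint32_t>(l * next.size() / lanes);
  }
  for (std::size_t s = 0; s < steps; s++) {
    for (std::size_t l = 0; l < lanes; l++) {
      pos[l] = next[pos[l]];
    }
  }
  std::uint64_t sum = 0;
  for (std::uint32_t p : pos) sum += p;
  return sum;
}
```

In a benchmark, one would time chase for 1, 2, 4, ... lanes on an array much larger than the caches; the time per hop typically falls as lanes are added, until the processor runs out of capacity for outstanding misses.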

Data structure size and cache-line accesses

2 min read

On many systems, memory is accessed in fixed blocks called “cache lines”. On Intel systems, the cache line spans 64 bytes. That is, if you access memory at byte address 64, 65… up to 127… it is all on the same cache line. The next cache line starts at address 128, and so forth. In turn,…
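
The arithmetic can be written as a small function: with 64-byte lines, byte address a lies on line a / 64 (integer division), so an object of s bytes starting at a touches every line from a / 64 to (a + s - 1) / 64. The 64-byte line size is an assumption (it matches the Intel systems mentioned, but other processors may differ), and the name cache_lines_touched is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes (64 on the Intel systems discussed).
constexpr std::uintptr_t kCacheLineSize = 64;

// Hypothetical helper: number of distinct cache lines covered by `size`
// bytes starting at byte address `address`.
std::size_t cache_lines_touched(std::uintptr_t address, std::size_t size) {
  if (size == 0) return 0;
  std::uintptr_t first_line = address / kCacheLineSize;
  std::uintptr_t last_line = (address + size - 1) / kCacheLineSize;
  return static_cast<std::size_t>(last_line - first_line + 1);
}
```

For example, an 8-byte value stored at address 60 spans two cache lines, while one stored at address 64 spans only one.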

Parsing JSON faster with Intel AVX-512

4 min read

Many recent Intel processors benefit from a new family of instructions called AVX-512. These instructions operate over wide registers (up to 512 bits) and follow the Single instruction, multiple data (SIMD) paradigm. These new AVX-512 instructions allow you to break some speed records, such as…

Avoid exception throwing in performance-sensitive code

1 min read

There are various ways in software to handle error conditions. In C or Go, one returns error codes. Other programming languages like C++ or Java prefer to throw exceptions. One benefit of using exceptions is that it keeps your code mostly clean since the error-handling code is often separate. It is…
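
The two styles can be contrasted on integer parsing in C++. This is a minimal sketch with hypothetical helper names: one reports failure through std::from_chars and std::optional, the other relies on std::stoi, which throws std::invalid_argument or std::out_of_range. The behaviors are not identical; std::stoi, for instance, skips leading whitespace while std::from_chars does not.

```cpp
#include <charconv>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>

// Error-code style: failure is an ordinary return value, no exception.
std::optional<int> parse_int_no_throw(std::string_view s) {
  int value = 0;
  const char* last = s.data() + s.size();
  auto [ptr, ec] = std::from_chars(s.data(), last, value);
  if (ec != std::errc() || ptr != last) return std::nullopt;
  return value;
}

// Exception style: std::stoi throws on invalid or out-of-range input; we
// also throw if trailing characters remain.
int parse_int_throwing(const std::string& s) {
  std::size_t consumed = 0;
  int value = std::stoi(s, &consumed);
  if (consumed != s.size()) throw std::invalid_argument("trailing characters");
  return value;
}
```

On the failure path, the first helper costs about as much as a successful call, while throwing and catching an exception is typically far more expensive.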

Faster bitset decoding using Intel AVX-512

4 min read

I refer to “bitset decoding” as the action of finding the positions of the 1s in a stream of bits. For example, given the integer value 0b11011 (or 27 in decimal), I want to find 0,1,3,4. In my previous post, Fast bitset decoding using Intel AVX-512, I explained how you can use Intel’s new…
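
Before reaching for AVX-512, the usual scalar approach is worth keeping in mind: repeatedly take the index of the lowest set bit and then clear that bit. Here is a minimal C++ sketch, assuming GCC or Clang for the __builtin_ctzll intrinsic (std::countr_zero from <bit> is the C++20 alternative); the function name decode_bitset is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the positions of all 1-bits in a bitset stored
// as 64-bit words, where bit b of word i stands for the integer 64 * i + b.
std::vector<std::uint32_t> decode_bitset(const std::vector<std::uint64_t>& words) {
  std::vector<std::uint32_t> out;
  for (std::size_t i = 0; i < words.size(); i++) {
    std::uint64_t w = words[i];
    while (w != 0) {
      // Index of the lowest set bit (GCC/Clang builtin; w is nonzero here).
      out.push_back(static_cast<std::uint32_t>(i * 64 + __builtin_ctzll(w)));
      w &= w - 1;  // clear the lowest set bit
    }
  }
  return out;
}
```

For 27 (0b11011), it returns 0, 1, 3, 4.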

Fast bitset decoding using Intel AVX-512

3 min read

In software, we often use ‘bitsets’: you work with arrays of bits to represent sets of small integers. It is a concise and fast data structure. Sometimes you want to go from the bitset (e.g., 0b110011) to the integers (e.g., 0, 1, 4, 5 in this instance). We consider with ‘average’ density…

Removing characters from strings faster with AVX-512

3 min read

In software, it is a common problem to want to remove specific characters from a string. To make the problem precise, let us consider the removal of all ASCII control characters and spaces. In practice, it means the removal of all byte values smaller than or equal to 32. I covered a related problem…
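
A scalar reference version of the removal is short. This sketch, with a hypothetical function name, works in place: it always writes the current byte but advances the output position only when the byte is kept. Bytes are compared as unsigned values so that non-ASCII bytes (128 and above) are kept.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remove, in place, every byte with value <= 32 from
// bytes[0, length) and return the new length. Branchless inner loop.
std::size_t despace(char* bytes, std::size_t length) {
  std::size_t pos = 0;
  for (std::size_t i = 0; i < length; i++) {
    unsigned char c = static_cast<unsigned char>(bytes[i]);
    bytes[pos] = static_cast<char>(c);
    pos += (c > 32);
  }
  return pos;
}
```

With a std::string s, one would call s.resize(despace(s.data(), s.size())).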

An overview of version control in programming

16 min read

In practice, computer code is constantly being transformed. At the beginning of a project, the computer code often takes the form of sketches that are gradually refined. Later, the code can be optimized or corrected, sometimes for many years. Soon enough, programmers realized that they needed to…

Floats have 15-digit accuracy in their normal range

1 min read

In programming languages like JavaScript or Python, numbers are typically represented using 64-bit IEEE number types (binary64). For these numbers, we have 15 digits of accuracy. It means that you can pick a 15-digit number, such as 1.23456789012345e100, and it can be represented exactly: there…
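
The 15-digit guarantee can be checked in C++: parse a decimal string with at most 15 significant digits, then print the resulting binary64 value back with 15 significant digits. DBL_DIG from <cfloat> is 15 for IEEE binary64. This is a minimal sketch with a hypothetical function name, and it relies on correctly rounded strtod and snprintf, as provided by mainstream C libraries such as glibc.

```cpp
#include <cfloat>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: decimal string -> binary64 -> decimal string with
// DBL_DIG (15) significant digits.
std::string round_trip_15(const char* decimal) {
  double value = std::strtod(decimal, nullptr);
  char buffer[64];
  std::snprintf(buffer, sizeof(buffer), "%.*g", DBL_DIG, value);
  return std::string(buffer);
}
```

The stored binary value is generally not the decimal number itself: printing 0.1 with 17 significant digits (%.17g) gives 0.10000000000000001, yet the 15-digit output recovers the original decimal.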