Daniel Lemire's blog

Science and Technology links (October 31st 2021)

1 min read

Though exoskeletons are exciting and they allow some of us to carry on with physical activities despite handicaps, they appear to require quite a bit of brain power. In effect, though they may help you move, they require a lot of mental effort, which can be distracting. It is difficult to make…

In C, how do you know if the dynamic allocation succeeded?

2 min read

In the C programming language, we allocate memory dynamically (on the heap) using the malloc function. You pass malloc a size parameter corresponding to the number of bytes you need. The function returns either a pointer to the allocated memory or the NULL pointer if the memory could not be…
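As a minimal sketch of the check described in this teaser, the snippet below wraps the C `malloc` call in C++ syntax and tests the returned pointer before using it. The helper name `alloc_zeroed` is hypothetical and does not come from the post.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the post): allocate n bytes with malloc
// and zero them. Returns nullptr when malloc reports a failure.
unsigned char* alloc_zeroed(std::size_t n) {
  void* p = std::malloc(n);
  if (p == nullptr) {
    return nullptr;  // malloc could not provide the memory
  }
  std::memset(p, 0, n);  // p is known to be non-null here
  return static_cast<unsigned char*>(p);
}
```

The caller owns the returned buffer and should release it with `std::free` once it is no longer needed.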

In C++, is empty() faster than comparing the size with zero?

3 min read

Most C++ programmers rely on “STL” for their data structures. The most popular data structure is probably vector, which is just a dynamic array. The set and the map are other useful ones. The STL data structures have a minimalist design. You have relatively few methods. All of them allow you to…
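To make the question in the title concrete, here is a small sketch of the two emptiness checks being compared. The function names are hypothetical, and the snippet only shows that both forms give the same answer; which one is faster is a question for benchmarks, and it makes no claim either way.

```cpp
#include <list>
#include <set>
#include <vector>

// Hypothetical helpers: two ways to ask "is this container empty?".
// Both work for any standard container that offers size() and empty().
template <class Container>
bool empty_via_size(const Container& c) {
  return c.size() == 0;
}

template <class Container>
bool empty_via_method(const Container& c) {
  return c.empty();
}

// Returns true when both checks agree on a few sample containers.
bool checks_agree() {
  std::vector<int> v;          // empty
  std::list<int> l{1, 2, 3};   // not empty
  std::set<int> s;             // empty
  return empty_via_size(v) == empty_via_method(v) &&
         empty_via_size(l) == empty_via_method(l) &&
         empty_via_size(s) == empty_via_method(s);
}
```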

Science and Technology links (October 23rd 2021)

2 min read

Apple announced new processors for its computers. Here is the transistor count of some recent Apple processors (processor, release year, transistors): Apple A7, 2013, 1 billion; Apple A8, 2014, 2 billion; Apple A9, 2015, 2 billion; Apple A10, 2016, 3.2 billion; Apple A11, 2017, 4.3…

Converting binary floating-point numbers to integers

2 min read

You are given a floating-point number, e.g. a double type in Java or C++. You would like to convert it to an integer type… but only if the conversion is exact. In other words, you want to convert the floating-point number to an integer and check if the result is exact. In C++, you could implement…
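One possible sketch of such an exact conversion follows. It is not necessarily the approach the post goes on to describe. The function name `to_int64_exact` and the choice of `int64_t` as the target type are assumptions made for illustration. The range check comes first because casting an out-of-range double to an integer type is undefined behavior in C++.

```cpp
#include <cstdint>

// Hypothetical helper: convert x to int64_t only if the conversion is exact.
// Returns true and writes *out on success; returns false otherwise.
bool to_int64_exact(double x, std::int64_t* out) {
  // 2^63 is exactly representable as a double; int64_t covers [-2^63, 2^63).
  const double two63 = 9223372036854775808.0;
  // Both comparisons are false for NaN, so NaN is rejected here as well.
  if (!(x >= -two63 && x < two63)) {
    return false;
  }
  // In range, so the cast is well defined; it truncates toward zero.
  const std::int64_t candidate = static_cast<std::int64_t>(x);
  if (static_cast<double>(candidate) != x) {
    return false;  // x had a fractional part
  }
  *out = candidate;
  return true;
}
```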

Science and Technology links (October 16th 2021)

1 min read

The thymus is an important component of our immune system. As we age, the thymus degenerates and our immune system becomes less fit: emotional and physical distress, malnutrition, and opportunistic bacterial and viral infections damage the thymus. New research suggests that practical thymus…

Calling a dynamically compiled function from Go

3 min read

Compiled programming languages are typically much faster than interpreted programming languages. Indeed, the compilation step produces “machine code” that is ideally suited for the processor. However, most programming languages today do not allow you to change the code you compiled. It means…

Science and Technology links (October 10th 2021)

3 min read

Evans and Chu suggest, using data and a theoretical model, that as the number of scientists grows, progress may stagnate. Simply put, in a large field, with many researchers, a few papers and a few people are able to acquire a decisive advantage over newcomers. Large fields allow more inequality.…

For software performance, can you always trust inlining?

5 min read

It is easier for an optimizing compiler to spot and eliminate redundant operations if it can operate over a large block of code. Nevertheless, it is still recommended to stick with small functions. Small functions are easier to read and debug. Furthermore, you can often rely on the compiler smartly…

Science and Technology links (October 3rd 2021)

1 min read

In a clinical trial, most people were able to cure their diabetes by losing weight. Video games improve intelligence over many years, while socializing has no effect. To go to Mars safely, time is of the essence because astronauts would be exposed to radiation and particles from outside of solar…

Word-aligned Bloom filters

6 min read

Programmers often need to ‘filter out’ data. Suppose that you are given a database of users where only a small percentage are ‘paying customers’ (say 5% or less). You can write an SQL query to check whether a given user is indeed a paying customer, but it might require a round trip to your…

Working in virtual reality

4 min read

Inspired by a post by Paul Tomlinson, I wrote my last blog post entirely in virtual reality (VR). You put on goggles and see a virtual version of your computer screen wherever you would like. Otherwise, everything around you can be anything you would like. Right now, I am floating in space, I can…

Science and Technology links (September 26th 2021)

1 min read

Radiation therapy can rejuvenate heart cells. (source: Nature) Within members of the same species, cancer risk increases with body size. Large human beings are more at risk than smaller ones. Across species, the reverse is often true: cancer risks decrease with body size. Elephants are much less…

New release of the simdjson library: version 1.0

3 min read

The most popular data format on the web is arguably JSON. It is a simple and convenient format. Most web services allow you to send and receive data in JSON. Unfortunately, parsing JSON can be time- and energy-consuming. Back in 2019, we released the simdjson library. It broke speed records and it is…

Science and Technology links (September 18th 2021)

1 min read

4.5% of us are psychopaths. U.S. per capita CO2 emissions are lower than they were in 1918. Nine out of ten people with Alzheimer’s lose some of their sense of smell. Graphene-based hard drives could have ten times the storage capacity. Ageing yields improvements as well as declines across attention…

Random identifiers are poorly compressible

2 min read

It is common in data engineering to find that we have too much data. Thus engineers commonly seek compression routines. At the same time, random identifiers are handy. Maybe you have many users or transactions and you want to assign each one of them a unique identifier. It is not uncommon for…

How I debate

3 min read

Many of us feel that the current intellectual climate is difficult to bear. When I first noticed the phenomenon, people told me that it was because of Donald Trump. He just made it impossible to debate calmly. But now that Trump is gone, the climate is just as bad and, if nothing else, much…

The big-load anti-pattern

6 min read

When doing data engineering, it is common for engineers to want to first load all of the data into memory before processing it. If you have sufficient memory and the loaded data is not ephemeral or you have small volumes, it is a sensible approach. After all, that is how a spreadsheet typically…