Daniel Lemire's blog

How fast can you pipe a large file to a C++ program?

, 2 min read

Under many operating systems, you can send data from from one process to another using ‘pipes’. The term ‘pipe’ is probably used by analogy with plumbing and we often use the the symbol ‘|‘ to represent a pipe (it looks like a vertical pipe). Thus, for example, you can sort a file and…

Science and Technology links (July 31st 2021)

, 2 min read

Researchers built a microscope that might be 10 times better than the best available microscopes. Subsidizing college education can lower earnings due to lower job experience: The Post 9/11 GI Bill (PGIB) is among the largest and most generous college subsidies enacted thus far in the U.S. (…)…

Measuring memory usage: virtual versus real memory

, 3 min read

Software developers are often concerned with the memory usage of their applications, and rightly so. Software that uses too much memory can fail, or be slow. Memory allocation will not work the same way under all systems. However, at a high level, most modern operating systems have virtual memory…

Faster sorted array unions by reducing branches

, 4 min read

When designing an index, a database or a search engine, you frequently need to compute the union of two sorted sets. When I am not using fancy low-level instructions, I have most commonly computed the union of two sorted sets using the following approach: v1 = first value in input 1 v2 =…

Science and Technology links (July 10th 2021)

, 2 min read

We use CRISPR, a state-of-the-art gene editing technique, to edit the genes of live human patients in a clinical trials. A clinical trial has begun regarding an HIV vaccine. If you choose to forgo meat to fight climate change, you may lower your individual’s lifetime warming contribution by 2 to…

Compressing JSON: gzip vs zstd

, 3 min read

JSON is the de facto standard for exchanging data on the Internet. It is relatively simple text format inspired by JavaScript. I say “relatively simple” because you can read and understand the entire JSON specification in minutes. Though JSON is a concise format, it is also better used over a…

Science and Technology links (June 26th 2021)

, 2 min read

Reportedly, half of us own a smartphone. It is often reported that women or visible minority earn less money. However, ugly people are doing comparatively even more poorly. We have highly efficient and cheap solar panels. However, we must also quickly dispose of them after a few years which leads…

How long should you work on a problem ?

, 3 min read

Lev Reyzin says that working too long on a problem might be unproductive: I, personally, have diminishing (or negative?) returns to my creative work as I explicitly work on a problem past some amount of time. I often have insights coming to me out of nowhere while I’m relaxing or enjoying…

Science and Technology links (June 12th 2021)

, 3 min read

We completed the sequencing of the human genome. AstraZeneca’s drug Lynparza cut combined risk of recurrence of breast cancer or death by 42% among women in study. Glycine and N-acetylcysteine supplementation improves muscle strength and cognition. We found Egypt’s ancient capital. It has…

Computing the number of digits of an integer even faster

, 2 min read

I my previous blog post, I documented how one might proceed to compute the number of digits of an integer quickly. E.g., given the integer 999, you want 3 but given the integer 1000, you want 4. It is effectively the integer logarithm in base 10. On computers, you can quickly compute the integer…

Computing the number of digits of an integer quickly

, 3 min read

Suppose I give you an integer. How many decimal digits would you need to write it out? The number ‘100’ takes 3 digits whereas the number ’99’ requires only two. You are effectively trying to compute the integer logarithm in base 10 of the number. I say ‘integer logarithm’ because you…

All models are wrong

, 3 min read

All models are wrong, but some are useful is a common saying in statistics. It does not merely apply to statistics, however. It is general observation. Box (1976) wrote an influential article on the topic. He says that you make progress by successively making models followed by experiments,…

Science and Technology links (May 22nd 2021)

, 4 min read

Most computer chips today in flagship phones and computers use a process based on a 5 nm or larger resolution. Finer resolutions usually translate into lower energy usage and lower heat production. Given that many of our systems are limited by heat or power, finer resolutions lead to higher…

Counting the number of matching characters in two ASCII strings

, 2 min read

Suppose that you give me two ASCII strings having the same number of characters. I wish to compute efficiently the number of matching characters (same position, same character). E.g., the strings ‘012c’ and ‘021c’ have two matching characters (‘0’ and ‘c’). The conventional approach…

Converting binary integers to ASCII characters: Apple M1 vs AMD Zen2

, 3 min read

Programmers often need to write integers as characters. Thus given the 32-bit value 1234, you might need a function that writes the characters 1234. We can use the fact that the ASCII numeral characters are in sequence in the ASCII table: ‘0’+0 is ‘0’, ‘0’+1 is ‘1’ and so forth.…

Science and Technology links (May 15th 2021)

, 2 min read

There were rainforests near the south pole 90 million years ago. Though commercial exchanges are typically win-win for both the buyer and the seller, people tend to view the buyer as more likely to be taken advantage of. People with low self-esteem are more likely to blame the political system for…

Constructing arrays of Boolean values in Java

, 2 min read

It is not uncommon that we need to represent an array of Boolean (true or false) values. There are multiple ways to do it. The most natural way could be to construct an array of booleans (the native Java type). It is likely that when stored in an array, Java uses a byte per value. boolean[] array =…

Science and Technology links (May 1st 2021)

, 2 min read

Growing your own food could lower your carbon footprint by 3-5%. In recent years, we have acquired the ability to measure biological age: your chronological age does not necessarily match your biological since some people age faster. We measure biological aging with gene expression. Researchers…

Ideal divisors: when a division compiles down to just a multiplication

, 2 min read

The division instruction is one of the most expensive instruction in your CPU. Thus optimizing compilers often compile divisions by known constants down to a multiplication followed by a shift. However, in some lucky cases, the compiler does not even need a shift. I call the corresponding divisors…