Daniel Lemire's blog

Some useful regular expressions for programmers

2 min read

In my blog post, My programming setup, I stressed how important regular expressions are to my programming activities. Regular expressions can look intimidating and outright ugly. However, they should not be underestimated. Someone asked for examples of regular expressions that I rely upon. Here a…

A trichotomy of intellectual activity

3 min read

I like to separate intellectual work among three categories: Emulation: the reproduction or direct application of existing ideas. Most academic work and maybe most business work falls in this category. You seek the best ideas and you reproduce them, sometimes with minor adaptations. As argued…

Science and Technology links (April 17th 2021)

1 min read

Moderna built their COVID-19 vaccine without having the virus on site. They viewed it as a software problem. Humans and mice with red hair have elevated pain thresholds. Tumors (cancer) consume high levels of sugar. You would think that it means that cancer cells consume a lot of sugar, but it…

How fast can you sort arrays of integers in Java?

2 min read

Programming languages come with sorting functions by default. We can often do much better. For example, Downs has shown that radix sort can greatly surpass default sort functions in C++. Radix sort is your friend if you want to sort large arrays of integers. What about Java? Richard Startin and…

My programming setup

6 min read

As my GitHub profile indicates, I program almost every single working day of the year. I program in many different languages such as C++, C, Go, Java, JavaScript, Python, R, Swift, Rust, C#, even though I do not master all of these languages. Some of the projects I work on can be considered…

Science and Technology links (March 27th 2021)

3 min read

Scientists, including climate-science researchers, often travel to faraway places for conferences. Attending a live conference is time consuming and expensive. The cost is relative: attending a $3000 conference in Hawaii is cheap for the Harvard student, but a considerably higher expense for…

Counting cycles and instructions on the Apple M1 processor

5 min read

When benchmarking software, we often start by measuring the time elapsed. If you are benchmarking data bandwidth or latency, it is the right measure. However, if you are benchmarking computational tasks where you avoid disk and network accesses and where you only access a few pages of memory, then…

Apple's M1 processor and the full 128-bit integer product

3 min read

If I multiply two 64-bit integers (having values in [0, 2^64)), the product requires 128 bits. Intel and AMD processors (x64) can compute the full (128-bit) product of two 64-bit integers using a single instruction (mul). ARM processors, such as those found in your mobile phone, require two…

Science and Technology links (March 6th 2021)

1 min read

Increasing schooling does not improve social outcomes at a population level. Venetian glass was made near Venice as early as 450 BC. It spread worldwide through trade. Venetian glass made its way as far as North America. We have now determined that it was present in Alaska before Christopher…

How does your programming language handle “minus zero” (-0.0)?

1 min read

The ubiquitous IEEE floating-point standard defines two numbers to represent zero, the positive and the negative zeros. You also have the positive and negative infinity. If you compute the inverse of the positive zero, you get the positive infinity. If you compute the inverse of the negative zero,…

Parsing floating-point numbers really fast in C#

2 min read

Programmers often write out numbers as strings (e.g., 3.1416) and they want to read back the numbers from the string. If you read and write JSON or CSV files, you do this work all of the time. Previously, we showed that we could parse floating-point numbers at a gigabyte per second or better in C++…

Science and Technology links (February 13th 2021)

1 min read

Researchers make inexpensive transparent wood. Our cells produce energy using their mitochondria. Researchers show that you can efficiently isolate mitochondria from mammalian cells. Vitamin D supplementation could save tens of thousands of lives annually in Germany alone by preventing cancer. It…

On the cost of converting ASCII to UTF-16

2 min read

Many programming languages like Java, JavaScript and C# represent strings using UTF-16 by default. In UTF-16, each ‘character’ uses 16 bits. To represent all 1 million Unicode characters, some special ‘characters’ can be combined in pairs (surrogate pairs), but for much of the common text,…

Science and Technology links (February 6th 2021)

1 min read

You can use artificial intelligence and satellite images to count the number of elephants found in the wild. It appears that a billion people on Earth now use an iPhone. The number would be higher if not for the pandemic. A supplement used by body builders (alpha-ketoglutarate) made old mice…

Number Parsing at a Gigabyte per Second

3 min read

Computers typically rely on binary floating-point numbers. Most often they span 64 bits or 32 bits. Many programming languages call them double and float. JavaScript represents all its numbers, by default, with a 64-bit binary floating-point number type. Human beings most often represent numbers…

Science and Technology links (January 24th 2021)

1 min read

Year 2020 was great for PC makers. We are selling more and more PCs. Reportedly, Sony sold 3.4 million PlayStation 5 in only four weeks, a record. The demand for the Facebook Quest 2 VR headset is reportedly several times the demand for the original Quest. Valve, the company that makes the Index…

Science and Technology links (January 16th 2021)

1 min read

You can tell people’s political affiliation by image recognition technology. There are far fewer stars and galaxies than we thought. The universe is relatively small. (The source article has been revised with different conclusions.) Dog ownership conferred a 31% risk reduction for cardiovascular…

Science and Technology links (January 9th 2021)

1 min read

The Earth is spinning faster and faster: The 28 fastest days on record (since 1960) all occurred in 2020, with Earth completing its revolutions around its axis milliseconds quicker than average. We are soon getting a new Wi-Fi standard called Wi-Fi 6: it supports data transmission at over 1 GB/s,…

Memory access on the Apple M1 processor

3 min read

When a program is mostly just accessing memory randomly, a standard cost model is to count the number of distinct random accesses. The general idea is that memory access is much slower than most other computational tasks. Furthermore, the cost model can be extended to count “nearby” memory…

Peer-reviewed papers are getting increasingly boring

6 min read

The number of researchers and peer-reviewed publications is growing exponentially. It has been estimated that the number of researchers in the world doubles every 16 years and the number of research outputs is increasing even faster. If you accept that published research papers are an accurate…