Daniel Lemire's blog

DeepL is as good as human translators?

4 min read

How good is automated translation in 2017? There is a new company called DeepL that seems to have “cracked” the translation problem in the sense that it makes fewer errors than non-professional human translators. That’s my claim, not theirs, but since their system is online for anybody to…

Science and Technology links (September 1st, 2017)

3 min read

Richer countries tend to have higher longevity and lower fertility. What is cause and effect? The Hajnal line separates Western Europe from Eastern Europe and it seems that, going back centuries, people on the West side of the line had lower fertility due to women marrying late or not at all. It…

Parsing comma-separated integers in Java

1 min read

We often encounter lists of integers (e.g., “1,2,3,10,1000”) stored in strings. Parsing these strings for the integer values can become a performance bottleneck if you have to scan thousands of those strings. The standard Java approach is to use the Scanner class, as follows: Scanner sc = new…
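As a rough illustration of the problem described in this excerpt, here is a minimal sketch that parses such a string two ways. The class name `CommaSeparatedInts` and both method names are hypothetical, and the hand-written loop is only one possible alternative to `Scanner`, not necessarily the approach taken in the post. It assumes non-negative decimal integers separated by commas, with no whitespace and no overflow checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class CommaSeparatedInts {
    // Scanner-based parsing: convenient, but it does a lot of work per token.
    static List<Integer> parseWithScanner(String input) {
        List<Integer> values = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter(",")) {
            while (sc.hasNextInt()) {
                values.add(sc.nextInt());
            }
        }
        return values;
    }

    // Hand-written scan (illustrative assumption): digits and commas only,
    // no sign handling, no whitespace, no overflow detection.
    static List<Integer> parseByHand(String input) {
        List<Integer> values = new ArrayList<>();
        int current = 0;
        boolean inNumber = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ',') {
                if (inNumber) {
                    values.add(current);
                }
                current = 0;
                inNumber = false;
            } else {
                current = 10 * current + (c - '0');
                inNumber = true;
            }
        }
        if (inNumber) {
            values.add(current); // flush the last number
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "1,2,3,10,1000";
        System.out.println(parseWithScanner(input));
        System.out.println(parseByHand(input));
    }
}
```

Both methods should return the same list for well-formed input; which one is faster in practice would have to be measured with a proper benchmark.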

Quantifying the performance benefits of Go 1.9 on bitsets

2 min read

Go, the programming language initiated at Google, has recently shipped its version 1.9. One big change is the introduction of the math/bits package which offers hardware-accelerated functions to manipulate data. When working with bitsets, we often need to count the number of 1s in a word. That’s…
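To make "counting the 1s in a word" concrete, here is a small sketch written in Java rather than Go, so that the examples on this page stay in one language. `Long.bitCount` plays roughly the role that `bits.OnesCount64` plays in Go's math/bits package. The class name `BitsetCount` and its methods are hypothetical, and the bitset is assumed to be a plain `long[]`.

```java
final class BitsetCount {
    // Portable fallback: clear the lowest set bit until the word is zero.
    static int naiveCount(long word) {
        int count = 0;
        while (word != 0) {
            word &= word - 1; // drops the lowest 1 bit
            count++;
        }
        return count;
    }

    // Number of set bits in a bitset stored as an array of 64-bit words.
    // Long.bitCount is typically compiled to a single hardware instruction
    // on processors that provide one.
    static int cardinality(long[] bitset) {
        int total = 0;
        for (long word : bitset) {
            total += Long.bitCount(word);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] bitset = {0b1011L, -1L};
        System.out.println(cardinality(bitset));
    }
}
```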

Science and Technology links (August 25th, 2017)

1 min read

“A long journey to reproducible results” is a fascinating article in Nature about how incredibly hard it can be to produce results that others can easily reproduce. It turns out that labs can’t even reproduce their own results. They only cover biology, but my experience is that this is even true in…

“Cracking” random number generators (xoroshiro128+)

5 min read

In software, we generate random numbers by calling a function called a “random number generator”. Such functions have hidden states, so that repeated calls to the function generate new numbers that appear random. If you know this state, you can predict all future outcomes of the random number…
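The point about hidden state can be sketched with a toy generator. The code below uses a plain 64-bit xorshift generator as a stand-in; it is not xoroshiro128+, and the class name `TinyXorshift` is hypothetical. In this toy case the output is the state itself, so observing a single output is enough to predict every later one. Recovering the state of a generator like xoroshiro128+ takes more work.

```java
final class TinyXorshift {
    private long state; // the hidden state; must never be zero

    TinyXorshift(long seed) {
        if (seed == 0) {
            throw new IllegalArgumentException("seed must be nonzero");
        }
        state = seed;
    }

    // One xorshift step (shift constants 13, 7, 17); returns the new state.
    long next() {
        state ^= state << 13;
        state ^= state >>> 7;
        state ^= state << 17;
        return state;
    }

    public static void main(String[] args) {
        TinyXorshift victim = new TinyXorshift(42L);
        long observed = victim.next();

        // An observer who learns the state can clone the generator...
        TinyXorshift clone = new TinyXorshift(observed);

        // ...and from then on predicts every output.
        for (int i = 0; i < 5; i++) {
            System.out.println(clone.next() == victim.next());
        }
    }
}
```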

Testing non-cryptographic random number generators: my results

5 min read

In software, we use random number generators to emulate “randomness” in games, simulations, probabilistic algorithms and so on. There are many definitions of what it means to be random, but in practice, what we do is run statistical tests on the output of the random number generators. These…

Science and Technology links (August 18th, 2017)

2 min read

We’d like, one day, to transplant pig organs into human beings. Sadly, this is currently very dangerous because, even though pigs are very similar to us, the small differences are cause for concern. Harvard’s George Church and his collaborators have shown that we can use gene editing (CRISPR)…

On Melissa O’Neill’s PCG random number generator

4 min read

Computers often need random numbers. Most times, random numbers are not actually random… in the sense that they are the output of a mathematical function that is purely deterministic. And it is not even entirely clear what “really random” would mean. It is not clear that we live in a…

Bubbling up is lowering empathy at a civilization scale

6 min read

Computer networks are a fantastic invention. When they came into my life, I remember spending hours, sometimes days, arguing with people I violently disagreed with. At first glance, it looks like a complete waste of time, but experience has taught me that it is tremendously precious. Not because it…

Optimizing polynomial hash functions (Java vs. Swift)

3 min read

In software, hash functions are ubiquitous. They map arbitrary pieces of data (strings, arrays, …) to fixed-length integers. They are the key ingredient of hash tables which are how we most commonly implement maps between keys and values (e.g., between someone’s name and someone’s phone…
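For readers unfamiliar with the term, a polynomial hash function treats the input as the coefficients of a polynomial and evaluates it at a fixed base, with arithmetic wrapping around at the integer width. The sketch below uses base 31, which is the formula Java documents for `String.hashCode`; the class name `PolynomialHash` is hypothetical, and this unoptimized loop is only a baseline, not the optimized version the post discusses.

```java
final class PolynomialHash {
    // Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    // using Horner's rule, with 32-bit wraparound.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String key = "hello";
        // Should agree with the built-in string hash.
        System.out.println(hash(key) == key.hashCode());
    }
}
```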

Science and Technology links (August 11th, 2017)

1 min read

It looks like the Java programming language might finally get in-language support for vector instructions. These instructions are supported by modern processors and multiply the processing speed… but they often require different algorithms. Language designers have often ignored vector…

Science and Technology links (August 4th, 2017)

1 min read

Lifting a lot of small weights and eating protein regularly builds muscle mass. There is no need for heavy weights and hormones matter less than you think. There is some evidence for life on Saturn’s largest moon, Titan. If there is life there, it is going to be quite different from life on…

Science and Technology links (July 27th, 2017)

1 min read

Currently, damage to the retina is largely viewed as irreversible. However, some researchers were able to generate retinal cells in mice. Toyota is reportedly ready to commercialize, within 5 years, a new type of battery called “solid state”. They would be lighter than the current lithium…

Science and Technology links (July 21st, 2017)

3 min read

Want proof that you live in the future? Ok. There is this “cryptocurrency” called ethereum and it is causing a shortage of microprocessors: Demand from Ethereum miners has created temporary shortages of some of the graphics cards, according to analysts, who cite sold-out products at online…