Daniel Lemire's blog

How quickly can you compute the dot product between two large vectors?

, 2 min read

A dot (or scalar) product is a fairly simple operation that simply sums the many products: float sum = 0; for (size_t i = 0; i < len; i++) { sum += x1[i] * x2[i]; } return sum; It is nevertheless tremendously important. You know these fancy machine learning algorithms we keep hearing about?…

Income, wealth, intelligence and the fall of the American empire

, 1 min read

Arcand‘s latest movie (the Fall of the American Empire) depicts a young man (Pierre-Paul Daoust) who is supposedly very intelligent, but not very wealthy. The movie begins with the character explaining that intelligence actually gets in the way of success. I have certainly met my share of folks…

Predicting the truncated xorshift32* random number generator

, 1 min read

Software programmers need random number generators. For this purpose, the often use functions with outputs that appear random. Gerstmann has a nice post about Better C++ Pseudo Random Number Generator. He investigates the following generator: uint32_t xorshift(uint64_t *m_seed) { uint64_t result…

Science and Technology links (June 29th, 2018)

, 3 min read

Hogarth imagines that artificial intelligence (AI) could progress much faster than we might anticipate due to what he calls “AI nationalism”: I believe that the current government spending on AI is tiny compared to the investment we will see as they come to realise what is at stake. What if…

Data processing on modern hardware

, 5 min read

If you had to design a new database system optimized for the hardware we have today, how would you do it? And what is the new hardware you should care about? This was the topic of a seminar I attended last week in Germany at Dagstuhl. Here are some thoughts: You can try to offload some of the…

Science and Technology links (June 24th, 2018)

, 1 min read

Video gamers may soon be paid more than top pro athletes. Meanwhile, if you want to stand out in a crowd of university professors, point out that you are a fan of videogames. You can get your whole genome sequenced for $500. Machines can learn to solve the Rubik’s Cube “without human…

Roaring Bitmaps in JavaScript

, 2 min read

Roaring bitmaps are a popular data structure to represents sets of integers. Given such sets, you can quickly compute unions, intersections, and so forth. It is a convenient tool when doing data processing. I used to joke that Roaring bitmaps had been implemented in every language (Java, C, Rust,…

Emojis, Java and Strings

, 3 min read

Emojis are funny characters that are becoming increasingly popular. However, they are probably not as simple as you might thing when you are a programmer. For a basis of comparison, let me try to use them in Python 3. I define a string that includes emojis, and then I access the character at index…

Science and Technology links (June 15th, 2018)

, 1 min read

The market for artificial-intelligence chips could reach $30bn by 2022. My guess is that NVIDIA (a graphics card maker) is the big winner, their tag line is now “Artificial Intelligence Computing Leadership”. Multiple sclerosis could be caused by polymicrobial infections.1. There are fungal…

Science and Technology links (June 9th, 2018)

, 2 min read

A woman with late-stage breast cancer has been successfully cured using immunotherapy. She was preparing to die. She is now going to live hopefully many more years in good health. Deep learning generalizes poorly, in at least some cases: (…) when images are scaled to half their original size…

Science and Technology links (June 2nd, 2018)

, 4 min read

Human hearts do not regenerate. Cardiovascular diseases are the leading cause of death in occident. Japanese doctors will graft sheets of tissue derived from reprogrammed stem cells in the hope of healing diseased human hearts. It is a small and uncertain trial.1. The surface of your eyes (cornea)…

Greater speed in memory-bound graph algorithms with just straight C code

, 2 min read

Graph algorithms are often memory bound. When you visit a node, there is no reason to believe that its neighbours are located nearby in memory. In an earlier post, I showed how we could accelerate memory-bound graph algorithms by using software prefetches. We were able to trim a third of the…

Science and Technology links (May 26th, 2018)

, 5 min read

Teicholz argues that Nutrition Science is Not Up to the Task: Despite methodological advances, nutritional epidemiology remains fundamentally limited by its observational nature. Guidelines relying on this circumstantial evidence can be little more than educated guesses. Enacting prevention…

Gender and peer review

, 3 min read

Modern science works in the following manner. You do the research. You write a paper. You publish the paper. For historical reasons, “publishing the paper” typically means “submit it to a committee of your peers and get their seal of approval”. We can rightfully be concerned about the…

Graph algorithms and software prefetching

, 3 min read

A lot of data in the real world can be represented as graphs: you have nodes connected through edges. For example, you are a node in a graph where friendships are edges. I recently met with professor Semih Salihoglu, an expert in graph databases and algorithms. We discussed fun problem like how one…

Science and Technology links (May 18th, 2018)

, 3 min read

How is memory encoded in your brain? If you are like me, you assume that it is encoded in the manner in which your brain cells are connected together. Strong and weak connections between brain cells create memories. Some people think that it is not how memories are encoded. To prove that it is…