Daniel Lemire's blog

Write a Twitter application in 5 minutes

1 min read

I spend much time alone, writing and thinking. Twitter helps me stay connected. I love the platform. On Friday, I wanted to find the intersection between the users followed by any two individuals. Indeed, suppose that you like both Joe and Jill, and they have similar interests. Maybe whoever they…
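
As a minimal sketch of the intersection idea described above, assuming the two follow lists have already been fetched by some means: the function name `common_followees` and the sample account names are hypothetical, and no Twitter API call is shown or implied.

```python
def common_followees(followed_by_a, followed_by_b):
    """Return the accounts followed by both users, sorted for stable output.

    Both arguments are plain iterables of account names (hypothetical data;
    how they are retrieved from the platform is outside this sketch).
    """
    return sorted(set(followed_by_a) & set(followed_by_b))


# Hypothetical follow lists for the two individuals in the example.
joe = ["alice", "bob", "carol", "dave"]
jill = ["bob", "dave", "erin"]

print(common_followees(joe, jill))
```

Converting both lists to sets keeps the intersection roughly linear in the total number of followed accounts.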

Counterintuitive factors determining research productivity

1 min read

Permanent researchers publish more when they are in smaller labs. Having many Ph.D. students fails to improve productivity. Funding has little effect on research productivity. Reference: Carayol, N. and Matt, M., Individual and collective determinants of academic scientists’ productivity,…

Working long hours is stupid

2 min read

We do too much. We carry too many projects. This overproduction creates problems which we try to fix by working even more. We value most what we create (see Made by hand and The upside of irrationality). To be happy, you want to focus on making interesting stuff. This takes time and dedication. Yet…

How to get everyone talking about your research!

1 min read

Deolalikar claims to have solved the famous P versus NP problem. Is the proof correct? Some influential researchers doubt it: Scott Aaronson is betting $200k of his own money against Deolalikar. What I find most interesting is that Deolalikar did not submit the paper to a journal, as far as I know.…

Is multiplication slower than addition?

1 min read

Earlier, I asked whether integer addition was faster than bitwise exclusive or. My tests showed no difference, and nobody contradicted me. However, everyone knows that multiplication is slower than addition, right? In cryptography, there are many papers on how to trade multiplications for…

General versus domain intelligence

1 min read

Our brains come with hard-wired algorithms. Cats can catch birds or mice without thinking about it. I can grab and eat a strawberry without thinking. The Savanna-IQ Interaction Hypothesis says that general intelligence may originally have evolved as a domain-specific adaptation to deal with…

Summer reading: my recommendations (2010)

2 min read

Containment by Christian Cantrell is an excellent sci-fi novel. And you can grab it nearly for free from the author’s page. The premise of the book is that humanity built a colony on Venus. Children are told that Earth cannot be reached. Massive research into economical oxygen production is…

The five most important algorithms?

1 min read

Bernhard Koutschan posted a compilation of the most important algorithms. The goal is to determine the 5 most important algorithms. Out of his list, I would select the following five algorithms: Binary search is the first non-trivial algorithm I remember learning. The Fast Fourier transform (FFT)…

NoSQL or NoJoin?

2 min read

Several major players built alternatives to conventional database systems: Google created BigTable, Amazon built Dynamo and Facebook initiated Cassandra. There are many other comparable open source initiatives such as CouchDB and MongoDB. These systems are part of a trend called NoSQL because it is…

The fallacy of absolute numbers

1 min read

I often come across the following type of argument in research papers: You could save 3 bits of storage for every value in your database. Surely that’s irrelevant. Nobody cares about saving 3 bits! You can sort arrays in 10 ms. Surely, that cannot be improved upon? You are already down to 10 ms…

Indexing XML

1 min read

It is often important to index XPath queries. Not only is XPath useful on its own, but it is also the basis for the FLWOR expressions in XQuery. A typical XPath expression will select only a small fraction of any XML document (such as the value of a particular attribute). Thus, a sensible strategy…

Lack of steady trajectories and failure

2 min read

A common piece of advice given to young researchers is to find a niche (see Michael’s Branding Your Research). That is certainly good advice. Instead of being another young researcher, you can be the new guy working on topic X. But it always seems to happen no matter what: most Ph.D. theses address a…

Academic publishing is archaic

2 min read

Technological progress tends to increase the available information. Thus, our capacity to manage this information becomes overloaded (hence the term information overload). As Clay Shirky explained: it is not so much an information overload, as a filter failure. The abundance of information is never…

Maximizing your impact as a researcher (guest post)

3 min read

The greatest challenge for a researcher is to choose projects that have a good chance of delivering impact. Alain Désilets from NRC—co-author of VoiceGrip, Webitext and the Cross Lingual Wiki Engine—shared his strategies with me: Look at how many workdays per week you can dedicate…

How do we choose research journals?

1 min read

The publishing house Elsevier invited me to fill out a survey regarding their journals. As a reward, they gave me a glimpse at their statistics. The three most important considerations when choosing a research journal are (in order): speed of review process, standard of reviews, overall reputation…

Computer Science is shallow

2 min read

Zed A. Shaw—author of several books on Ruby and Python—came up with an interesting criticism of Computer Science. He makes some good points: Computer Science is a pointless discipline with no culture. (…) They rarely teach deep philosophy and instead would rather either teach you…

Sorting is fast and useful

1 min read

I like to sort things. If you should learn one thing about Computer Science, it is that sorting is fast and useful. Here’s a little example. You want to check quickly whether an integer belongs to a set. Maybe you want to determine whether a userID is valid. The solutions: Use a hash table. Java…
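
The membership test above can be sketched in two ways. The excerpt names the hash table; the sorted-array variant with binary search is an assumption suggested by the post's title, not something the truncated text confirms. The helper name `in_sorted` and the sample IDs are hypothetical.

```python
from bisect import bisect_left


def in_sorted(sorted_ids, candidate):
    """Binary search: True if candidate occurs in the ascending list sorted_ids."""
    pos = bisect_left(sorted_ids, candidate)
    return pos < len(sorted_ids) and sorted_ids[pos] == candidate


# Hypothetical set of valid user IDs.
valid_ids = [42, 7, 1999, 13, 256]

# Option 1: hash table (a Python set), expected constant-time lookups.
hash_index = set(valid_ids)

# Option 2: sort once, then answer each query with a binary search.
sorted_index = sorted(valid_ids)

for user_id in (13, 14):
    print(user_id, user_id in hash_index, in_sorted(sorted_index, user_id))
```

Both options return the same answers. The sorted array pays a one-time sorting cost and stores nothing beyond the IDs themselves.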

Chinese researchers publish more research papers

1 min read

Funding agencies in Canada seek to emulate American funding agencies by promoting excellence. What this means in concrete terms is that few professors get most of the resources whereas the bulk of university professors are left with a pittance or nothing. The intuition behind this more competitive…

Acceptance rate versus impact

1 min read

Should you attend the most selective school? Maybe not: Students who attended more selective colleges do not earn more than other students who were accepted and rejected by comparable schools but attended less selective colleges. (Dale and Krueger, Estimating the payoff to attending a more…

Toward data-driven science

1 min read

Science and business, so far, have been mostly model driven. That is, you collect a few data points, just enough to fit your model. Then you proceed from your model. However, things have changed. Old: manually take samples of the water in a nearby lake (4 times a year). New: set up a wireless…