Daniel Lemire's blog

It is not where you work, but who you work with

2 min read

It is widely believed that intellectual productivity is tied to location. That is, if you work in a basement at Harvard like Walter Bishop in the TV show Fringe, you’ll be far more brilliant than if you do the same work in an FBI laboratory. Of course, for most people it is the access to people…

How database design fails us, and what to do about it

1 min read

Good database design is crucial to obtain a sound, consistent database, and — in turn — good database design methodologies are the best way to achieve the right design. These methodologies are taught to most Computer Science undergraduates, as part of any Introduction to database…

True scientists are irreverent

2 min read

Richard Hamming compared knowledge to compound interest: The more you know, the more you learn. Hence, progress tends to be exponential. Some innovations increase our rate of progress slightly. The light bulb allows us to work late at night. Some accelerate progress tremendously. Science is one…

Why aren't we getting richer? The scarring tissue theory

4 min read

Bankers will tell you that to get rich, you should rely on compound interest. Save up a little bit of everything you earn, and you will soon be wealthy. What they often fail to mention is that prices may also increase exponentially. Once you deduct this inflation from your gains, you may even end…

Where does innovation come from?

3 min read

I just finished The Rational Optimist by Matt Ridley. Because I am an overly pessimistic individual, I expected to hate the book. I loved the book. I should point out where I read the book, because context is important in this case. I was in Berlin. My hotel room was about 50 meters away from…

Two 32-bit hash functions from a 64-bit hash function?

4 min read

A few years ago, we worked on automatically removing boilerplate text from e-books taken from Project Gutenberg. In this type of problem, you want to quickly identify whether a line of text is likely part of the boilerplate text. You could build a hash table, but hash tables tend to use a…

Emerging knowledge is a private business

3 min read

Collaboration is often encouraged in science: it is viewed as an intrinsically good thing. Yet there are downsides to collaboration. The most obvious downside is the requirement to coordinate efforts (e.g., hold meetings). But there are many other downsides to collaboration: Collaboration…

You think that users are faceless objects? You are obsolete!

3 min read

IT departments fail us because they are founded on the technocratic imperative. Users are faceless objects for which the system is designed (Iivari et al., 2009). Correspondingly, usability is a secondary feature at best. I challenge you: ask your IT department whether the users are consulted…

Science is self-regulatory… really?

3 min read

Many theoretical systems are self-regulatory. For example, in a free market, prices will fluctuate until everyone gets a fair price. But free markets are a mathematical abstraction. The business of science should also be self-regulatory. Scientists who produce bad work should build poor…

Why can't hash tables preserve the order of keys?

3 min read

One of the most common data structures in Computer Science is the hash table. It is used to store key-value pairs. For example, it is a good data structure to implement a phone directory: given the name of the individual, find his phone number. Implementing a hash table is not difficult. Start…

Linux and the financial crisis

1 min read

In December 2007, the New York Stock Exchange adopted Linux. In late August 2008, we saw one of the worst worldwide stock market crashes of the last hundred years. This crisis was not predicted by mainstream economists and experts. It took them by surprise and it took weeks or months before an…

Better job ads

1 min read

Before writing your next job ad, look at companies successfully recruiting talented engineers. According to a recent Google job posting, here are the requirements to work at Google: BS or MS in Computer Science or equivalent work experience (Ph.D. a plus) Experience programming in C++, Java or…

The Web is killing database systems

3 min read

A typical enterprise computing architecture relies on databases, professionally managed by DBAs. Developers grow applications which all update or query the same databases. The value is not in the software per se, but in the data architecture. Given the DNA of our industrial-age organizations, this…

Fast computation of scalar products, and some lessons in optimization

3 min read

Given two arrays, say (1,2,3,4) and (4,3,1,5), their scalar product is simply the sum of the products: 1 x 4 + 2 x 3 + 3 x 1 + 4 x 5. It is also known as the inner product or dot product and it is a routine operation in software graphics, database systems and machine learning. Many processors even…
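As a minimal sketch of the definition in this excerpt (not the optimized code that the post itself goes on to discuss), the scalar product of the two example arrays can be computed in Python as follows. The function name scalar_product is an illustrative assumption.

```python
def scalar_product(xs, ys):
    """Return the sum of the pairwise products of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("both arrays must have the same length")
    # Multiply matching entries, then add up the products.
    return sum(x * y for x, y in zip(xs, ys))


# The example from the excerpt: 1*4 + 2*3 + 3*1 + 4*5 = 33
print(scalar_product((1, 2, 3, 4), (4, 3, 1, 5)))
```

The explicit length check is a design choice for this sketch: zip would otherwise silently truncate the longer array.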

Usury and the collapse of empires

3 min read

The American government recently played Russian roulette with its economy by threatening to default on its debt. Of course, nobody actually thought that the Americans would truly default at this time. After all, the Americans can quite literally print dollars to pay their debt. But it is also…

Pick one: determinism or fairness

4 min read

Computers have changed our lives drastically in the last few decades. Correspondingly, I view the world in terms of algorithms. When I think of how the government works, for example, I see over-engineered heuristics. Bureaucracies are akin to spaghetti code. Can you tell me how my tax dollars are spent?…

Scientists and central planning

4 min read

Overconfident individuals often win by claiming more resources than they could defend (Johnson and Fowler). If nobody knows who is strongest, whoever thinks he is the strongest might win by default. That is, there is no better way to fool others than to first fool yourself. Accordingly, human…

What the Internet wants me to read (summer 2011)

6 min read

Last week, I asked on Twitter, Facebook and Google Plus what I should read over the summer. Here is a quick summary of the recommendations I got: On Twitter: A Beautiful Mind by @communicating; War & Peace & War by @janmikkelsen; ReWork by @bebraw; Understanding Comics by @sclopit; The…

Sentience is indescribable

2 min read

Arguably, one of the most nagging scientific questions is the nature of sentience. Can we build sentient computers? Is my cat sentient? What does that mean? Will a breakthrough in cognitive science tell us what consciousness, sentience and free will are? I conjecture that these topics will forever…

The myth of the unavoidable specialization

3 min read

In a recent essay, Malone et al. claimed that we were entering the age of hyperspecialization. Their core assumption: human beings are more efficient when doing specialized tasks. Thus, they claim, we are moving toward a future where software will distribute hyperspecialized tasks to expert…