Daniel Lemire's blog

Tim Bray on solving the economic crisis

1 min read

For reasons I will not go into, this quote feels very satisfying today: “Solution to economic crisis: sack everyone who has an MBA.” (Tim Bray)

How to speed up retrieval without any index?

3 min read

John Cook gives us a nice recipe to quickly find all squares in a set of integers. For example, given 3, 4, 9, 15, you want your algorithm to identify 4 and 9 as squares. The naïve way to solve this problem goes as follows: For each element… check whether sqrt(x) is an integer. This may…
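As a rough illustration of the naïve approach described above, here is a minimal Python sketch. The function names `is_square`, `maybe_square` and `find_squares` are hypothetical, chosen for this example. The optional pre-filter relies on the fact that a perfect square can only be congruent to 0, 1, 4 or 9 modulo 16. It is one common shortcut, included here as an assumption, and not necessarily the recipe the post refers to.

```python
import math

def is_square(x: int) -> bool:
    """Naive test: x is a square if its integer square root squares back to x."""
    if x < 0:
        return False
    r = math.isqrt(x)  # exact integer square root, no floating-point rounding
    return r * r == x

def maybe_square(x: int) -> bool:
    """Cheap pre-filter (an assumption of this sketch): squares are 0, 1, 4 or 9 mod 16."""
    return x >= 0 and (x & 15) in (0, 1, 4, 9)

def find_squares(values):
    """Scan the whole set; only values passing the pre-filter pay for the square root."""
    return [x for x in values if maybe_square(x) and is_square(x)]

print(find_squares([3, 4, 9, 15]))
```

With the input from the example above, the scan keeps 4 and 9. The pre-filter never rejects a true square, so it changes only the amount of work done, not the result.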

Why am I not working on world hunger?

2 min read

My wife sometimes asks me why I am not working on important problems like world hunger. Instead, I am one of the world's top experts in tag-cloud drawing. I am sure she thinks that I just fool around, faking serious research. I actually take my research very seriously. I like to distinguish abstract…

Is what I do technical?

2 min read

We are trying to design a master's degree in Information Technology. To me, this sort of program should be a professional master's degree, that is, it does not lead naturally to a research career or a Ph.D. My business colleagues argue in favour of research methodology courses. Apparently, students…

Full text search in SQL with LuSql

1 min read

MySQL natively supports full-text search; many database engines do. However, few databases can match a dedicated search engine library like Lucene. Moreover, even if you do not need the power of Lucene, sometimes you are forced to use a database engine that does not support full-text search (like…

SciFi book review: Spin by Robert Charles Wilson

2 min read

The novel Spin won the Hugo Award for Best Novel in 2006. It is what I would call a “temporal disparity” novel. Earth suddenly becomes surrounded by a temporal shield that slows time down for human beings. Alas, for these poor human beings, the Sun is aging very fast. Are we going to die? Who is…

The most active blogs I follow…

1 min read

A very active feed that has remained in my list for a long time is a good feed (for me). My top 3 (in decreasing order of activity): The Noisy Channel: Daniel Tunkelang, Chief Scientist at Endeca. He works in information retrieval. Population of One: Sylvie Noël, research scientist at the…

Do not trust financial experts

1 min read

One expert predicted the recession. He was ridiculed. Watch and draw your conclusions.

Measuring the diversity of recommended lists, at last

2 min read

For a number of years, researchers working on collaborative filtering and recommender systems have focused on accuracy as the sole performance metric. Imagine that you bought a couple of albums by Celine Dion and you liked them a lot. Then the best answer might be to suggest you buy all the…

So, you think academic peer review works?

1 min read

If you think peer review is sane, consider this example: El Naschie is editor in chief of the journal Chaos, Solitons and Fractals. This journal is published by Elsevier, one of the biggest players in the science publishing business. But here’s where things get interesting: this journal also…

Toward the Commoditization of Natural Language Processing

1 min read

In a remarkable paper, Peter Turney shows that using a simple family of algorithms and freely available software, one can determine analogies, synonyms, antonyms, and relations between words automatically. Here is the beginning of the abstract: Recognizing analogies, synonyms, antonyms, and…

Leaves in the web of knowledge

1 min read

Sometimes people conclude that I am very humble or just not very smart when I state that most of my work is not very important. In truth, a couple of things I did as a researcher are worth considering, and I hope to produce a few more, but these are small gems in a vast underground of dirt. It…

To improve your indexes: sort your tables!

1 min read

Many database indexes, including bitmap indexes, are sensitive to the order of the rows in your table. Many data warehousing practitioners urge you to sort your tables to get better results, especially with Oracle systems. In fact, column-oriented database systems like Vertica are built on sorted…
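To give a feel for why row order can matter, here is a small Python sketch. It assumes, for illustration only, that the index or column store compresses each column with some form of run-length encoding, so that fewer runs of identical values mean a smaller structure. The helper names `count_runs` and `total_runs` and the toy table are hypothetical.

```python
from itertools import groupby

def count_runs(column):
    """Number of maximal runs of identical consecutive values in one column."""
    return sum(1 for _ in groupby(column))

def total_runs(rows):
    """Sum of the run counts over all columns of a table given as a list of tuples."""
    if not rows:
        return 0
    return sum(count_runs(col) for col in zip(*rows))

# Toy table with two columns (country, gender); the values are made up.
rows = [("CA", "F"), ("US", "M"), ("CA", "M"),
        ("US", "F"), ("CA", "F"), ("US", "M")]

print("runs before sorting:", total_runs(rows))
print("runs after sorting: ", total_runs(sorted(rows)))
```

On this toy table, lexicographic sorting brings the total from 10 runs down to 6. Most of the gain comes from the first sort column, which is one reason the choice of column order is itself a tuning decision.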

Understanding what makes database indexes work

3 min read

Why do database indexes work? In a previous post, I explained that only two factors make indexing possible: your index expects specific queries or you make specific assumptions about the data sets. In other cases, you are better off just scanning the entire data set. What makes database indexes…

DUAT: Do not use acronyms in titles

1 min read

If you submit a paper called “TIA in SAAS” to an international conference, your acceptance probability is low. Note: Out of respect for the actual authors, I have slightly changed the title.