Daniel Lemire's blog

What to get with your Nintendo Wii?

1 min read

I guess many people are getting Wiis right about now. I have had mine for a few months. Here are my recommendations: The Sims 2: Castaway. You do not need to know what the sims are. If you can cope with games requiring planning, some puzzles, but relatively little action, this game is for you.…

Collaborative Filtering: Why working on static data sets is not enough

1 min read

As a scientist, it is important to question your assumptions. So far, most of the hard Computer Science research on collaborative filtering has used static data sets such as Netflix. Specifically, it is assumed that the recommender systems do not impact the ratings and what items get rated. A…

How to win the Netflix $1,000,000 prize?

2 min read

Yehuda Koren, one of the winners of the Netflix game so far, was nice enough to send me a pointer to a recent paper he wrote, Chasing $1,000,000: How we won the Netflix progress prize (link is to a PDF document, see the 4th page and following). Their approach is based on the linear combination of large…

How University professors ought to be teaching…

2 min read

I am not a teacher per se. As a professor, I define myself as a researcher first and I do not do research on teaching methodologies. So this makes me poorly qualified to tell the world how a professor ought to be teaching. Nevertheless, I do teach. And I think that some of the time, I teach better…

21 open problems in Artificial Intelligence

2 min read

Peter Turney has come up with a list of 21 (important) open problems in the field of Artificial Intelligence. I am not aware of any other such list anywhere, so this might be an important contribution. For comparison, Wikipedia has a list of open problems in Computer Science. In the field of databases, the…

How many Computer Science researchers are there?

1 min read

In current work we do on database indexes, we decided to use DBLP as a data source. Among other things, we use the authors’ names as a dimension. From one plot, I noticed that there must be half a million distinct authors. I doubted this number, and Kamel was nice enough to investigate…

How much are the ideas of your competition worth to you?

2 min read

Scientists are typically rather secretive about whatever they are working on right now. While in most universities, you can at least see where the researchers work, in some government laboratories, such as NRC, you would think that Russian spies are on every corner: how else can you explain the…

Why tenure matters?

1 min read

Since the end of World War II, at least half of all university professors in North America have had tenure: they cannot be dismissed without adequate cause. This job security is earned: you need to be a professor for several years, and to perform well, before you can be granted tenure. At several…

Why having a readership matters

1 min read

I recently proposed that scientists should adopt the “find a readership or perish” motto. (A related goal for engineers might be “find users or perish.”) The goal is certainly not to have as many readers as possible, but having some serious readers matters. I was chatting with Seb Paquet today and…

Netflix: an interesting Machine Learning game, but is it good science?

2 min read

The Netflix competition is a $1 million game to build the best possible movie recommender system. It has already contributed to science tremendously by providing the largest freely available collaborative filtering data set (about 2GB): it is at least an order of magnitude larger than any…

A better way to browse DBLP: Faceted DBLP

1 min read

Scientists are silly sometimes. For example, there is no standard way to figure out what a given researcher has published, nor to find out which papers appeared at a given conference in a given year. DBLP is one tool that tries to solve this problem for Computer Science. It is far from perfect,…

Improving your intellectual productivity by accepting chaos

1 min read

One thing I did in 2007 was to ditch my PDA. Instead, I carry a pocket notebook. Almost daily, the best ideas I have jotted down get organized into document drafts. You must clearly differentiate data and thought gathering from data and thought organization. Organization |Purpose …

Computers can do analogies

1 min read

Because I had read Peter’s papers, I knew that computers could solve analogies. However, seeing it with your own eyes is quite impressive. TextRunner can solve analogies online (though the site is a bit slow). (Update: the site is now down.) What does it mean? I asked it “Quebec is to Canada what…

Google Recommends Blogs: Another PageRank?

2 min read

Greg, Andre, and many others have written good things about the Google Reader recommender system: if you read your blogs and your news using Google Reader, Google will recommend other feeds to you based on your profile. I have been a bit more critical. The recommender system they offer is…

For the Web hacker in you: Google Chart API

1 min read

I have said it again and again and I will keep on saying it: I am a hacker, a tweaker, a fiddler, and so on. And Google has just come up with one of the most hackable Web APIs I have seen in years! All it does, essentially, is allow you to chart data on a Web site, but it does so very nicely…

Formal definitions are less useful than you think

2 min read

There is a widely held belief that shared formal definitions improve collaboration. Certainly, most scientists share several unambiguous definitions. For example, there cannot be a disagreement as to what 2+2 is. In crafting a research paper, it is important to keep ambiguities to a minimum. You do…