Daniel Lemire's blog

Open Access: just for articles!

2 min read

Many funding agencies and some universities require researchers to publish their articles as open access. That is, research articles must be available to all, freely. The main argument in favor of these policies is social justice: why should publishers acquire the exclusive rights to work funded by…

A recipe for interesting Computer Science research papers

2 min read

In Are your research papers telling original stories?, I claimed that the main benefits of the typical research paper were that: the contribution to the state-of-the-art is clear (what did you invent?); we can quickly quantify the value of the contribution (how well does it work?). Basically,…

Do hash tables work in constant time?

3 min read

Theory in Computer Science, as in any other field, is based on models. These models make many hidden assumptions. This is one of the fundamental reasons why pure theory is wasteful. We must constantly revisit our old assumptions and run experiments to determine whether our models are…

Scientists and their emotions

1 min read

Science is not a matter of pure logic. Some of the best scientists lived through intense emotions that shaped their lives. Here are a few quotes: Ludwig Boltzmann (invented entropy): When he could not reach the standards he set for himself, he would be overcome by feelings of fear, suffering…

Picking a web page at random on the Web

1 min read

To do statistics over the Web, we need samples. Thus, I want to know how to pick a Web page at random, without making much effort. If you are Google or Microsoft, it is easy. But what about the rest of us? And what if I want to pick users at random on Facebook? In effect, I want to sample a…

A review of “Hello World: Computer Programming for Kids and Other Beginners”

1 min read

I learned programming on my own when I was twelve years old with a TRS-80 and Microsoft Basic. The documentation that came with the TRS-80 was fantastic. Alas, today, no vendor would ever think of including an introduction to programming with a computer. If you are a dad (or a mom) and you regret…

Why I hardly ever blog about my ongoing research

2 min read

When I started my blog in 2004, my goal was to blog about my research. It never happened. You may think that I am afraid a reader could steal my ideas, or that I might worry about looking silly. But I have no such fear. However, I am afraid it could hurt the quality of my work. I need a sandbox for…

Netflix game gets exciting: BellKor’s Pragmatic Chaos is passed by The Ensemble

1 min read

This is fun. A month ago, I asked whether the Netflix competition was over. After BellKor’s Pragmatic Chaos merged the solutions of many players to break the 10% barrier, I expected them to win. It turns out that another coalition was created—The Ensemble—and they have beaten the…

A few things American academics should know

2 min read

I sometimes get annoyed at Americans who seem to think that the rest of the world is modeled after them. Here are some things many American academics seem to take for granted: Professors are paid for 9 months, the rest of their salary comes from eventual research grants. At least in Canada, this…

Determinants of faculty research productivity

1 min read

Should you hire Ph.D. graduates from top schools in your country? Maybe not: The present analysis however dispels the notion that graduates of high-status doctoral programs in the discipline of information systems will become superior researchers. (…) The findings indicate that productive…

What FriendFeed got wrong

1 min read

Don’t you feel sometimes like your brain is running out of storage space? Myself, I am very forgetful. I always seek new tools to extend my brain. FriendFeed is a fantastic social networking site. It lets you integrate all of your activities from all over the Web into a single flow. You can…

Pedagogy, innovation and convenience

1 min read

Organizing learning around courses implies the creation of groups and a tight control by professors. It is convenient to organize students into classes, and grade students by topics. Industry-based economies are similarly convenient. They are hierarchical with a clear reporting structure. However,…

After Netflix? What next?

1 min read

The Netflix competition is nearly concluded. We have learned that ensemble methods are the solution for more accuracy. The recommender system community moves on. Immediate questions come to mind: Researchers continue to use the Netflix data set. Will it remain freely available? We need to study…

Design a recommender system: they read your resume

1 min read

Goodreads is a Social Web site about books. They need a recommender system. Thus, they issued a challenge: design their recommendation engine, and they will read your resume. I suppose this is the poor man’s version of the Netflix challenge. There is nothing wrong with the challenge, but I wish…

Column stores and row stores: should you care?

2 min read

Most database users know row-oriented databases such as Oracle or MySQL. In such engines, the data is organized by rows. Database researcher and guru Michael Stonebraker has been advocating column-oriented databases. The idea is quite simple: by organizing the data into columns, we can compress it…
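
To make the row-versus-column idea concrete, here is a minimal sketch in Python. The toy table, the helper names (`to_columns`, `run_length_encode`) and the choice of run-length encoding are all assumptions made for illustration; the excerpt only says that column organization helps compression, not which scheme a given engine uses.

```python
from itertools import groupby

# Hypothetical toy table: (city, province, year). Made up for illustration.
rows = [
    ("Montreal", "QC", 2009),
    ("Quebec", "QC", 2009),
    ("Toronto", "ON", 2009),
    ("Ottawa", "ON", 2009),
]


def to_columns(rows):
    """Turn a list of row tuples into a list of column lists."""
    return [list(col) for col in zip(*rows)]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Column layout: all values of one attribute sit next to each other.
columns = to_columns(rows)

# Encode each column independently; repetitive columns shrink the most.
encoded = [run_length_encode(column) for column in columns]
```

In this sketch the province and year columns collapse into a few (value, count) pairs, while the city column does not shrink at all. Stored row by row, the same repeated values are interleaved with the other attributes, so this simple encoding would find no runs.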

Is collaboration correlated with productivity?

1 min read

Apparently, it is prestigious to write research papers with people from other countries. Funding agencies routinely favor collaboration between different universities. Presumably collaboration improves productivity? Maybe not: (…), there is no clear evidence that correlation exists between the…

Netflix competition is over?

1 min read

The Netflix competition is a $1 million research competition to improve the Netflix movie recommender system by 10%. A large team called BellKor’s Pragmatic Chaos just announced that they won (update: unless someone can beat them in the next month). Among them is Yehuda Koren with whom I…

Physical tools to improve research productivity

2 min read

Using the right tools can improve your productivity: I use black gel pens with a large to medium point. Right now, I favor uni-ball 207 pens. I always carry a pocketbook. I use it to collect current actionable items. There is exactly one active page at any one time. Once it gets filled up, I move…