Daniel Lemire's blog

My blog got hacked

1 min read

Google just told me my blog is being taken out of their index. I investigated. Spammers took over my blog footer, inserting a large set of invisible hyperlinks. They did even worse: they created a large set of PHP files (hundreds!) that were serving spam from my blog without my knowledge. I…

If you claim high scalability…

1 min read

I just reviewed a paper where the authors come up with a nice highly scalable algorithm. And it is really scalable too! But to prove just how fast it is, they process 2,000 data points. This is correct, strictly speaking. Their algorithm runs in O(n) time, so to know how long it would take to…

Proceedings of the Large-Scale Recommender Systems workshop

1 min read

We have made available a PDF copy of the proceedings for the second Netflix/Large-Scale KDD Recommender workshop. It includes the following papers:

Jinlong Wu and Tiejun Li. A Modified Fuzzy C-Means Algorithm For Collaborative Filtering
Gavin Potter. Putting the collaborator back into…

The insane world of academic publishing

1 min read

Stephen Few wrote a post on how insane academic publishing is. If you publish academic papers, his post is worth your time. Don’t miss the comments! Stephen is not in academia. From his point of view, what is required of him makes no sense: While he does not expect to get paid for publishing…

Cool software design insight #4

1 min read

Mathematicians and philosophers often make terrible programmers. They also tend to write gibberish even in English. (Ok, I do not know if it is a fact, but stay with me.) A terrible way of programming is to try to hold the entire problem in your head and to put it into code in one shot. Why?…

How to select even or odd rows in a table using CSS 3

1 min read

CSS 3 is around the corner. Already we are seeing some benefits. The latest versions of Safari and Opera, as well as the beta version of Firefox, allow you to select even or odd rows in a table using only CSS: tr:nth-child(2n+1) { background-color: blue; } tr:nth-child(2n) { background-color:…

Peer review is an honor-based system

2 min read

It would take too long to expose all of the flaws of peer review, but here are some: some work is just flat wrong because the reviewers cannot analyze all of the mathematical results, and because they cannot redo the experiments; numerous researchers cheat, sometimes in small ways (“2 out of 3…

Quick CSS quiz

1 min read

Given these CSS instructions, z[x] > a[i] {color: blue;} z z[x] a {text-decoration: underline;} z > z a , z z z + a { color: red ;} what will be the color of the text in the following XML file? <?xml version="1.0" encoding="ISO-8859-1" ?> <?xml-stylesheet…

The mythical bitmap index

3 min read

A bitmap index is a popular data structure to speed up the retrieval of matching rows in a table. (It is also used in Information Retrieval, to retrieve matching words.) Building a bitmap index is not hard. You match each possible value with a vector of bits. When the value occurs, you insert the…
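To make the idea of one bit vector per value concrete, here is a minimal sketch in Python. It is not code from the post: the function names `build_bitmap_index` and `matching_rows` and the sample column are hypothetical, and storing each bitmap as a plain Python integer is an assumption made for brevity.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to a bitmap stored as a Python int.

    Bit i of a bitmap is 1 when row i of the column holds that value.
    (Hypothetical helper, for illustration only.)
    """
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row  # set the bit for this row
    return dict(index)


def matching_rows(bitmap):
    """Return the row numbers whose bit is set, in increasing order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Made-up sample column with five rows.
colors = ["red", "blue", "red", "green", "blue"]
index = build_bitmap_index(colors)

# A query such as "color is red OR green" becomes a bitwise OR.
red_or_green = index["red"] | index["green"]
print(matching_rows(red_or_green))
```

Queries that combine conditions reduce to bitwise operations on the bitmaps, which is a large part of why the structure is attractive for retrieval.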

The secret to intellectual productivity

1 min read

There is a simple secret that anyone can apply right now. It can help you get better grades if you are a student, and it can help you finish that book. I should really charge you for this secret, but I am not good at capitalism. The secret is this: break down your task into small and easy chunks of…

Back from vacations

2 min read

I took some time off this year. No, we did not go anywhere specific. I just took two weeks off with my two sons. We had fun. It has been years since I took time totally off. For irrational reasons, I would always keep my research going at a reduced rate during my time off. Not this year! I was 100%…

Cool software design insight #3

1 min read

In a comment to my unit testing post, David suggested using property testing. Languages like Java, C and C++ have formalized this very idea as assert instructions. Other languages have the equivalent under different names. You can also manually implement asserts by throwing exceptions or logging…
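As a rough illustration of the two options mentioned above, a built-in assert versus a manually thrown exception, here is a sketch in Python (chosen for brevity; the post names Java, C and C++). The function names are hypothetical and not taken from the post.

```python
def average_with_assert(values):
    # Built-in assert: documents and checks a property the code relies on.
    # Note that Python skips assert statements when run with the -O flag.
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)


def average_with_exception(values):
    # Manual equivalent: the check always runs, and the caller can catch it.
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


print(average_with_assert([2, 4]))
print(average_with_exception([2, 4]))
```

The assert version suits checks that should never fail in correct code, while the exception version suits checks that must stay active in production.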

Submit your papers where they are likely to be accepted

1 min read

It came to me tonight. A simple idea, really. All scientists should submit their papers where they are likely to be accepted. Oh! We all need a critical review. Sometimes we even need to be told to rewrite our paper and come back. However, scientists should not play Russian roulette. Sending a…

Cool software design insight #2

1 min read

The number 1 difference between an experienced hacker and the random guy out of school is unit testing. Unit testing is simple. Anyone can do it. You do not need a sophisticated library. All you need is to run a program that does sanity checks over the different components of your software. The…
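A program that runs sanity checks over a few components can be as small as the following Python sketch. The components `mean` and `clamp` and the function `run_sanity_checks` are made up for illustration; they are assumptions, not code from the post, and no testing library is involved.

```python
def mean(values):
    """Component under test: arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def clamp(x, low, high):
    """Component under test: restrict x to the range [low, high]."""
    return max(low, min(x, high))


def run_sanity_checks():
    """Run a handful of checks; an AssertionError signals a broken component."""
    assert mean([1, 2, 3]) == 2
    assert mean([5]) == 5
    assert clamp(15, 0, 10) == 10
    assert clamp(-3, 0, 10) == 0
    assert clamp(7, 0, 10) == 7
    return "all checks passed"


if __name__ == "__main__":
    print(run_sanity_checks())
```

Running this file after each change gives a quick signal that the basic behavior of each component still holds.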

A good word for Nintendo repair services

2 min read

A few hours a month, I like to play video games. I never get very far because I lack what teenagers have in abundance: time. However, I make up for it by complaining that in my time, we had 4 colors—when we had colors at all—and an 80×25-pixel resolution, and we programmed our games ourselves in…

Given up on Eclipse, now with NetBeans

2 min read

I write most of my code using vim. This winter, Kamel made me discover Eclipse. I dislike IDEs in general because they have a tendency to force me to work in certain ways that are suboptimal. For example, if I need to remember to go to menu X and set option Y to build my project correctly, then…