Daniel Lemire's blog

So, you think you are a big shot?

1 min read

According to an anonymous friend of mine, the University of California at Riverside reported having received 1700 applications for one faculty position. I hope they have some kind of smart text mining application, because sorting out 1700 applications ought to be a hard task. I submit to you that…

The Power of the Marginal

3 min read

I sometimes disagree quite a bit with Paul Graham, but The Power of the Marginal is a brilliant essay. Mostly, I think that Paul is a great observer, but he sometimes goes a bit too far when drawing conclusions. For example, he concluded that Europe could never match the USA economically because…

Babylon 5 is back

1 min read

It seems like Babylon 5 is coming back. New episodes will be released on DVD. I know I’ll be preordering them as soon as possible. Joseph Michael Straczynski (JMS) was the first, as far as I know, to produce real science fiction for adults on television. Here’s what he had to say…

Theoretical Computer Science is Closed Minded?

2 min read

The Theoretical Computer Science community is raging following an article by Wegner and Goldin. Here’s the gist of what the paper says: The Theoretical Computer Science (TCS) model is inaccurate because Turing Machines (TMs) express only closed-box functional transformation of input to…

Yahoo and MSN cannot compete?

1 min read

According to Greg Linden, despite their best efforts, Yahoo and MSN keep losing the search war against Google. What is the problem at Yahoo and MSN? After four years of trying, they just seem to be slipping further and further behind. First, MSN showed a drop in web search market share, down to…

Some summer pictures

1 min read

Here is a picture of Louka and Lohan at the (Granby) zoo: Don’t they look alike? Later that same day, Nathalie went swimming with Lohan. I have a pretty good zoom on my digital camera and I was able to catch this picture of them: While the picture is a bit fuzzy, I really like the look on…

Suresh says we don’t need publication counts

1 min read

Suresh points out that Richard Feynman wrote only 37 research papers. I entirely agree with what Suresh implies. To be fair, the main Canadian science funding agency (NSERC), while it asks for your publication list, actually asks you first what your top 5 contributions are. The concept of a…

Migration from CVS to Subversion

2 min read

For those who don’t know, email is not a good collaborative editing tool. There are many superior alternatives such as wikis, version control tools, and so on. I tend to use all of those, but for serious work, when I need to actually sit down and write the paper, I use version control (such as…

How to fix pango fonts problems

1 min read

I recently updated my Mandrake Linux machine and got errors like this one when starting a GTK application: (gnome_segv:24830): Pango-CRITICAL **: _pango_engine_shape_shape: assertion `PANGO_IS_FONT (font)' failed The net result is that all GTK-based applications (Firefox, Gnumeric, Gimp, and so…

Perfect Hashing

1 min read

Suppose you could build a collision-free hash table: how fast would it be? It would be extremely fast, almost as fast as looking up data in an array. As it turns out, collision-free hash tables have been possible for quite some time; this is called perfect hashing. See for example GNU gperf,…
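To make the idea concrete, here is a minimal sketch of perfect hashing, assuming the set of keys is fixed and known in advance. It is not how GNU gperf builds its tables. It simply tries seeds for a seeded 32-bit FNV-1a hash until every key lands in its own slot of a table with n² slots, which is the first-level trick of the classic FKS construction. The names `fnv1a` and `PerfectHashTable` are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple


def fnv1a(key: str, seed: int) -> int:
    """32-bit FNV-1a hash of the UTF-8 bytes of key, perturbed by seed (illustrative)."""
    h = (2166136261 ^ seed) & 0xFFFFFFFF
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


class PerfectHashTable:
    """Hypothetical static table: each known key gets its own slot, so no collisions."""

    def __init__(self, items: Dict[str, int], max_tries: int = 10_000) -> None:
        keys = list(items)
        n = max(len(keys), 1)
        # With n*n slots, a hash that behaves like a random function is
        # collision-free with probability of at least about 1/2, so only a
        # few seeds are usually tried (an assumption for FNV-1a, not a proof).
        self.size = n * n
        for seed in range(max_tries):
            slots = [fnv1a(k, seed) % self.size for k in keys]
            if len(set(slots)) == len(keys):
                break  # every key has a distinct slot: the hash is perfect
        else:
            raise ValueError("no collision-free seed found")
        self.seed = seed
        self.table: List[Optional[Tuple[str, int]]] = [None] * self.size
        for key, slot in zip(keys, slots):
            self.table[slot] = (key, items[key])

    def get(self, key: str) -> Optional[int]:
        # Exactly one probe: no chaining and no open addressing needed.
        entry = self.table[fnv1a(key, self.seed) % self.size]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None  # key was not in the original set
```

For example, `PerfectHashTable({"if": 1, "else": 2}).get("else")` returns 2, and looking up a key outside the original set returns None because the stored key is compared before returning. The price of this simple version is memory (n² slots for n keys); minimal perfect hashing schemes get the table down to about n slots with more elaborate constructions.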

Some interesting KDD 2006 papers

1 min read

Here are some papers with interesting-sounding titles from the list of ACM KDD papers:
Global Distance-Based Segmentation of Trajectories, by Aris Anagnostopoulos, Michail Vlachos, Marios Hadjieleftheriou, Eamonn Keogh, Philip Yu
Rule Interestingness Analysis Using OLAP Operations, by Bing Liu, Kaidi…

Olivier Bousquet at Curves and Surfaces 2006: Learning on Manifolds

3 min read

Yesterday, I attended Olivier’s talk at the Curves and Surfaces conference. Olivier is a fellow blogger and researcher. Alas, I was too tired after an afternoon of talks and went to sleep, so I did not hunt Olivier down. In any case, he presented one of the most interesting talks so far in the…

Geometric Wavelets

1 min read

Ah! I just learned about Geometric Wavelets. Most recent work on wavelets has been very uninspiring to me, but this is pretty good. It seems it has been around for a number of years (the most recent paper I can see is Le Pennec and Mallat in 2000). The idea is simple, but I only saw one talk about it, so…

Text Mining ICML 2006 Tutorial Slides

1 min read

This is too good not to pass on. Ronen Feldman posted slides from his ICML 2006 Tutorial on Text Mining. In this tutorial we will present the general theory of Information Extraction and will demonstrate several systems that use these principles to enable interactive exploration of large textual…