Daniel Lemire's blog

What is a good University?

1 min read

Seth Godin wrote a devastating post on the future of higher education. Unlike Godin, I fail to see an imminent crash of higher education. But then, I failed to predict the recent financial market crash. However, as someone who spent most of his adult life on a campus, I have an idea of what students…

The mythical reproducibility of science

2 min read

David Donoho was among the first researchers to promote reproducible research through software publication (see Buckheit and Donoho, 1995). Fifteen years later, Donoho and his collaborators are even more insistent: Scientific computation is emerging as absolutely central to the scientific method.…

On the design of design

1 min read

Following a blog post by John D. Cook, I started reading Fred Brooks’ latest book. Brooks is famous, among other things, for his earlier book, The Mythical Man-Month. The book is really a collection of essays, organized like blog posts. It is really engaging. I had never read about design per se,…

What I like about my job

2 min read

I’m currently a tenured professor with research grants and graduate students. Yesterday, I decided to list attributes of my job that I liked, in no particular order: I have the best computer gear money can buy; I spend most of my time thinking and writing; I have no immediate financial…

Are there too many Ph.D.’s?

1 min read

Would you accept work designing weapons of mass destruction? Back when I was in college, one of my most memorable philosophy assignments was a rebuttal to the claim that scientists working on weapons of mass destruction were responsible for the creation of the weapons. As intellectuals, and scientists, do…

External-Memory Sorting in Java: the First Release

1 min read

In my previous post, you were invited to help with a reference implementation of external sorting in Java. Several people tested and improved the code. I like the result. I posted the code on Google Code. All contributors are owners of the project. The source code is under Subversion. I have…

External-Memory Sorting in Java

1 min read

Update: this code is now obsolete. Please see the corresponding GitHub project. Sometimes, you want to sort large files without first loading them into memory. The solution is to use External Sorting. Typically, you divide the files into small blocks, sort each block in RAM, and then merge the…
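The block-then-merge idea mentioned in this excerpt can be sketched in Java roughly as follows. This is a minimal illustration and not the code from the project the post refers to: the class name `ExternalSortSketch`, its method names, and the `maxLines` block-size parameter are all hypothetical, and records are assumed to be lines of UTF-8 text compared with `String.compareTo`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of external sorting: sort small blocks in RAM, then merge them.
public class ExternalSortSketch {

    /** Reads the input in blocks of at most maxLines lines; each block is sorted in RAM and saved. */
    static List<Path> sortInBlocks(Path input, int maxLines) throws IOException {
        List<Path> blocks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == maxLines) {
                    blocks.add(writeSortedBlock(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                blocks.add(writeSortedBlock(buffer));
            }
        }
        return blocks;
    }

    /** Sorts one block in memory and writes it to a temporary file. */
    static Path writeSortedBlock(List<String> lines) throws IOException {
        Collections.sort(lines);
        Path block = Files.createTempFile("sortblock", ".txt");
        Files.write(block, lines, StandardCharsets.UTF_8);
        return block;
    }

    /** An open block together with the line currently at its head. */
    static final class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }

    /** Merges the sorted blocks, always emitting the smallest head line next. */
    static void mergeSortedFiles(List<Path> blocks, Path output) throws IOException {
        PriorityQueue<Head> queue = new PriorityQueue<>((a, b) -> a.line.compareTo(b.line));
        for (Path block : blocks) {
            BufferedReader reader = Files.newBufferedReader(block, StandardCharsets.UTF_8);
            String first = reader.readLine();
            if (first != null) {
                queue.add(new Head(first, reader));
            } else {
                reader.close();
            }
        }
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            while (!queue.isEmpty()) {
                Head smallest = queue.poll();
                out.write(smallest.line);
                out.newLine();
                String next = smallest.reader.readLine();
                if (next != null) {
                    queue.add(new Head(next, smallest.reader));
                } else {
                    smallest.reader.close();
                }
            }
        }
        for (Path block : blocks) {
            Files.deleteIfExists(block); // temporary blocks are no longer needed
        }
    }

    /** Sorts the lines of input into output while holding at most maxLines lines in memory per block. */
    public static void sort(Path input, Path output, int maxLines) throws IOException {
        mergeSortedFiles(sortInBlocks(input, maxLines), output);
    }
}
```

A call such as `ExternalSortSketch.sort(in, out, 100000)` would sort `in` line by line. Only one block of lines is held in memory during the first phase, and one line per block during the merge. A real implementation would likely size blocks by available memory rather than by a fixed line count.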

The paperless campus: still a long way to go

1 min read

Today I spent money from a research grant. Here is the process: I grab the form in Excel format. I fill it out. I print the form. I sign it. I give it to a secretary. The secretary gets the chair of my research center to sign it. The form is then sent to accounting, by internal mail (on…

Write good papers: my slides

1 min read

I agreed to give a talk to graduate students on how to write good research papers. I have posted the slides of my talk online. What annoys you about research papers? How do you recognize a good research paper? Do you have any advice to share?

So, you know what’s important?

2 min read

Most researchers are convinced that their current work is important. Otherwise, they wouldn’t do it. Yet, few of them work on obviously important things like curing cancer or solving world hunger. Rather, they do silly things like prove the Poincaré conjecture. A century to figure out some…

External-memory shuffling in linear time?

2 min read

You can sort large files while using little memory. The Unix sort tool is a widely available implementation of this idea. Files are written to disk sequentially, without random access. Thus, you can also sort variable-length records, such as lines of text. What about shuffling? Using the…

Which is faster: integer addition or XOR?

1 min read

The bitwise exclusive or (e.g., 1110 XOR 1001 = 0111) looks simpler to compute than integer addition (e.g., 2 + 9 = 11). Some research articles claim that XOR is faster. It appears to be Computer Science folklore. But is it true? Which line runs faster? (The symbol “^” is the XOR.) for(int k =…
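The loops in this excerpt are cut off, so the following is an independent sketch and not the author's benchmark. It only shows the two operations being compared, using the excerpt's own examples (1110 XOR 1001 = 0111 and 2 + 9 = 11). The class `XorVersusAdd` and its methods are hypothetical names, and the timing helper is a crude measurement that JIT warm-up can easily distort.

```java
// Hypothetical sketch: accumulate an array with "+" and with "^" and time both.
public class XorVersusAdd {

    /** Combines all values with integer addition. */
    static int sumAll(int[] values) {
        int acc = 0;
        for (int k = 0; k < values.length; ++k) {
            acc += values[k];
        }
        return acc;
    }

    /** Combines all values with bitwise exclusive or. */
    static int xorAll(int[] values) {
        int acc = 0;
        for (int k = 0; k < values.length; ++k) {
            acc ^= values[k];
        }
        return acc;
    }

    /** Returns rough elapsed nanoseconds for {addition, XOR} over n values (not a rigorous benchmark). */
    static long[] roughTimings(int n) {
        int[] values = new int[n];
        for (int k = 0; k < n; ++k) {
            values[k] = k;
        }
        long t0 = System.nanoTime();
        int sum = sumAll(values);
        long t1 = System.nanoTime();
        int xor = xorAll(values);
        long t2 = System.nanoTime();
        // Use the results so the loops cannot be discarded as dead code.
        if (sum == 42 && xor == 42) {
            System.out.println("unlikely coincidence");
        }
        return new long[] { t1 - t0, t2 - t1 };
    }
}
```

Calling `XorVersusAdd.roughTimings(1 << 20)` several times and comparing the later runs gives a first impression. A trustworthy answer would need a proper benchmarking harness.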

Language, Mathematics and Programming

3 min read

Even if you have extensive training in Mathematics, the average Mathematics paper is indistinguishable from the ramblings of a madman. Many of these papers seek to solve narrow problems. And yet, we respect Mathematicians. Software programming is a form of communication, usually between human…

Who the heck got Universities into the email business?

1 min read

My current employer, UQAM, refuses to allow email forwarding. Students would rather forward their emails to their existing Gmail accounts, for example. And the IT Department (the SITEL) agrees that it would have several benefits. However, they refuse to allow it for the following reasons: Email…

Is programming “technical”?

1 min read

According to student evaluations, most of my students appreciate short programming assignments. Yet, every year, some students think that programming is below them or unimportant. Maybe I should start my courses with this theorem: Theorem. If you understand an idea, you can implement it in…

Most common questions about recommender systems…

1 min read

I get ten to fifteen questions a week on recommender systems from entrepreneurs and engineers. Sometimes, I help people find their way in the literature. On occasion—for a consulting fee—I get my hands dirty and evaluate, design or code specific algorithms. But mostly, I answer the…

The best software developers are great at Mathematics?

1 min read

One of the upsides of working for a university is the stimulating academic discussions. Yesterday, a philosopher challenged me with a question: Beyond the fact that software is expressed in mathematical artefacts (bits, algorithms), are Information Systems fundamentally Mathematical? For my…

Open Sourcing your software hurts your competitiveness as a researcher?

2 min read

Almost all software I write for my research is open sourced. A fellow researcher argued today that I risk reducing the gap between me and my pursuers. Similarly, I should keep my data to myself (and avoid listing good sources of research data). Here is my take on this issue. Sharing can’t hurt…

Trading latency for quality in research

3 min read

I am not opposed to the Publish or Perish mantra. I am an academic writer. I am what I publish. We all think of researchers as people wearing laboratory coats, working on exotic devices. And my own laboratory includes a one-million-dollar computer cluster with a SAN server as large as a fridge. I…

Where to get your ebooks?

2 min read

If you read my blog, you probably like to read in general. Thus, if you don’t own an ebook device, you will soon. The choice is growing: the Amazon Kindle, the Sony Reader, the Apple iPad,… I bought a Kindle because my wife won’t let me fill the house with books. And I hate to throw away…