Daniel Lemire's blog

An upcoming revolution in science? The end of academic journals?

1 min read

Adam Rogers makes a bold prediction: Eventually, printed journal articles will be quaint artifacts. Scientific papers will be living documents with data published on Web pages – commented on, linked to, and mirrored by labs doing the same work 6,000 miles away. Every research effort will…

Scam Spam, the death of email, and Machine Learning

2 min read

Tim Bray has predicted the end of email as we know it: I don’t know about you, but in recent weeks I’ve been hit with high volumes of spam promoting penny stocks. They are elaborately crafted and go through my spam defenses like a hot knife through butter. (…) This could be the straw that…

Taste – Collaborative Filtering for Java

1 min read

Here’s yet another Collaborative Filtering library: Taste. This one is written in Java and supports Enterprise Java Beans. Taste is a flexible, fast collaborative filtering engine for Java. The engine takes users’ preferences for items (“tastes”) and returns estimated preferences for other…
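To make the idea of estimating preferences concrete, here is a minimal Python sketch of user-based collaborative filtering. It is not Taste's API (Taste is a Java library), and the names `cosine`, `estimate`, and the sample `prefs` data are hypothetical, chosen only for illustration. It assumes cosine similarity between users and a similarity-weighted average of their ratings, which is one common approach among many.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def estimate(prefs, user, item):
    """Estimate `user`'s preference for `item` as a similarity-weighted
    average of the ratings given by the other users who rated it."""
    num = den = 0.0
    for other, ratings in prefs.items():
        if other == user or item not in ratings:
            continue
        weight = cosine(prefs[user], ratings)
        num += weight * ratings[item]
        den += abs(weight)
    return num / den if den else None


# Hypothetical sample data: alice has not rated item "c".
prefs = {
    "alice": {"a": 5.0, "b": 3.0},
    "bob": {"a": 5.0, "b": 3.0, "c": 4.0},
    "carol": {"a": 1.0, "b": 5.0, "c": 2.0},
}

est = estimate(prefs, "alice", "c")
```

Because alice's ratings match bob's more closely than carol's, the estimate for item "c" lands nearer to bob's rating of 4 than to carol's rating of 2.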

Highly Affordable Computing (HAC)

1 min read

Slashdot reports that Amazon will let you use a powerful Xeon-based machine for $0.10 per hour. This means that for $10 per hour, you can have 100 machines cranking away on some task. You need to be a Linux user though. That’s what I call Highly Affordable Computing.

Prestige is overrated?

1 min read

Grigori Iakovlevitch Perelman proved the longstanding Poincaré conjecture, one of the most difficult problems in mathematics today, and posted the solution on arXiv. However, instead of publishing his work in a prestigious journal, he simply dropped it on an Internet archive. Maybe the Perelman…

Google Scholar launches a “related articles” feature

1 min read

If you are a Google Scholar user, you will notice that it now allows you to search for similar articles: do a query and then look for a hyperlink below one of the returned papers. I don’t usually like these fuzzy similarity queries, but sometimes, there is no other way to mine for interesting…

Efficient FIFO/Queue data structure in Python

1 min read

For the types of algorithms I implement these days, I need a fast FIFO-like data structure. Actually, I need a double-ended queue. Python has a list type, but the name is something of a misnomer because its performance characteristics are those of a vector. Recently, I found mxQueue which is a separate…
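As a point of comparison, here is a minimal sketch of a FIFO and a double-ended queue using `collections.deque` from Python's standard library. This is not the mxQueue package mentioned above; it is a stand-in chosen because it needs no extra install. The helper `drain_fifo` is a hypothetical name used only for illustration. A `deque` appends and pops at either end in O(1), whereas `list.pop(0)` shifts every remaining element.

```python
from collections import deque


def drain_fifo(items):
    """Enqueue every item, then dequeue them all in arrival order."""
    queue = deque()
    for item in items:
        queue.append(item)           # enqueue at the right end, O(1)
    out = []
    while queue:
        out.append(queue.popleft())  # dequeue from the left end, O(1)
    return out


# Double-ended use: both ends support O(1) push and pop.
d = deque([1, 2, 3])
d.appendleft(0)      # d is now deque([0, 1, 2, 3])
d.append(4)          # d is now deque([0, 1, 2, 3, 4])
first = d.popleft()  # removes and returns 0
last = d.pop()       # removes and returns 4
```

With a plain list, the same FIFO loop would call `pop(0)` and cost time proportional to the queue length on every dequeue.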

A Tectonic Shift in Global Higher Education

1 min read

(…) India, which accounts for a quarter of the developing world’s population and has the third largest higher education system in the world. Today, 23 percent of all higher education enrollments in India are in distance education–specifically in 13 national and state open universities…

Embedding fonts for IEEE

1 min read

IEEE requires that your PDF files embed all fonts. If you are including figures, it might prove difficult to embed them. Here is a recipe that works. Start with a file called “ICRA05.pdf”. Convert to PS: pdftops ICRA05.pdf. Convert back to PDF using prepress settings: ps2pdf14…

Google launches online, shareable, spreadsheet tool!

1 min read

Google has done it again! Spreadsheets.google.com offers free (as in “no money”) shareable, online spreadsheets. The UI feels a lot like Excel and you can save and load Excel documents. Unfortunately, it does not appear to support the Open Document Format. Unlike Excel, you can easily share…

Get an RSS feed of your favorite researcher

1 min read

Want to monitor the publications of a researcher? As long as he submits his papers to arXiv.org and/or Cogprints, you can use citebase to get an RSS feed: enter the author’s name, do a search, then click on the RSS link. ArXiv also has RSS feeds if you are only interested in this particular…