Recently, Bill Gates gave us the main reason for the ongoing revolution in university teaching:
Fortunately for all of you, you’re in a generation where all of these courses are going to be online and basically free. I’m taking solid state physics from MIT, though MIT doesn’t know it. You…
Through Sebastien Paquet, I found a software application called Publish or Perish. It queries Google Scholar and computes statistics for you automagically. It works well. Linux and Windows versions are available. The Windows version runs under MacOS if you have Wine.
Google is getting into the health records business. What happens when a single company has full access to your emails, your videos, your family pictures and your health records?
Abuses are possible, but I predict that not much will happen. The American NSA is recording and mining a large fraction of…
With Kamel and Owen, I am working on a paper involving database indexes. We had over a terabyte of space, and yet, in the middle of the production of the paper, we ran out of space. Only a year ago, I thought that one terabyte was large.
So, I ask our technician about getting a new drive. He comes…
André Vellino will give a talk on recommender systems in our offices (100 Sherbrooke West, room 2720) at 12:30pm this Thursday (February 21st 2008).
Recommender systems for scientific digital libraries that have been the subject of experiments in recent years have used corpora that are primarily…
We need to shuffle the lines in very large variable-length-record flat files.
We can load the files in MySQL and do “select * from mytable order by rand().”
However, loading the data in a DBMS and dumping it out is cumbersome. So, we do an in-memory shuffle block by block. It comes close to a…
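Here is a minimal sketch of what a block-by-block in-memory shuffle could look like in Python. It is an illustration under assumptions, not the exact program we use: the function name `block_shuffle`, the `block_lines` parameter, and the choice to cut blocks by line count are all hypothetical. Lines only move within their own block, so the result is not a uniform shuffle of the whole file.

```python
import random
from itertools import islice


def block_shuffle(src_path, dst_path, block_lines=100_000, seed=None):
    """Shuffle a line-oriented file one block of lines at a time.

    Only `block_lines` lines are held in memory at once. Lines are
    permuted within their block only, so this is NOT a uniform
    shuffle of the entire file (hypothetical helper, for illustration).
    """
    rng = random.Random(seed)
    # Binary mode: records are variable-length byte strings ending in b"\n".
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = list(islice(src, block_lines))
            if not block:
                break
            # If the file does not end with a newline, add one so the
            # last record cannot be glued to another record after shuffling.
            if not block[-1].endswith(b"\n"):
                block[-1] += b"\n"
            rng.shuffle(block)
            dst.writelines(block)
```

Calling `block_shuffle("input.txt", "shuffled.txt", block_lines=1_000_000)` would process the file a million lines at a time. Larger blocks mix the lines more thoroughly but need more memory.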
Peter argued that reusability and originality are the primary qualities of a research result.
I can tell something is not original if it looks similar to previous work.
When reviewing a paper, it might be difficult to determine if the research result is reusable. Nevertheless, here are some…
Fernando Diaz — an Information Retrieval Researcher from Yahoo! labs in Montreal — sent me this job offer. I had no idea Yahoo! had researchers in Montreal! I feel better about my home town!
Note: Do not get in touch with me regarding this position. I am just reposting it.
Machine Learning /…
At my school, the dean of the Science Faculty claims that we should see a surge of enrollment in Computer Science given the current shortages in Information Technology workers. I have an idea of who is feeding him this information, but I believe it is nonsense.
First, I do not believe there is a…
You can build an effective recommender system with as few as two people.
As you have more users, you tend to have more training data. Hence, you may have more accurate recommendations.
More accurate recommendations may not be important to your users.
The exact count of your users may not…
I have written that solid-state memory drives (SSD) — as found in recent laptops such as the MacBook Air — nearly bridge the gap between internal and external memory. Indeed, we went from 3 orders of magnitude to 1 order of magnitude of difference between disk and RAM!
There is a catch…
A CAPTCHA is a type of challenge-response test used in computing to determine whether a user is human. Yahoo! is having major difficulties with its CAPTCHAs. Russian hackers are able to pass their Turing tests with 35% accuracy. Some human beings say that their accuracy is 80% on these same…
Geoff cites an article by Jaron Lanier arguing that closed-source software is the source of innovation, and that open source software is only polishing copies. The gist of the argument is this:
Why are so many of the more sophisticated examples of code in the online world—like the page-rank…
The W3C published a first draft of HTML 5 today. HTML 5 replaces HTML 4 and XHTML 1.
They are getting rid of the “acronym” element because it was rarely used.
The “canvas,” “video,” and “audio” elements are added: HTML becomes fully multimedia. However, MathML and SVG remain…
There is a really nice article on StorageMojo about Cloud Computing. Cloud Computing is more or less the idea that you can offload your storage and processing tasks to a very large set of computers, typically maintained by some large company (such as Amazon). The novelty is that you abstract out…
WikiCFP is a tool to track calls for papers collaboratively using a wiki. The calls for papers are entered in categories: you can follow only the Machine Learning, Natural Language Processing, or databases calls for papers. You can subscribe to RSS feeds for each category.
What a good idea!
Many real-life data sets have power laws or Zipfian distributions. An integer-valued random variable X follows a power law with parameter a if P(X = k) is proportional to k^{-a}. Panos asked what the sum of two power laws was. He cites Wilke et al. who claim that the sum of two power laws X and Y…
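To experiment with the question numerically, here is a small Python sketch. It rests on assumptions that are not in the post: X and Y are taken to be independent, the support is truncated to 1..n so that the probabilities can be normalized by a finite sum, and the names `truncated_power_law` and `sum_distribution` are hypothetical. It only computes the distribution of X + Y by direct convolution and makes no claim about what shape that distribution has.

```python
def truncated_power_law(a, n):
    """Return [P(X = 1), ..., P(X = n)] with P(X = k) proportional to k^(-a).

    Truncating at n is an assumption made here so the weights can be
    normalized by a finite sum.
    """
    weights = [k ** -a for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sum_distribution(p, q):
    """Distribution of X + Y for independent X and Y (assumed).

    p[i] = P(X = i + 1) and q[j] = P(Y = j + 1).
    The result r satisfies r[s] = P(X + Y = s + 2).
    """
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


# Example: two truncated power laws with different parameters.
p = truncated_power_law(2.0, 1000)
q = truncated_power_law(3.0, 1000)
r = sum_distribution(p, q)
```

Plotting `r` on a log-log scale against the values 2, 3, … is one way to compare the tail of the sum with the tails of `p` and `q`.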
Many democratic systems require vote diversity. You do not get elected prime minister of Canada by rallying the largest number of voters. You also need to have your votes spread out over several regions.
Similarly, Scott Karp argues that completely open social networks fail. He takes two examples:…