Earlier this month, Michael Mitzenmacher told us about the record number of students attending his Harvard class online-only. Yesterday, Dick Lipton predicted that online learning will replace campus learning: “I see no reason that On [Online Universities] could not do as good a job as Un…
Many consider Frank Herbert’s Dune the most important work of science fiction ever written. Consider that Star Wars is just a variation on Dune. Yet Dune was rejected by more than twenty publishers before finally being published. It is likely that publishers rejected Dune precisely because it was…
Researchers—at least in Computer Science—spend most of their days at a desk typing. Picking the right software for writing is important.
Most of my writing time is spent on LaTeX documents. I have tried typical word processors in the past, but they get in my way. Indeed, by mixing…
Physics works with fundamental properties such as mass, speed, acceleration, energy, and so on. Quantum mechanics has a well known trade-off between position and momentum: you can know where I am, or how fast I am going, but not both at the same time.
Algorithms (and their implementations) also…
I usually stick with academic or research issues, but today, I wanted to have some fun. Geek fun.
While W3C describes Cascading Style Sheets (CSS) as a mechanism for adding style (e.g. fonts, colors, spacing) to Web documents, it is also a bona fide programming language. In fact, it is one of the…
In the late sixties and seventies, we wanted universities to become more accessible. We founded the Open University, the Université du Québec, and many other universities with accessibility as part of their mandate.
The stated goal was to make degrees more accessible. We succeeded.
Yet, we are…
I started 2009 with an interest in Web 2.0 OLAP and collaborative data processing. The field of collaborative data processing has progressed tremendously. Last year, we got Google Fusion Tables, and data warehousing products are getting more collaborative.
In 2010, my research might focus more on…
As 2009 comes to an end, I have selected a few of my best blog posts.
Database, compression and column stores:
More database compression means more speed? Right?
Trading compression for speed with vectorization
Column stores and row stores: should you care?
Changing your perspective: horizontal,…
Microprocessors and storage devices are subject to the second law of thermodynamics: using them turns usable energy (oil, hydrogen) into unusable energy (heat). Data centers are already limited by their power usage and heat production. Moreover, many new devices need to operate for a long time with…
In Run-length encoding (part 1), I presented the various run-length encoding formats. In part 2, I discussed the coding of the counters. In this third part, I want to discuss the ordering of the elements.
Indeed, the compression efficiency of run-length encoding depends on the ordering. For…
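As a rough illustration of why ordering matters, here is a minimal Python sketch that counts the runs a run-length encoder would have to store for the same values in two different orders. The column values and the `count_runs` helper are made up for this example; they do not come from the original post.

```python
from itertools import groupby

def count_runs(values):
    """Count the maximal runs of identical consecutive values."""
    return sum(1 for _ in groupby(values))

# Hypothetical column, chosen only to illustrate the effect of ordering.
unsorted_column = ["B", "A", "B", "A", "C", "A"]
sorted_column = sorted(unsorted_column)

runs_unsorted = count_runs(unsorted_column)  # every value starts a new run
runs_sorted = count_runs(sorted_column)      # identical values are now adjacent

print(runs_unsorted, runs_sorted)
```

Fewer runs means fewer counters and symbols to store, so the sorted column should compress better under run-length encoding.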
The debacle of the leaked emails, data and code from the University of East Anglia showed that reputed global warming scientists were petty and cheaters. As always, the pursuit of excellence is often at the expense of rigor.
To put a stop to growing skepticism, Scientific American published Seven…
(This is a follow-up to my previous blog post; there is also a follow-up: part 3.)
Any run-length encoding requires you to store the number of repetitions. In my example, AAABBBBBZWWK becomes 3A-5B-1Z-2W-1K, so we must store 5 counters (3,5,1,2,1) and 5 characters.
Storing counters using a fixed…
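The AAABBBBBZWWK example can be reproduced with a few lines of Python. This is only a sketch: the function names `rle_encode` and `rle_format` are illustrative, and the counters are kept as plain integers rather than in any particular storage format.

```python
from itertools import groupby

def rle_encode(text):
    """Return one (count, symbol) pair per run of identical symbols."""
    return [(len(list(group)), symbol) for symbol, group in groupby(text)]

def rle_format(pairs):
    """Render the pairs in the count-symbol notation used in the example."""
    return "-".join(f"{count}{symbol}" for count, symbol in pairs)

pairs = rle_encode("AAABBBBBZWWK")
encoded = rle_format(pairs)
counters = [count for count, _ in pairs]
symbols = [symbol for _, symbol in pairs]

print(encoded)   # count-symbol form of the input
print(counters)  # the counters that must be stored
```

How many bits each counter takes is a separate choice, which is the subject of this part of the series.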
(This is part 1; there is also a part 2 and a part 3.)
Run-length encoding (RLE) is probably the most important and fundamental string compression technique. Countless multimedia formats and protocols use one form of RLE compression or another.
RLE is also deceptively simple. It represents repeated…
Current practical database compression techniques stress speed over compression:
Vectorwise is using Super-scalar RAM-CPU cache compression which includes a carefully implemented dictionary coder.
C-store—and presumably Vertica—is using similar compression techniques as well as…
Morteza Zaker sent me a pointer to their work comparing bitmap indexes and B-trees in the Oracle database. They examine the folklore surrounding bitmap indexes, which are often thought to be mostly useful over low cardinality columns (columns having few distinct values, such as gender). Their…
Procrastination can be a serious problem leading to job loss, high anxiety and even significant psychological disability and dysfunction (according to Wikipedia). To avoid excessive procrastination, most researchers grow a sense of professional urgency.
Most people rely on extrinsic pressures. In…
I just finished Saturn’s Children. This is my third Charles Stross novel after Accelerando and Glasshouse. Saturn’s Children presents itself as a light space opera novel. The hero is a robot-sex-slave who is running for her life, in a post-human world. The author does a great job of making…
University of Toronto (where I got my B.Sc. and M.Sc.)
University of Alberta
University of British Columbia
Université de Montréal (where I got my Ph.D.)
McGill University
McMaster University
Université Laval
University of Ottawa
University of Calgary
University of Western Ontario
University of…
Our global knowledge grows in slow, incremental steps. Darwin and Einstein mostly reinterpreted existing ideas. However, practical implementations sometimes take the world by storm. You might think that the experts are responsible for changing the world. Unfortunately, experts are not good at…
When I asked the director of a large—and successful—British software house his most serious problem, he said without hesitation “how to prevent clusters of incompetence from emerging”. I was reminded of that when I noticed the—for me unusual—weight given to the “peer review”. What,…