Practical innovation explains how per-capita wealth increased eightfold during the last century. Yet, we are constantly reminded that we never invent anything new:
Most movies are remakes of, or variations on, older movies.
Most research papers are variations on a theme.
Most products and services are…
I carry a pocketbook and a pen everywhere. At night, my pocketbook is by my bed. All creative workers should carry notebooks.
Organizing and collecting ideas are different tasks. My pocketbook is strictly for collection. Every few days, I start a new page: a list of reminders on one side, and…
I heard on the radio today that the Christmas break should be used to review the past year and decide where you want to go. Good idea!
What did I do?
I published the Lemur Bitmap Index C++ Library.
I published lbimproved, a C++ library for Fast Nearest-Neighbor Retrieval under the Dynamic Time…
In his most recent essay, After the credentials, Paul Graham tells us that in South Korea, “college entrance exams determine 70 to 80 percent of a person’s future.” Fortunately, the Americans know better: “Where you go to college still matters, but not like it used to.”
Paul writes…
(See update 2.)
In a recent blog post, I said that parsing simple CSV files could be CPU bound. By parsing, I mean reading the data on disk and copying it into an array. I also strip the field values of spurious white space.
You can find my C++ code on my server.
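As a rough illustration of the parsing steps described above, here is a minimal Python sketch. It is not the C++ code in question. The function name `parse_simple_csv` is hypothetical, and the sketch assumes simple files with no quoted fields or embedded commas.

```python
def parse_simple_csv(lines):
    """Copy the fields of each line into a list of lists.

    Assumption: fields are separated by plain commas, with no quoting.
    """
    table = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop the line terminator
        if not line:
            continue  # skip empty lines
        # Find the commas, then strip spurious white space from each field.
        table.append([field.strip() for field in line.split(",")])
    return table


if __name__ == "__main__":
    sample = ["a , b,c\n", " 1,2 , 3\n"]
    print(parse_simple_csv(sample))
```

Any iterable of lines works as input, including an open file object, so the same function can be timed against a loop that only reads the lines.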
A reader criticized my…
I am continuing my fun saga to determine whether parsing CSV files is CPU bound or I/O bound. Recall that I posted some C++ code and reported that it took 96 seconds of process time to parse a given 2GB CSV file and just 27 seconds to read the lines without parsing. Preston L. Bannister correctly…
In my post Computing argmax fast in Python, I reported that Python has no built-in function to compute argmax, the position of a maximal value. I provided one such function and asked people to improve my solution. Here are the results:
argmax function
running time
array.index(max(array))
0.1…
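For reference, the `array.index(max(array))` entry from the table can be wrapped in a function as in the sketch below. The function names are hypothetical. The second variant, based on `enumerate`, is included only for comparison and is an assumption, not one of the reported results.

```python
def argmax_index(array):
    """Position of the first maximal value, using two passes over the list."""
    return array.index(max(array))


def argmax_enumerate(array):
    """Position of the first maximal value, using a single pass.

    Assumption: this variant is illustrative and not taken from the table.
    """
    return max(enumerate(array), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    values = [1, 5, 3, 5]
    print(argmax_index(values), argmax_enumerate(values))
```

Both functions return the first position when the maximum occurs more than once, and both raise `ValueError` on an empty list.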
(These results were updated.)
In Parsing text files is CPU bound, I claimed that I had a C++ test case proving that parsing CSV files could be CPU bound. By CPU bound, I mean that the overhead of taking each line, finding out where the commas are, and storing the copies of the fields into an array,…
Andre Vellino has just opened his Synthese Recommender System: a recommender for journal articles. Andre works for one of the largest scientific libraries in the world (CISTI). You can read all about his project on his blog.
I often lean to the right politically. The idea that the free market will work is compelling. Free markets may be good at generating some form of wealth, but as we saw on the stock market, this wealth may turn out to be artificial. We have another example of the rule: pure theory is wasteful.
But…
Up until now, it has been difficult for bosses to monitor employees remotely. A friend of ours worked from an office in downtown Montreal. She decided that working from home would be more efficient. Though her boss is conservative, he agreed. She must be particularly happy this morning…
Computer Science researchers often stress the importance of compression to get better performance. I believe this is a good illustration of an academic bias. Indeed, file size is easy to measure. It is oblivious to computer and CPU architectures. We even have a beautiful theory that tells you how…
Some years ago, the database research community jumped into XML. Finally, something new to work on! For about 5 years now, I have seen predictions that XML databases would take over the world. Every organization would soon have its XML database. People would run web sites out of XML databases.…
A common feeling among creative workers is the lack of time. Yet, most people will run out of energy before they run out of time. A single task that takes you 5 minutes (asking a Business Development Officer for Intellectual Property rights) can drain you for a week. Another task, like…
Among scientist-bloggers, the new buzzword is Mendeley: a social networking platform for scientists (Ricardo Vidal, Sylvie Noël, Misha Lemeshko, Michael Kuhn, …). The site is barely getting started and is still in early beta; there are bugs and limitations. However, the London-based has…
It is impossible to objectively and systematically distinguish bogus work from high-quality work. You can sort work based on external attributes such as quality of the presentation, length, logical correctness, prestige of the authors, and methodology, but not on the significance of the work.…
I have been arguing on this blog that while everyone knows diversity is a desirable property of recommender systems, there has been little work on the topic. To make my claim precise, I decided to list the papers addressing both recommender systems and diversity. I mean this list to be…
Daniel Tunkelang comments on the recent progress in collaborative filtering:
(…) the machine learning community, much like the information retrieval community, generally prefers black box approaches, (…) If the goal is to optimize one-shot recommendations, they are probably right. But I…