Daniel Lemire's blog

Multidimensional OLAP Server for Linux as Open Source Software

1 min read

Jedox will release a free open source Linux MOLAP server by the end of the year. A pre-release of the software is expected by mid-2005. All data is stored entirely in memory. Data can not only be read from but also written back to the cubes. Like in a spreadsheet, all calculations and…
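To make the idea of an in-memory cube with write-back concrete, here is a rough Python sketch. It is not Jedox's design or API: the class `TinyCube`, its methods, and the dimension names are all hypothetical, and aggregates are simply recomputed on each read, which is one naive way to get spreadsheet-like behavior.

```python
from typing import Dict, Sequence, Tuple


class TinyCube:
    """Hypothetical in-memory cube: cells are keyed by one member per dimension."""

    def __init__(self, dimensions: Sequence[str]) -> None:
        self.dimensions: Tuple[str, ...] = tuple(dimensions)
        self.cells: Dict[Tuple[str, ...], float] = {}

    def write(self, coords: Sequence[str], value: float) -> None:
        """Write a value back into a single cell."""
        if len(coords) != len(self.dimensions):
            raise ValueError("one coordinate per dimension is required")
        self.cells[tuple(coords)] = value

    def read(self, coords: Sequence[str]) -> float:
        """Read a single cell; empty cells read as 0.0."""
        return self.cells.get(tuple(coords), 0.0)

    def total(self, **fixed: str) -> float:
        """Sum every cell matching the fixed dimension members.

        The sum is recomputed on each call, so it always reflects
        the latest written values.
        """
        index = {name: i for i, name in enumerate(self.dimensions)}
        for name in fixed:
            if name not in index:
                raise KeyError(name)
        return sum(
            value
            for coords, value in self.cells.items()
            if all(coords[index[name]] == member for name, member in fixed.items())
        )


cube = TinyCube(["product", "region", "month"])
cube.write(("widget", "EU", "Jan"), 10.0)
cube.write(("widget", "US", "Jan"), 5.0)
before = cube.total(product="widget")
cube.write(("widget", "EU", "Jan"), 20.0)  # write-back changes the aggregate
after = cube.total(product="widget")
```

A real MOLAP server would index or cache aggregates rather than scan every cell, but the read/write/recalculate cycle is the same.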

How do people balance out precision and recall?

1 min read

In Information Retrieval, you can’t have both great recall and great precision, so you have to balance the two. What are the possible criteria to pick the best recall/precision? What I found so far, on Wikipedia of all places, is the so-called F-measure or balanced F-score, and it is merely the…
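For reference, the balanced F-score is commonly defined as the harmonic mean of precision and recall, and the more general F-beta variant treats recall as beta times as important as precision. A minimal Python sketch, where the function name `f_measure` and its parameters are made up for illustration:

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1 gives the balanced F-score; beta > 1 favors recall,
    beta < 1 favors precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both values are zero: define the score as 0 to avoid dividing by zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


balanced = f_measure(0.5, 0.5)
lopsided = f_measure(0.9, 0.1)
```

Because it is a harmonic mean, the score is dragged toward the smaller of the two values, so a system cannot score well by maximizing only one of them.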

Java Serialization is not for long term storage

1 min read

Using serialization for long-term storage is a common mistake. In fact, Microsoft made it with Microsoft Word, and it is a well-known source of trouble (ever had a corrupted file you could not recover from?). Serialization in Java was never advertised as a viable long-term storage mechanism. We…

Flattening lists in Python

1 min read

Can anyone do better than this ugly hack?

    def flatten(x):
        ans = []
        for i in x:
            if i.__class__ is list:
                # extend the result instead of overwriting it
                ans += flatten(i)
            else:
                ans.append(i)
        return ans

Update. I like this solution proposed by one of the commenters (sweavo): def flatten(l): if isinstance(l,list): return…
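One possible alternative, which is neither the original hack nor the commenter's solution, is a recursive generator. This is only a sketch: the helper name `iter_flatten` is made up, and it treats only `list` instances as nested containers, so tuples and strings are kept as single items.

```python
from typing import Any, Iterator, List


def iter_flatten(items: Any) -> Iterator[Any]:
    """Yield the non-list leaves of an arbitrarily nested list, left to right."""
    if isinstance(items, list):
        for item in items:
            yield from iter_flatten(item)
    else:
        yield items


def flatten(items: Any) -> List[Any]:
    """Collect the leaves into a new flat list."""
    return list(iter_flatten(items))


nested = [1, [2, [3, 4]], "ab", [], [[5]]]
flat = flatten(nested)
```

The generator avoids building intermediate lists at each level, but it still recurses once per nesting level, so extremely deep nesting can hit Python's recursion limit.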

Harold and RuleML

1 min read

Harold Boley was over in Montreal yesterday. He gave a talk on RuleML. The big news, to me, is that RuleML has been modularized into sublanguages. Of particular interest to me was their DataLog sublanguage (and they have a tutorial about it). To be honest, I didn’t even know what “DataLog”…

Google Data APIs

1 min read

I wish I understood the point of these Google Data APIs. The Google data APIs (“GData” for short) provide a simple standard protocol for reading and writing data on the web. GData combines common XML-based syndication formats (Atom and RSS) with a feed-publishing system based on the Atom…

Stallman does it again

1 min read

And yes, I store this under “open source”, not “free”. Come and get me, Richard!

When XML abstraction kills your web services

1 min read

The writing has been on the wall for quite some time. Dare Obasanjo comments on the misguided efforts of the W3C’s XML Schema Patterns for Databinding Working Group: The core problem is that every vendor of XML Web Services toolkits pretends they are selling a toolkit for programming with…

Wink: free tool to generate Flash or PDF from your screencasts

1 min read

Parand points to this cool, free, open source, multiplatform (Windows/Linux) tool called Wink to create screencasts in Flash or PDF. Wink is a Tutorial and Presentation creation software, primarily aimed at creating tutorials on how to use software (like a tutor for MS-Word/Excel etc). Using Wink…

Flamenco Search Interface Project

1 min read

I just found out about the Flamenco Search Interface Project: The Flamenco search interface framework has the primary design goal of allowing users to move through large information spaces in a flexible manner without feeling lost. A key property of the interface is the explicit exposure of…

See who blogged about this page

1 min read

Matthew Hurst shares this cool bookmarklet to help you quickly find out who blogged about the page you’re browsing now. If you’ve never used a bookmarklet before, it is quite easy: just drag the URL to your favorites and click on it while browsing a page.

The Generic Mapping Tools (GMT)

1 min read

The Generic Mapping Tools is an open source collection of ~60 tools for manipulating geographic and Cartesian data sets (including filtering, trend fitting, gridding, projecting, etc.) and producing Encapsulated PostScript File (EPS) illustrations ranging from simple x-y plots via contour maps to…