19th December 2011, 3 min read

Compressing document-oriented databases by rewriting your documents

Daniel Lemire says:

December 19, 2011 at 7:11 pm

@David

The main drawback, besides implementation complexity, is that it would lower the loading/insert speed.

Daniel Lemire says:

December 19, 2011 at 7:40 pm

@David

If you have short names, it may not automatically save much to replace the name (as a string) by a pointer to the name in a dictionary, and it may even take more space (and more memory). It would certainly introduce a (small) computational overhead.

So a more reasonable implementation would only use a dictionary for the long names.
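
To make the trade-off concrete, here is a minimal sketch in Python. It is not how MongoDB stores documents: the 4-byte pointer size, the function names and the sample documents are all assumptions chosen for illustration, and it only counts the bytes spent on key names.

```python
POINTER_SIZE = 4  # assumed size in bytes of a reference into the dictionary


def build_dictionary(docs, min_len):
    """Give an integer id to every key whose UTF-8 name is at least min_len bytes."""
    dictionary = {}
    for doc in docs:
        for key in doc:
            if len(key.encode("utf-8")) >= min_len and key not in dictionary:
                dictionary[key] = len(dictionary)
    return dictionary


def key_bytes(docs, dictionary):
    """Count bytes spent on key names across all documents.

    Keys found in the dictionary cost one pointer per occurrence;
    all other keys cost their full name each time they appear.
    The dictionary itself stores each interned name once.
    """
    total = sum(len(name.encode("utf-8")) for name in dictionary)
    for doc in docs:
        for key in doc:
            if key in dictionary:
                total += POINTER_SIZE
            else:
                total += len(key.encode("utf-8"))
    return total


# Hypothetical data: one short key and one long key, repeated 1000 times.
docs = [{"id": i, "customer_shipping_address": "..."} for i in range(1000)]

no_dictionary = key_bytes(docs, {})
all_keys = key_bytes(docs, build_dictionary(docs, 0))
long_keys_only = key_bytes(docs, build_dictionary(docs, POINTER_SIZE + 1))

print(no_dictionary)   # 27000
print(all_keys)        # 8027
print(long_keys_only)  # 6025
```

Under these assumptions, replacing the 2-byte name "id" with a 4-byte pointer costs space, so interning only the names longer than a pointer gives the smallest total. The sketch ignores the lookup cost at insert time, which is the computational overhead mentioned above.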

This being said, a clever implementation could end up being superior to what MongoDB currently does.

David R. MacIver says:

December 19, 2011 at 6:44 pm

This seems particularly bizarre, as I’d have thought interning your keys was a really easy storage optimisation to do and would basically always be a large win. Any idea why this isn’t done?

David R. MacIver says:

December 19, 2011 at 7:16 pm

Would it really lower the insert speed much? With sensible in-memory caching (which is probably free given Mongo does everything in memory anyway), the cost of looking up the key would be tiny compared to the cost of writing to disk (and for writes of many objects it might even win, since less data is written).

GDR! says:

December 20, 2011 at 4:48 am

Disk space is not that important, but fitting the whole database in RAM makes a huge difference in execution speed.