Daniel Lemire's blog


Compressing document-oriented databases by rewriting your documents

5 thoughts on “Compressing document-oriented databases by rewriting your documents”

  1. @David

    The main drawback, besides implementation complexity, is that it would lower the loading/insert speed.

  2. @David

    If you have short names, replacing each name (stored as a string) with a pointer to the name in a dictionary may not save much, and it may even take more space (and more memory). It would certainly introduce a (small) computational overhead.

    So a more reasonable implementation would only use a dictionary for the long names.

    This being said, a clever implementation could end up being superior to what MongoDB currently does.

  3. This seems particularly bizarre, as I’d have thought interning your keys was a really easy storage optimisation to do and would basically always be a large win. Any idea why this isn’t done?

  4. Would it really lower the insert speed much? With sensible in-memory caching (which is probably free, given Mongo does everything in memory anyway), the cost of looking up the key would be tiny compared to the cost of writing to disk (and for writes of many objects, it might even win because less data is written).

  5. GDR! says:

    The disk space is not that important, but fitting the whole database in RAM makes a huge difference in execution speed.
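To make the key-dictionary idea from the comments concrete, here is a minimal sketch, assuming flat documents held as Python dicts with string keys. It follows the suggestion in comment 2: only field names at least `min_length` characters long are replaced by a small integer code, while short names are kept verbatim. The class name `KeyDictionary`, the `min_length` parameter, and the threshold of 8 are hypothetical illustrations; this is not MongoDB's storage format or API.

```python
class KeyDictionary:
    """Hypothetical key-interning dictionary (not MongoDB's actual format).

    Field names with at least `min_length` characters are replaced by an
    integer code; shorter names are stored as-is, since a code would save
    little or nothing for them.
    """

    def __init__(self, min_length=8):
        self.min_length = min_length
        self.code_of = {}   # in-memory lookup: field name -> integer code
        self.name_of = []   # reverse lookup: integer code -> field name

    def _encode_key(self, name):
        if len(name) < self.min_length:
            return name  # short name: keep the string itself
        code = self.code_of.get(name)
        if code is None:
            # First time we see this long name: assign the next free code.
            code = len(self.name_of)
            self.code_of[name] = code
            self.name_of.append(name)
        return code

    def encode(self, doc):
        # One hash lookup per field: the insert-time overhead discussed above.
        return {self._encode_key(k): v for k, v in doc.items()}

    def decode(self, doc):
        # Integer keys are codes; string keys were short names kept verbatim.
        return {self.name_of[k] if isinstance(k, int) else k: v
                for k, v in doc.items()}


if __name__ == "__main__":
    keys = KeyDictionary(min_length=8)
    docs = [{"_id": 1, "customer_name": "Ann", "age": 31},
            {"_id": 2, "customer_name": "Bob", "age": 45}]
    stored = [keys.encode(d) for d in docs]
    print(stored)        # [{'_id': 1, 0: 'Ann', 'age': 31}, {'_id': 2, 0: 'Bob', 'age': 45}]
    print(keys.name_of)  # ['customer_name']
    assert [keys.decode(s) for s in stored] == docs
```

The long name `customer_name` is stored once in the dictionary no matter how many documents use it, while `_id` and `age` stay as plain strings. A real system would also have to persist the dictionary alongside the data and handle nested documents, which this sketch leaves out.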