Daniel Lemire's blog


Native XML databases: have they taken the world over yet?

10 thoughts on “Native XML databases: have they taken the world over yet?”

  1. Daniel Haran says:

    Those databases came around at a time when I was starting to throw up XML and XSLT.

    The new hotness – one that’s not just driven by marketing hype – is document dbs. CouchDB looks like it’s on course to let me do things as a developer that none of these legacy vendors are really attempting.

  2. XML and XSLT are fine. I like them both. For some tasks, they are ideal. (And no, they are not good for most things.)

    Document-based databases such as CouchDB and Lotus Notes are indeed very interesting. I am just too cheap and lazy to get a cluster and work with CouchDB.

  3. Daniel Haran says:

    A cluster isn’t strictly necessary to try out CouchDB.

  4. Jo Vermeulen says:

    I don’t know anything about document-based databases, but doesn’t RDF already solve the same problem (i.e. handling schema updates in a more flexible way)?

  5. jason monberg says:

    Mark Logic (I work there) offers an XML database that can be used for free for personal non-commercial projects:
    http://developer.marklogic.com/

    XML databases are ideal for the storage and query of documents where the content and the structure of the document are part of the query. In this case an XML database provides the ability to query documents at an arbitrary granularity across different document schemas.

    An interesting read related to the topic of trends in different types of database systems is the ‘One Size Fits All’ paper from Stonebraker et al:
    http://www.cs.brown.edu/~ugur/fits_all.pdf
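To illustrate what "arbitrary granularity across different document schemas" could mean in practice, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It is not MarkLogic's API or query language; the two sample documents and the `titles` helper are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with different structures (no shared schema).
article = ET.fromstring(
    "<article><front><title>Native XML databases</title></front>"
    "<body><p>Some text.</p></body></article>"
)
book = ET.fromstring(
    "<book><chapter><section><title>Document stores</title>"
    "<p>More text.</p></section></chapter></book>"
)

def titles(doc):
    """Return the text of every <title> element, at any depth in the tree."""
    return [el.text for el in doc.iter("title")]

# The same query runs over both documents, whatever their nesting.
all_titles = [t for doc in (article, book) for t in titles(doc)]
print(all_titles)
```

A dedicated XML database would presumably answer this kind of query from an index rather than by walking every tree, which is where the engineering effort goes.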

  6. @Vermeulen

    RDF is a (flexible) data model, not a database technology. Comparing CouchDB to RDF is an apples-to-oranges comparison. RDF does not say anything about indexing, aggregation, querying, updating… it is just a data model. In fact, RDF is not even XML (a common misconception)… it is just often written as XML.

    What something like CouchDB does is to allow you to search and aggregate without *any* top-down schema definition.

    Suppose, for example, that you want to add a new attribute to an existing database, say “cost in Canada”. With a tool like MySQL, this means you must change some table definition. But you cannot allow just any user to do it.

    So your tool is not very flexible. With CouchDB… you are free as a bird.

    But how do they still get fast queries? Ah! There is the magic!
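The "cost in Canada" example above can be sketched in plain Python. This is not CouchDB's API: the documents, the `map_cost_in_canada` function, and the `build_view` helper are hypothetical stand-ins. The sketch assumes the general idea of a precomputed view, where a map function is run over the documents once and its output is stored as an index, so later lookups do not rescan everything.

```python
from collections import defaultdict

# Hypothetical product documents; no schema is declared anywhere.
docs = [
    {"_id": "a", "name": "widget", "cost": 10.0},
    {"_id": "b", "name": "gadget", "cost": 25.0},
]

# A writer adds a new attribute to one document; no table definition changes.
docs.append({"_id": "c", "name": "gizmo", "cost": 40.0, "cost_in_canada": 52.0})

def map_cost_in_canada(doc):
    """Emit (key, value) pairs only for documents carrying the new attribute."""
    if "cost_in_canada" in doc:
        yield doc["name"], doc["cost_in_canada"]

def build_view(documents, map_fn):
    """Precompute a key -> values index from the map function's output."""
    view = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            view[key].append(value)
    return dict(view)

canada_view = build_view(docs, map_cost_in_canada)
print(canada_view)
```

Documents without the new attribute are simply skipped by the map function, so older data needs no migration.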

  7. Jo Vermeulen says:

    @Daniel Thanks for the detailed explanation!

    I was actually referring to RDF mapped into a relational DB to allow for more flexible schemas. But I’m not at all sure whether this will really work, or what the performance implications will be. See: http://www.rdfabout.com/comparisons.xpd#versus-rdbms and http://infolab.stanford.edu/~melnik/rdf/db.html for how RDF could be mapped to a relational DB.

    CouchDB certainly seems interesting, I should have a better look at it.

  8. @Vermeulen Ok. But still, RDF is at the model level. Something like CouchDB is really at the physical level. (It is actually an implementation of a physical model.)

    I guess you could map an RDF model to CouchDB or to just about any database engine. As far as I can see, any database able to represent a 3-column table can be used with RDF.
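As a rough sketch of the 3-column idea, the following stores (subject, predicate, object) triples in an in-memory SQLite table using Python's standard `sqlite3` module. The table name, the `ex:` identifiers, and the `properties_of` helper are hypothetical; a real RDF store would also deal with URIs, typed literals, and indexing, none of which is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table, three columns: enough to hold any set of triples.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")

# Hypothetical triples; adding a new predicate needs no schema change.
triples = [
    ("ex:widget", "ex:name", "Widget"),
    ("ex:widget", "ex:cost", "10"),
    ("ex:widget", "ex:costInCanada", "13"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)

def properties_of(subject):
    """Return all predicate -> object pairs recorded for one subject."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples WHERE subject = ? ORDER BY predicate",
        (subject,),
    )
    return dict(rows.fetchall())

print(properties_of("ex:widget"))
```

The flexibility comes at a price: reassembling one "record" means gathering several rows, which is one source of the performance questions raised in this thread.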

  9. Jo Vermeulen says:

    I see. I often just use RDF as a distributed data store whose schema can evolve easily 🙂 Heavy inferencing is usually too slow on mobile devices anyway. Maybe CouchDB can then be an alternative for this particular use case.

    I believe so as well: representing (subject predicate object) triples is all you need.

  10. Dave Kellogg says:

    Hi Daniel,

    Having worked at an object database company (Versant) and an XML database company (Mark Logic), I believe that things are different this time.

    I believe ODBMS failed to achieve broad adoption for two reasons: (1) the RDBMS itself was just being adopted so the timing was too early, (2) the primary value of an ODBMS was in easy persistence of C++ objects that could be worked around with about 15% more effort to map them relationally.

    I think XML databases are different in a few respects. (1) ODBMS were DBMS, i.e., they focused on D, data. Successful XML databases (e.g., MarkLogic) focus on content (i.e., documents). The data/document divide is real and there is a bigger gap that’s harder for the RDBMSs to simply absorb. Sure they can stuff XML in columns, but can they search large amounts of it effectively? Not yet.

    (2) XML databases are emerging at a time of general specialization in the DBMS market.

    Think Teradata (data warehouse), Netezza (DW), Streambase (streams), MarkLogic (XML), Vertica (columns), and arguably even BigTable (parallelization) as many different types of DBMSs that are emerging.

    So the idea isn’t as simple as one new type of DBMS will replace the RDBMS. The RDBMS, slowly and over time, will be replaced by a family of specialized ones.