Daniel Lemire's blog


Who is going to need a database engine in 2020?

25 thoughts on “Who is going to need a database engine in 2020?”

  1. Steven Shaw says:

    I'm wondering which languages you would recommend as alternatives to Java.

    Scala springs to mind of course as an obvious choice for reforming Java devs.

    Could it be something more unusual like Haskell, Erlang, O’Caml, Scheme, Google Go or the likes of Ruby/Python/JS?

  2. Virgilio says:

    I think that the C++ of today has nothing in common with the one that lost the battle against Java years ago. It only needs standard libraries for standard problems (threads, sockets, …). There are a few good libraries, like Qt, but they are not standard.
    I believe something like JSRs for C++ could be a solution…

  3. Zeno says:

    Just as a sidenote, regular expressions are not an esoteric feature, but a standard tool for many programmers (on Unix systems: every programmer).

  4. Glenn Davis says:

    Especially data compression, and in particular content-aware compression methods designed specifically for structured data.

    Database volumes are growing faster than Moore’s law, but the state of the art of database compression has never kept pace. In the absence of systematic methods designed for database records, decades-old, one-dimensional, conventional methods, intended for long-obsolete hardware, are instead being brought to bear ad-hoc on database data.

  5. @Shaw

    Wondering what language you recommend as alternatives to Java?

    I use Python, Java and C++ myself. They are all getting better all the time.

    Scala is interesting, but it feels challenging.

    I won’t make predictions except to say that I expect new and more powerful programming languages to replace the existing ones. I’ll be pretty sad if in 2020, I’m still primarily using Python, Java and C++. There is so much innovation out there that something strong has to emerge out of it.

    For reference, hardly anyone was using Python and Ruby ten years ago. C# didn’t even exist. We are not standing still. (Though some would object that we have never improved over Lisp. I won’t get into this debate.)

    @Zeno

    regular expressions are not an esoteric feature

    No. They are not. I’m sorry I wasn’t clear: it was irony. My point was that 20 years ago, regular expressions would have appeared to be an esoteric feature, whereas they are now taken for granted. Thus, programmers are much more powerful than they were 20 years ago. A program that would have taken months to write can now be written much faster. I conjecture that the trend will continue.

    @Paul

    Even if we switch to RAM storage, the concepts will remain in accessing data across a network.

    Good point.

    I recall a lot of database work being figuring out how to make sure you’re accessing contiguous chunks whenever possible to avoid having to wait around for a spinning platter to get to the next bit you need. Storage advances look to be moving away from that model, which will make writing db code and interacting with them that much easier

    Yes. I agree.

    1. Travis Downs says:

      I’m pretty sure I was writing C# more than ten years ago. Either I’m very clever and invented it for my personal use before MS got around to it, or it was already around in the early 2000s.

      1. My blog post was written in 2010. As far as I can tell, C# came up in the early 2000s. If you were programming in C# before then… something is off.

        1. Travis Downs says:

          Huh, oops! I got fooled by the recent comments. Interesting to look back and see the perspective.

          Are you still using Java, C++ and Python?

          1. Are you still using Java, C++ and Python?

            Yes. I try hard to use as many languages as I can.

  6. Paul says:

    There will always be this shift as “db” problems start fitting in memory and “impossible” problems turn into feasible db problems. Thus there will always be a need for quick, in memory solutions; for general purpose databases; and for specialized solutions. Even if we switch to RAM storage, the concepts will remain in accessing data across a network.

    The one thing I do see changing is the quirks of databases necessitated by spinning disks. From my DB class back in college, I recall a lot of database work being figuring out how to make sure you’re accessing contiguous chunks whenever possible to avoid having to wait around for a spinning platter to get to the next bit you need. Storage advances look to be moving away from that model, which will make writing db code and interacting with them that much easier: e.g., you won’t need to pick a primary variable to index on, but can have every variable indexed at identical speeds.

  7. Stanley Lee says:

    Even though I’m a bit of a tech-recluse these days, this is a great wake-up call for the over-ambitious idiots in the industry. With that said, which pre-existing database engine did you find superior in general? (I saw a few listed on Wikipedia: http://en.wikipedia.org/wiki/Database_engine )

  8. @Stanley

    All the database engines currently listed in the Wikipedia article on database engines are related to MySQL. Seems a bit biased to me.

    As for a review of database engines, that would make a blog post of its own. Maybe later… 😉

  9. Jack Dempsey says:

    Given the mention of Java and databases, I’m almost surprised to see no one mention Clojure yet. The basics of the STM system should be familiar to many, while the reliance on the JVM should help some move towards it as well.

    I’m only just getting into it now and really liking what I see. Take a look if you haven’t heard of it: http://clojure.org

  10. Antonio Badia says:

    Daniel,
    the title of your post is “Who is going to need a database engine in 2020?” But then in the post you go on to talk about a different (albeit related) issue: who would want to write their own database engine? As you point out, that’s kind of crazy (unless you work at Google/Twitter/Amazon/Facebook). There are already many distinct database engines available (SQL and NoSQL, in-memory and on-disk, centralized and distributed). It feels like reinventing the wheel.
    That said, I think the original question (the title) is much more interesting. I hope you write a post on that. We could discuss why databases are not used for data analysis/analytics.

  11. Just what is “Big Data”?

    I am curious about the performance of the H2 database. Given the preference for running in-memory, the low impedance when embedded in a Java application, and the increasing size of main memory – when does the problem become too big for a single instance? Put differently, what fraction of “Big Data” problems can be handled in-memory on a single box, using a very fast single-instance SQL database?

    The H2 database is *very* fast when embedded in a Java application, and operating in-memory. I believe you can also write your stored procedures (when needed) in Java.

    If we can partition the problem, we could fire up a herd of single-instances to operate on segments of the data. Using an SQL database we can easily do some fairly complex transformations. How does this compare in performance to non-SQL databases?

    If this approach works, there is no (or less) need to write custom engines.

    Expanding your question, more than answering. 🙂

    The JVM is also one of my concerns. I distrust Oracle. Can we port the JVM used by Google on Android?

    1. Google is using OpenJDK. There is only so much Oracle can do with OpenJDK.

      1. ShalokShalom says:

        Google reimplemented the JVM as Dalvik; they are not using OpenJDK. Even more so: they implemented it as register-based, as opposed to stack-based, which is a huge effort on its own.

        You are also completely incorrect about the difficulty of concurrency, since these challenges apply only to imperative code.

        Functional languages get concurrency without even needing to think about it; it’s one of their many advantages.

        1. Google (…) are not using OpenJDK.

          “As an open-source platform, Android is built upon the collaboration of the open-source community,” a Google spokesperson told VentureBeat. “In our upcoming release of Android, we plan to move Android’s Java language libraries to an OpenJDK-based approach, creating a common code base for developers to build apps and services. Google has long worked with and contributed to the OpenJDK community, and we look forward to making even more contributions to the OpenJDK project in the future.”

          Source.

          Quoting Wikipedia:

          On Android Nougat, OpenJDK replaces the now-discontinued Apache Harmony as the Java libraries in the source code of the mobile operating system. Google has been in an ongoing legal dispute with Oracle over claims of copyright and patent infringement through its use of re-implementations of copyrighted Java APIs via Harmony. While also stating that this change was to create a more consistent platform between Java on Android and other platforms, the company admitted that the switch was motivated by the lawsuit, arguing that Oracle had authorized its use of the OpenJDK code by licensing it under the GPL.

          1. ShalokShalom says:

            Yeah, this is the library collection. Dalvik, or now ART (Android Runtime), is itself a new implementation.

            https://en.wikipedia.org/wiki/Dalvik_(software)#Architecture

            1. You are correct; Google, Microsoft, IBM and others have built their own VMs. There are many available, including free ones… OpenJ9, Excelsior… See this list on Wikipedia: https://en.wikipedia.org/wiki/List_of_Java_virtual_machines

              But Java itself is bound to its standard libraries. That’s what makes Java, Java. And that’s the part you cannot legally reproduce.

              So the larger point is whether one needs to trust Oracle to use Java.

              OpenJDK is the key.

              1. ShalokShalom says:

                Well, the original post asked about porting the JVM, not the JRE and JDK.

        2. Functional languages get concurrency without even needing to think about it; it’s one of their many advantages.

          If your world is stateless, then concurrency is a solved problem.

          1. ShalokShalom says:

            Correct. And functional languages are stateless by design.

            1. Databases are not stateless in general. When they are, then concurrency is not a problem. But that is true irrespective of your programming paradigm.

              1. ShalokShalom says:

                It gets much easier with message passing, preemptive scheduling and supervisors, i.e. Erlang’s design. Also Akka.NET to a certain extent, etc. If you call that a programming paradigm…