Daniel Lemire's blog


How fast can a BufferedReader read lines in Java?

35 thoughts on “How fast can a BufferedReader read lines in Java?”

  1. You might want to retry this with a StringBuilder rather than a mutex-locked StringBuffer. That should give you a serious performance increase.

    1. I am not benchmarking anything having to do with a StringBuffer.

      1. James Abley says:

        BufferedReader internally uses a StringBuffer, which from my reading can be safely swapped out for a StringBuilder.

        I seem to get an improvement in performance from using a patched BufferedReader with that change.

        Code to be published later:

        Benchmark                                 Mode   Cnt   Score   Error  Units
        MyBenchmark.stdLibBufferedReader          thrpt   25  29.542 ± 0.599  ops/s
        MyBenchmark.patchedStdLibBufferedReader   thrpt   25  33.426 ± 0.108  ops/s
        MyBenchmark.stringLines                   thrpt   25  87.141 ± 1.155  ops/s

        I’ll check the OpenJDK project to see whether that’s a reasonable change.
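
        For illustration, a minimal JMH harness of this shape could look as follows (the class and method names are assumptions chosen to match the output above, not the actual published code):

        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.StringReader;
        import java.util.concurrent.TimeUnit;

        import org.openjdk.jmh.annotations.*;

        @State(Scope.Benchmark)
        @BenchmarkMode(Mode.Throughput)
        @OutputTimeUnit(TimeUnit.SECONDS)
        public class MyBenchmark {
            private String content;

            @Setup
            public void setup() {
                // Synthetic input: many lines of roughly 80 characters each.
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 100_000; i++) {
                    sb.append("0123456789".repeat(8)).append('\n');
                }
                content = sb.toString();
            }

            @Benchmark
            public long stdLibBufferedReader() throws IOException {
                long total = 0;
                try (BufferedReader br = new BufferedReader(new StringReader(content))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        total += line.length();
                    }
                }
                return total;
            }

            @Benchmark
            public long stringLines() {
                // JDK 11's String.lines() bypasses BufferedReader entirely.
                return content.lines().mapToLong(String::length).sum();
            }
        }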

        1. Steve Davidson says:

          It’s an overdue change.

        2. Steve Davidson says:

          Let me know if assistance is needed. The Java NIO package uses BufferedReader as well, and the mutex locks in StringBuffer are causing nasty and unnecessary performance hits.

        3. James Abley says:

          Published my code.

          Still waiting for the thing I reported on https://bugs.java.com/ to be reviewed and published.

  2. Jay Askren says:

    BufferedReader is probably the most common way of reading files, but it is also the slowest, as shown by Martin Thompson: https://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html

    1. It is certainly not the slowest! Using Scanner over a raw file is far slower.

      1. Jay Askren says:

        That’s fair. Should have said one of the slowest.
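
        For reference, a Scanner-based line reader, which tends to be far slower, looks something like this (a sketch, not the post's benchmark code):

        import java.io.File;
        import java.io.FileNotFoundException;
        import java.util.Scanner;

        public class ScannerLines {
            public static void main(String[] args) throws FileNotFoundException {
                long total = 0;
                // Scanner performs regex-based scanning internally, which costs
                // far more per line than BufferedReader.readLine.
                try (Scanner sc = new Scanner(new File(args[0]))) {
                    while (sc.hasNextLine()) {
                        total += sc.nextLine().length();
                    }
                }
                System.out.println(total + " characters");
            }
        }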

  3. jens says:

    “the default buffer size is 8192 characters capacity. Line size is considered as 80 chars capacity.”
    From the Javadoc:
    http://www.docjar.com/html/api/java/io/BufferedReader.java.html

    Maybe you should resize it.

    1. Jay Askren says:

      Good catch. I bet that would speed it up.

      1. Travis Downs says:

        The ideal buffer size and line size aren’t really related. The buffer size is for reading large(ish) chunks of the file which are then parsed into lines. Selecting the buffer size is mostly a function of system call overhead versus the desire to keep stuff in L1. I have repeatedly found 8K to be a sweet spot, although that was pre-Meltdown/Spectre which might have pushed the ideal buffer size up.

    2. mt3o says:

      You should try resizing the buffer to different sizes and rerunning the benchmark. There is no magic behind BufferedReader: everything is synchronous, and it refills with data once the buffer is emptied. See the sketch below.
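
      A quick (non-JMH) sketch of such a sweep, assuming the file path is passed as an argument:

      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.IOException;

      public class BufferSizeSweep {
          public static void main(String[] args) throws IOException {
              // Try buffer capacities from 1K to 1M chars; the constructor's
              // second argument is a capacity in chars, not bytes.
              for (int size = 1 << 10; size <= 1 << 20; size <<= 1) {
                  long start = System.nanoTime();
                  long chars = 0;
                  try (BufferedReader br = new BufferedReader(new FileReader(args[0]), size)) {
                      String line;
                      while ((line = br.readLine()) != null) {
                          chars += line.length();
                      }
                  }
                  double secs = (System.nanoTime() - start) / 1e9;
                  System.out.printf("%8d-char buffer: %.3f s (%d chars read)%n", size, secs, chars);
              }
          }
      }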

  4. James Abley says:

    Using String.lines in JDK 11 gives me 2 or 3 times better performance.
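
    Something along these lines (a sketch; parseLine stands in for whatever per-line work is being measured):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class StringLinesDemo {
        static void parseLine(String line) { /* per-line work goes here */ }

        public static void main(String[] args) throws IOException {
            // JDK 11+: read the whole file into one String, then iterate its
            // lines without going through BufferedReader at all.
            String content = Files.readString(Path.of(args[0]));
            content.lines().forEach(StringLinesDemo::parseLine);
        }
    }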

  5. degski says:

    What’s the point of this post? Java is not easier to write [than C or C++], probably harder, one needs [to have installed] a VM and it’s slow, so what’s to like?

    1. folderk says:

      What’s the point of this post? Java is not easier to write [than C or C++], probably harder, one needs [to have installed] a VM and it’s slow, so what’s to like?

      Java is much easier to write than C or C++ (for one, you don’t have to manually manage memory, and for another, it has far fewer corner cases and nuances than C++), it’s plenty fast for most tasks (and on par with C/C++ on some), and installing a VM is a non-issue.

      So your comment is wrong in each and every statement it makes…

      1. degski says:

        for one, you don’t have to manually manage memory, and for another, it has far fewer corner cases and nuances than C++.

        Using RAII and smart pointers does away with manual memory management; forget C with classes, we moved on from there.

        Yes, it’s subtle and one needs to master it, but that does not in itself mean it’s hard to write [and it has become simpler to write fast code since C++11 and the following std updates].

        I don’t need to ask my user to install the JVM, that seems like a major advantage.

      2. degski says:

        Forgot the most important bit: it is 2 times slower [with the optimizations in some of the other answers; 4 times slower as posted] than plain C++. That’s the difference between Google needing a mere 500’000 servers [to conduct its business] and needing 1’000’000.

    2. Bob Foster says:

      What is the point of your comment? I get you don’t like Java, but it’s hardly relevant to the speed of I/O.

    3. Steve Davidson says:

      Folks like you have been providing about half the work that I get, so THANKS! Java WAS slow, in the ’90s and early 2000s. The JIT around 2000 and the runtime profilers and optimizers introduced around 2005 made Java quite performant. And there are lots of “extras” that the VM provides if you are doing anything that needs databases, XML, web services, or any other non-trivial application.

  6. onkobu says:

    Ever tried Files.readAllLines?
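
    For reference, that approach looks like this; note that it materializes every line up front (a sketch):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    public class ReadAllLinesDemo {
        public static void main(String[] args) throws IOException {
            // Convenient, but it allocates a String per line and holds the
            // whole file in memory before any processing starts.
            List<String> lines = Files.readAllLines(Path.of(args[0]));
            System.out.println(lines.size() + " lines");
        }
    }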

  7. Jörn says:

    In contrast to the C getline function, BufferedReader.readLine and BufferedReader.lines do not include the newline character in the returned strings. It looks like you are building a huge one-line string in scanFile, which would lead to repeated resizing of the read buffer later on.

    1. Jörn says:

      I played around with the code some more and the above suggestion does not really improve the performance as much as I thought. There is still too much copying of string contents happening.

      Using the indexOf/substring loop mentioned on Hacker News gets the performance to about 2x the original, but substring still creates copies. (This actually changed in Java 7; earlier it would create a view holding on to the original string contents, which was deemed bad for memory consumption.)

      Using subSequence and changing parseLine to accept a CharSequence sounds like it should work, but it behaves exactly the same as substring: for backwards compatibility, the subSequence method just delegates to substring.

      The one thing that gave a huge improvement was to implement a custom CharSequence that does no copying and to create instances of it in an indexOf loop. With that approach I finally got to about 2 GB/s on this Haswell laptop.

      So I completely agree with your point: Java can be fast, but you’d have to know exactly what you’re doing. And often the standard library works against you.

      Modified code is available at https://gist.github.com/jhorstmann/9dcdc3c26a26e4ad6f513128942a47d9
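
      The gist has the full code; the core idea is roughly this simplified sketch (Slice, parseLine, and scan are illustrative names, not the gist’s):

      public class ZeroCopyLines {
          // A view over a region of a larger String: no character data is
          // copied when the input is split into "lines".
          static final class Slice implements CharSequence {
              private final String base;
              private final int start, end;

              Slice(String base, int start, int end) {
                  this.base = base; this.start = start; this.end = end;
              }

              @Override public int length() { return end - start; }
              @Override public char charAt(int i) { return base.charAt(start + i); }
              @Override public CharSequence subSequence(int s, int e) {
                  return new Slice(base, start + s, start + e);
              }
              @Override public String toString() { return base.substring(start, end); }
          }

          static void parseLine(CharSequence line) { /* per-line work */ }

          static void scan(String content) {
              int pos = 0, nl;
              while ((nl = content.indexOf('\n', pos)) != -1) {
                  parseLine(new Slice(content, pos, nl)); // zero-copy "substring"
                  pos = nl + 1;
              }
          }
      }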

      1. Thanks for sharing your code.

  8. Louis St-Amour says:

    Further comments on this post at https://news.ycombinator.com/item?id=20542023 include a potential optimization to the benchmark which might reduce the slowdown from 4x to just twice as slow.

  9. Nathan Kurz says:

    I’m Java illiterate, but there’s a comment on HN (https://news.ycombinator.com/item?id=20542438) that suggests you might not be measuring what you think you are measuring. Specifically, the author says that the call to lines() in your preprocessing step (L19) strips all the newlines, so that when you concatenate the results together with append() you are creating a single 23MB “line”. I’m not sure if it affects your conclusion, but given that your benchmarking is over a foreach loop, presumably this wasn’t your intent?

    It was also suggested (I think usefully) that a few more details about the test environment would be helpful to evaluate your result. While you mention it in the linked earlier post, it would probably help to say again which machine, which version of Java, which C++ compiler, and so forth so that the post is more standalone.
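
    For anyone following along, the pitfall being described is roughly this (a sketch, not the post’s actual preprocessing code):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.stream.Stream;

    public class NewlinePitfall {
        public static void main(String[] args) throws IOException {
            StringBuilder sb = new StringBuilder();
            // lines() strips the terminators, so a plain sb.append(s) would
            // build one giant single-line string; re-adding '\n' preserves
            // the line structure.
            try (Stream<String> lines = Files.lines(Path.of(args[0]))) {
                lines.forEach(s -> sb.append(s).append('\n'));
            }
            System.out.println(sb.toString().lines().count() + " lines preserved");
        }
    }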

    1. you might not be measuring what you think you are measuring

      There was a typo in an earlier version of my code, but this was quickly corrected. It turns out not to affect the result… or, at least, not the conclusion.

      which version of Java, which C++ compiler, and so forth so that the post is more standalone.

      I’ll add more details but I think that this is somewhat nitpicking unless one can show that they consistently get 3 GB/s parsing text files in Java. That is, I provide an example that I view as ‘representative’ or ‘credible’.

    1. Thanks for sharing. Note that we have much faster disks today than in 2008.

  10. Craig Macdonald says:

    Have you tried?

    String s = null;
    while ((s = br.readLine()) != null) {
        parseLine(s);
    }

    I wonder how much overhead there is in the streams.

    1. Yes… see the code repository.

      It seems that streams have some overhead, maybe… but it is small.
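
      For readers who want to try it themselves, the two variants side by side might look like this (a sketch; parseLine is the per-line work):

      import java.io.BufferedReader;
      import java.io.FileReader;
      import java.io.IOException;

      public class LoopVsStream {
          static void parseLine(String line) { /* per-line work */ }

          public static void main(String[] args) throws IOException {
              // Explicit loop:
              try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
                  String s;
                  while ((s = br.readLine()) != null) {
                      parseLine(s);
                  }
              }
              // Stream variant, same work per line:
              try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
                  br.lines().forEach(LoopVsStream::parseLine);
              }
          }
      }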

  11. Tagir Valeev says:

    Please note that BufferedReader.lines is smarter than getline: it supports any line delimiter (‘\n’, ‘\r’, or ‘\r\n’), while getline supports only ‘\n’. Clearly, doing more tests per character adds some overhead. Still, I would pay this overhead rather than get garbage results if the input file comes from Windows. As mentioned above, use String.split(‘\n’) if you specifically need ‘\n’.

    1. Tagir Valeev says:

      Oh, sorry, String.split(‘\n’) was not mentioned above, and probably it’s not the best solution as it would allocate all the strings at once.
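
      To make the delimiter point concrete, here is a sketch of a ‘\n’-only scan that still tolerates Windows ‘\r\n’ by trimming a trailing ‘\r’ (forEachLine is a hypothetical helper, not a library method):

      import java.util.function.Consumer;

      public class LineScanner {
          // Scans for '\n' only (getline-style), but trims an optional
          // trailing '\r' so Windows "\r\n" input still yields clean lines.
          static void forEachLine(String content, Consumer<String> sink) {
              int pos = 0, nl;
              while ((nl = content.indexOf('\n', pos)) != -1) {
                  int end = (nl > pos && content.charAt(nl - 1) == '\r') ? nl - 1 : nl;
                  sink.accept(content.substring(pos, end));
                  pos = nl + 1;
              }
              if (pos < content.length()) {
                  sink.accept(content.substring(pos)); // last line, no newline
              }
          }

          public static void main(String[] args) {
              forEachLine("a\r\nb\nc", System.out::println); // prints a, b, c
          }
      }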

  12. Ismael Juma says:

    FYI, a small improvement was done in OpenJDK as a result of this blog post:

    https://bugs.openjdk.java.net/browse/JDK-8229022

    Ismael

    1. Thanks for the pointer.