Daniel Lemire's blog


How fast can you pipe a large file to a C++ program?

19 thoughts on “How fast can you pipe a large file to a C++ program?”

  1. Dave says:

    Interestingly, if I switch from using std::cin to using fread(3) on stdin, I get speeds closer to 2.6 GB/s on my Intel MacBook Pro running Catalina. Using std::cin is extremely slow. Using read(2) instead of fread(3) is a tad faster.
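
For reference, here is a minimal sketch of the read(2) approach: a loop that drains a file descriptor through a large buffer. The function name and the 1 MiB buffer size are arbitrary choices for illustration, not taken from the post's code.

```cpp
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Count the bytes on a file descriptor using the raw POSIX read(2)
// call, bypassing both iostream and stdio buffering. A larger buffer
// means fewer syscalls; 1 MiB is an arbitrary choice.
uint64_t count_bytes(int fd) {
    static char buf[1 << 20];
    uint64_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n <= 0) break; // 0 = end of file, negative = error
        total += static_cast<uint64_t>(n);
    }
    return total;
}
```

Calling `count_bytes(0)` consumes stdin, which is the scenario being benchmarked.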

    1. Verified. I have updated the blog post.

      1. Arseny Kapoulkine says:

        I would also recommend using write to maximize write throughput in case that’s the new bottleneck (the overhead of iostream varies per platform but is almost always observably bad…)
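
A raw write(2) loop has to handle short writes, since write may transfer fewer bytes than requested, especially on pipes and sockets. A minimal sketch (the helper name is made up for illustration):

```cpp
#include <unistd.h>
#include <cstddef>

// Loop until the whole buffer has been written: write(2) may do a
// "short write" and return before everything is transferred.
// Returns true on success, false on error (EINTR handling omitted).
bool write_all(int fd, const char* data, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n <= 0) return false;
        data += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}
```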

  2. Mike Hurwitz says:

    I ran your tests and was able to average ~3 GB/s using cpispeed, though only 0.02 GB/s using pipespeed. The previous poster’s comment seems appropriate.

    I threw together a quick test in Go (my language of choice) to see what kind of throughput I could get. With 4 MB buffers I was seeing ~3.9 GB/s without cleaning my environment at all (Chrome running, etc.).

    Just for fun, I also put pv between the emitter and collectors in both your tests and mine. I chose pv because it’s a very common C-based tool that handles pipes. I saw a measurable but fairly slight drop in both benchmarks with pv in the middle. I guess that shows that pv is using one of the more efficient APIs rather than std::cin.

    1. I love your ‘quick test in Go’.

  3. Yes, using the system API is much faster! I did some experiments a while ago with JavaScript and you can achieve these same speeds there too: https://just.billywhizz.io/blog/on-javascript-performance-02/. The problem here is that a lot of the time is taken up by syscalls and the context switches into the kernel.

    I think it would be possible to go (much) faster if we could do everything in userspace with, for example, io_uring on Linux? https://unixism.net/loti/

  4. Attractive Chaos says:

    Have you tried to apply “std::ios::sync_with_stdio(false);”? See https://stackoverflow.com/a/9026594/
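
For context, that flag is usually paired with untying cin from cout; a minimal sketch of the standard iostream speed-up:

```cpp
#include <iostream>

// The usual iostream speed-up: stop synchronizing C++ streams with
// C stdio, and untie cin from cout so that reads from cin do not
// flush cout first. Call once at the top of main, before any I/O.
void enable_fast_io() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);
}
```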

    1. I do; please see the source code on GitHub.

  5. Graham King says:

    Interesting question, thanks!

    I made a Rust version. I get about 5 or 6 GB/s on Linux (Fedora 34 on a Thinkpad T15). I can get over 7 GB/s piping straight into pv though, so my reader must be the bottleneck.

    https://gist.github.com/grahamking/a1bd00581fd15908338ee65f7937cbf1

  6. me says:

    “Plumbing” sounds so much like waste.

    But pipes were even used to send messages, such as orders in a factory, with quite some success: https://en.wikipedia.org/wiki/Pneumatic_tube. And these were commonly placed vertically.

  7. Alex says:

    I don’t have a Mac with dev tools at hand to verify, but some versions of the C++ standard library generate very inefficient code in debug builds.
    I wonder if you would get better results by adding -O in there.

  8. Florian Lemaitre says:

    At some point, I was using pipes to transfer a raw video stream from raspividyuv to my program, but the pipe throughput was too low for real-time processing.
    So I tried replacing the pipe with a UNIX socket (replacing pipe() with socketpair()) and the speedup was impressive: from 200 MB/s to 700 MB/s on a Raspberry Pi 3.

    Apart from the code creating the “pipe”, nothing was changed; in particular, the reading and writing code were exactly the same.

    This made me wonder: why is a socket faster than a pipe?
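
For anyone wanting to try the same swap, a minimal sketch of the socketpair(2) setup (the wrapper name is hypothetical):

```cpp
#include <sys/socket.h>
#include <unistd.h>

// A connected pair of UNIX-domain stream sockets as a drop-in
// replacement for pipe(): the two returned descriptors work with the
// same read()/write() calls, so only the setup code changes.
// Returns 0 on success, -1 on error, like socketpair itself.
int make_socketpair(int fds[2]) {
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}
```

Unlike a pipe, the pair is bidirectional: either descriptor can be read from or written to.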

  9. Antoine says:

    This is probably the C++ IO APIs showing their inadequacy. Even using Python you can probably achieve more than that (sorry, I don’t have a reproducer to submit :-)).

  10. Element14 says:

    20 years ago, when I was still in high school, I dabbled in competitive programming a bit. Back when g++ was still version 3.x, it was a common pitfall to use #include <iostream> for anything that involved heavy IO. Programs would literally run out of time just reading the input.

    It seems that in some implementations of iostream the issue is still there. At any rate, there is so much “magic” in the C++ standard library that using fread (or better yet, the POSIX read()) would give much more accurate results if one is trying to measure the performance of OS pipes.

  11. Julian says:

    By the way, since you’re comparing with read(2) already, I notice that using vmsplice(2) on Linux immediately triples my results.
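
For readers unfamiliar with it, a minimal sketch of feeding a pipe with vmsplice(2), which is Linux-only (the helper name is made up for illustration):

```cpp
#include <fcntl.h>   // vmsplice (glibc declares it here with _GNU_SOURCE)
#include <sys/uio.h> // struct iovec
#include <unistd.h>

// Move a user-space buffer into a pipe with vmsplice(2): the kernel
// maps the pages into the pipe instead of copying them. Caveat: the
// buffer must not be modified until the reader has consumed the data.
// Returns the number of bytes spliced, or -1 on error.
ssize_t splice_buffer_to_pipe(int pipe_wr, const char* buf, size_t len) {
    struct iovec iov;
    iov.iov_base = const_cast<char*>(buf);
    iov.iov_len = len;
    return vmsplice(pipe_wr, &iov, 1, 0);
}
```

The no-copy semantics are exactly why reusing the buffer too early is unsafe here, whereas it is fine with a plain write(2).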

  12. I would be curious how the Windows pipe would compare. boost::process includes a simple pipe implementation that is cross-platform for Windows and Linux.

  14. Dmitry Ganyushin says:

    You probably should not send data like this in production anyway. A better approach may be a library designed to stream data from one application to another, such as this one:
    https://adios2.readthedocs.io/en/latest/engines/engines.html#sst-sustainable-staging-transport

  15. Ilya Popov says:

    This is a libc++ issue. On Ubuntu 21.04, when I compile with GCC 10.3, I get about 2.7-3.0 GB/s for both variants (cin and read). When I compile with Clang 11 using libstdc++, I get similar numbers. But when I compile with clang++ -stdlib=libc++, I get those 0.1 GB/s vs 2.5 GB/s numbers. So the problem is the quality of implementation (QoI) of libc++.