Daniel Lemire's blog

“Hello world” is slower in C++ than in C (Linux)

82 thoughts on ““Hello world” is slower in C++ than in C (Linux)”

  1. zahir says:

    Also, using printf instead of std::cout does not seem to help C++.

    1. Matti Laa says:

      It does, if you remove the #include <iostream> header.

  2. Jason Moore says:

    I am certainly not an expert in C++. However, if I remember correctly, std::endl is a lot slower than using \n. Of course, you may need to use std::endl. I wonder how the benchmark changes when using \n?

  3. Anonymous Coward says:

    This isn’t exactly news. The C++ specific printing facilities are known to be less efficient than plain old printf(), and have been known to be slower for decades.

  4. Jonas Minnberg says:

    Remove the std::endl and put the \n in the string like the C version, and it should go faster…

    1. mariusz says:

      Exactly.

      1. Alf Peder Steinbach says:

        Well that’s a false meme, associative thinking. `endl` just causes a call of `flush`. At some point before end of `main` the stream is flushed anyway, so, net win = one function call and check.

    2. Schrodinbug says:

      No… I think that’s fake news… I’ve heard a lot of people say that std::endl is a newline with a flush, but that either isn’t exactly true or is at least implementation defined.

  5. John Keubler says:

    I try to stick with C and use macros if needed to enhance the language. There is something to be said for sticking with simplicity. C++ is too complicated and bloated. OOP is OK, but I much prefer a functional style using just functions.

    1. Mirko says:

      This code is a really bad comparison.

      This gives the idea that C++ is bloated and slower (it is not; in real code it is actually faster than C).

      And then you have people like this coding in the stone age justified with memes.

    2. gepronqx says:

      C is not functional at all. C is procedural.

    3. Eric Hopper says:

      I can show you C++ programs that run rings around their C counterparts.

      1. Sachin says:

        Please will you mail me some samples at [email protected]

    4. Francis Mossé says:

      When problems are large or complex, the OO C++ features simplify your code to a very large extent.

      Functions are fine, but associating them with the proper data is cumbersome in C, simple and scalable in C++, based on Classes, their extensions or generalization, their relationships, and their instances. Abstraction is the reason why C++ was created, and it delivers that, hence the power and simplicity of its code.

      Real “Functional Programming” isn’t supported by languages as basic as C. Consider exploring languages that are built for Functional Programming, they would give you more power in a world you already like.

  6. Alex Chen says:

    Isn’t this, to some extent, testing the streaming IO part of the STL in C++, instead of the language itself? For what it’s worth, std::cout and std::endl probably do more (like flushing the buffer) than printf under the hood, which could potentially account for the 1 ms increase in execution time.

    1. Chris says:

      It is a well established fact that C++ does not provide a zero overhead abstraction unfortunately.
      Note that many features of C++ in fact do provide (+-) zero overhead abstractions.

    2. Davide Cunial says:

      I think a fair comparison would be to do like so:

      #include <cstdlib>
      #include <iostream>

      int main() {
        std::ios_base::sync_with_stdio(false);
        std::cout << "hello world\n";
        return EXIT_SUCCESS;
      }

      Can you try the benchmark with this C++ implementation?

  7. Cassie says:

    I have a concern about your conclusion here — not that it’s necessarily wrong, but that this test is incomplete. Specifically, this test does nothing to differentiate between execution time and function call time.

    If we’re looking at 1 ms overhead every time you print to console, I’ll grant that’s significant. But if we’re looking at 1 ms per execution? I can’t rightfully agree with your conclusion that this is significant. Yes, granted, we’re talking about a 200% increase in the execution time for Hello World, but in 2022, I cannot think of a real-world situation where anyone would be executing hello-world equivalent software with such frequency that it creates a cpu bottleneck. Not even in the embedded space.

    I haven’t tested it yet (I might), but my guess is the performance difference you’re seeing takes place in loading the module, and if you were to print to console 10,000 or 100,000 times per execution, you’d still be looking at about a 1 ms difference per execution. I’m basing this guess on the fact that we’re seeing such a significant performance increase in the statically linked c++ version and the knowledge that in a Linux environment, there’s some decent chance that stdio.h is preloaded in memory while iostream is not.

    Obviously, my hunches are not data, and more testing is required before we draw any conclusions here.

    The other question I have is whether you’re running hyperfine with the -N flag. Without it, on processes this short, it’s kicking the following warning at me:

    Warning: Command took less than 5 ms to complete. Note that the results might be inaccurate because hyperfine can not calibrate the shell startup time much more precise than this limit. You can try to use the `-N`/`--shell=none` option to disable the shell completely.

    Which seems potentially relevant.

    I might be back later with followup results.

  8. Pa says:

    std::endl is slower than “\n”; you should try it again to see if it makes any difference.

  9. Charles says:

    Try removing stdlib in both programs. Return 0 instead. Also use \n in the cpp program instead of endl. Would be interested in seeing the results of that

  10. Jakob Kenda says:

    There is a difference in your C++ code as opposed to C code, and that is the std::endl statement, which flushes stdout. There is no flushing in the C code. For the code to be equivalent, the C++ statement should be
    std::cout << "hello world\n";
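    For reference, a complete version of that equivalent program could look like this (a minimal sketch, assuming the rest of the article’s C++ version is unchanged):

    #include <cstdlib>
    #include <iostream>

    int main() {
      // "\n" instead of std::endl: no explicit flush, matching the C version
      std::cout << "hello world\n";
      return EXIT_SUCCESS;
    }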

  11. Oliver Schonrock says:

    Perhaps differences in lib and not lib loading.

    https://twitter.com/oschonrock/status/1557092072540307456

  12. Hebigami says:

    I’m not a professional C or C++ dev but I still remember a few basics from the time I studied physics at my local university (we had C/C++ lectures).

    Both endl and cout have side effects. You compare two pieces of code that don’t do the same thing. You should not expect them to run equally fast.

    There are ways to reduce the side effects like NOT using endl or using ios_base::sync_with_stdio(false).

    https://godbolt.org/ helps a lot if you want to know more details.

  13. Cassie says:

    I’ve done some followup testing. It appears that my concerns with the methodology were unfounded, but I have since seen some other critique of your methodology that I have not explored.

    You can see my changes to your code and references to the additional critique on my github (https://github.com/cassieesposito/Code-used-on-Daniel-Lemire-s-blog-2022-08-09)

    1. John says:

      In your updated C++ code (multi_hello.cpp), you should also replace std::endl with “\n” as previously suggested here. I suspect this may have a much larger impact on the results due to flushing after each print for 30000 iterations.
      Interested in seeing updated results!

  14. Mark Rohrbacher says:

    Hello Mr. Lemire,

    IMHO, the comparison of those two snippets isn’t very fair, as the C++ code does a bit more than the C code:

    Streaming std::endl does not only stream a ‘\n’, it also flushes the stream (https://en.cppreference.com/w/cpp/io/manip/endl).

    To make the two programs more comparable, you should either replace the C++ streaming with
    std::cout << "hello world\n";
    or add a
    fflush(stdout);
    to the C program.

    In my tests with both of these changes, hellocppstatic and hellocppfullstatic were faster than helloc, while hellocpp was still slower. However, as my machine wasn't completely idle, these results may be inaccurate.

    But let's go a step ahead:
    If you omit the printf / flush / cout streaming, just leaving the "return EXIT_SUCCESS" (and the includes), the C++ program will most probably be slower. This is because of the static initialization of std::ios_base (std::ios_base::Init::Init() gets called on program startup as soon as <iostream> gets included).
    It’d be interesting to see the results after removing this include, as the object code of hello.c and hello.cpp should then be essentially identical.
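    Concretely, that include-free variant would be just:

    // no <iostream>, so std::ios_base::Init is never pulled in
    #include <cstdlib>

    int main() {
      return EXIT_SUCCESS;
    }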

    Best regards
    – Mark

    1. Matti Laa says:

      “This is because of the static initialization of std::ios_base (std::ios_base::Init::Init() gets called on program startup as soon as <iostream> gets included)”

      This. Static initialization and destruction happen if the iostream header is merely included, even if it is never used. Using stdio.h and printf instead of iostream gives you exactly the same assembly in both languages. Latest GCC release output:

      .LC0:
      .string "Hello world"
      main:
      sub rsp, 8
      mov edi, OFFSET FLAT:.LC0
      xor eax, eax
      call printf
      xor eax, eax
      add rsp, 8
      ret

      But yeah, overall I think this is a good example that the features C++ offers over C are not free. You have to understand how your libraries, your code (of course ;), and sometimes even the compiler work, if optimizing CPU usage is your top priority.

  15. Tim Parker says:

    This is such a beautiful example of measuring something yet understanding almost nothing about what the measurements mean. I shall be using this as an example for our new starters on the pitfalls of premature optimisation and the importance of meaningful test structures and data.

  16. Dave says:

    This is based on biased info from decades ago.

    99.99999% of C++ programs used for professional applications in this world do not use standard out (or err) to convey runtime status.

    C++ apps are easier to develop than C apps and have richer features, so I’m not sure what you are driving at.

    Oh, C++ apps are oftentimes deployed in embedded (or server) environments…. where there is definitely no I/O to a terminal.

    I suspect this article was written by a troll.

    1. The blog post is specifically about “hello world”.

      If you mean to refer to large programs, then I agree, but it happens often enough that we have to run small commands that only do a few microseconds of work.

      1. Tim Parker says:

        You’re not measuring what you think you are.

    2. Justin M. LaPre says:

      Did you try passing -fno-exceptions and -fno-rtti? That may impact your numbers as well.

  17. Evan Teran says:

    The C++ program is doing more work than the C program.

    You should avoid using `std::endl` unless you specifically intend to flush the buffers. There’s nothing wrong with using a simple newline character.

    But also, IO streams are known to be measurably slower than printf, especially since they have hidden global constructors and destructors.

    std::format is the new, modern way to write formatted strings.

    So, it’s not really that “hello world is slower in C++”; it’s that the methods you’ve chosen to perform the task in C++ are by nature slower (but offer better type safety and internationalization capabilities).

    For the simple task of printing “hello world”, honestly you should just use puts.
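    For example (a minimal sketch):

    #include <cstdio>

    int main() {
      std::puts("hello world"); // puts appends the newline itself
      return 0;
    }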

    1. cdevaw says:

      GCC removes the printf() call and inserts puts() instead: https://gcc.godbolt.org/z/dcx4Tz4WK

      That is why it is so fast.

  18. John Halmaghi says:

    << std::endl inserts a newline AND flushes the stdout buffer, which I don't believe printf() does.
    It would be interesting to see the comparison without << std::endl; since flushing the buffer is a relatively costly operation, that should give a better apples-to-apples comparison. I'm no expert though.

  19. Hermas Mohamed says:

    give it a try without std::endl
    https://youtu.be/GMqQOEZYVJQ

  20. marc says:

    This is not an accurate comparison: endl also includes a flush, which is not necessary here and adds unnecessary time. You could have just as easily used “\n” in the C++ version the same way you did in the C version.

  21. yueshan says:

    cout does lots of things you should know about.

  22. Jeff Bailey says:

    iostreams are not a minor bit of infrastructure.

    If you want to compare program startup time, use printf in the C++ version as well.

    You should be able to look at the assembly output to make a good comparison. That’s a better view of what’s happening and why.

  23. Richard Cervinka says:

    There is no difference between std::endl and ‘\n’ because std::cout is flushed at the end of the application anyway.

  24. zahir says:

    IMHO it is all about linking with libstdc++. In the first version of the code I only replaced the std::cout… line with the printf line from the C version (without changing includes or linking directives), and the results for C++ did not change on my computer.

    I ran a perf record/report on that version and, unlike C, at least 30% of the time was being spent in locale functionality. My guess is that not linking to libstdc++ removes the underlying C++ locale functionality from printf.

    Measurements were on my 10 year old machine.

    I wonder what will change if we link with/to clang/libc++ though.

  25. Hello Lemire,

    In C++, the standard streams are synchronized with the standard C streams after each input/output operation by default.

    According to cppreference (https://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio), sync_with_stdio may reduce the penalty:


    If the synchronization is turned off, the C++ standard streams are allowed to buffer their I/O independently, which may be considerably faster in some cases.


    std::ios::sync_with_stdio(false);
    std::cout << "hello world" << std::endl;

  26. Ofek Shilon says:

    When used correctly (specifically, with `std::ios_base::sync_with_stdio(false);`), cout is in fact much faster than printf:
    https://stackoverflow.com/questions/31524568/cout-speed-when-synchronization-is-off

  27. John M says:

    I’m glad neither I, nor my children, attended the University of Quebec if this is how professors spend their time. You conclude that:

    “.. if these numbers are to be believed, there may a significant penalty due to textbook C++ code for tiny program executions, under Linux.”

    then, in a later comment response, state:

    “The blog post is specifically about “hello world”.”

    If it’s the latter, then the former conclusion is invalid. You cannot infer that tiny programs under Linux will perform slower, using C++ rather than C, on the basis of a one line example where the method used is different.

    There are multiple comments addressing the specifics of the differences, and reasons for them, but, if I were you, I’d take this blog post down as it makes you look foolish.

  28. Niclas says:

    I have a lot of respect for your work, so this blog post is quite baffling & saddening. What exactly are you getting at or aiming for?

    ”there may a significant penalty due to textbook C++ code for tiny program, under Linux.”

    BS, & you’re comparing apples to oranges. Read up on what cout actually does. Is your printf thread safe? (You can turn off sync_with_stdio for the std streams if you want that monster to be faster.) std::printf is also maybe worth mentioning.

  29. Mirko says:

    This code doesn’t show C++ being slower than C.

    Rather, this is “iostream with stdio sync on printing two strings” being slower than “printf for the trivial case of a string”. No news here.

  30. Drue says:

    Everyone else has already mentioned how flawed this is.

    But a better test would be to compare two computationally intensive algorithms or generics, written properly in each language.

  31. These days you should use `std::format` (https://en.cppreference.com/w/cpp/utility/format/format).
    Its optimization and compile-time logic should beat printf.
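    For example, a minimal sketch, assuming a C++20 compiler and standard library that ship <format>:

    #include <format>
    #include <iostream>

    int main() {
      // std::format builds the string with type-safe formatting
      std::cout << std::format("hello {}\n", "world");
      return 0;
    }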

  32. wqw says:

    Students using C++ streams in programming contests are hammered to start main with
    [code]
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);
    [/code]
    . . . in order to achieve printf/scanf performance.

    https://stackoverflow.com/questions/31162367/significance-of-ios-basesync-with-stdiofalse-cin-tienull
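    In a complete program, that prologue looks like this (a minimal sketch):

    #include <iostream>

    int main() {
      std::ios_base::sync_with_stdio(false); // decouple C++ streams from C stdio
      std::cin.tie(nullptr); // do not flush cout before each read from cin
      std::cout << "hello world\n";
      return 0;
    }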

    1. Hebigami says:

      Thanks wqw! I was aware of sync_with_stdio but I’ve never seen tie before.

      It’s always a pleasure to learn something I could use someday 🙂

  33. tetsuoii says:

    What you have discovered is just the tip of the craptastic bloatberg that is every other language not C.

  34. Jack Mazierski says:

    If all you write is hello world then all you need is C.

    Only that we are not in 1992. C is quite useless for user mode apps nowadays and no one creates console apps except Linux freaks that have nothing else to write.

  35. This is a micro-benchmark that illustrates a simple point. I do not believe Daniel is going after any massive generalizations.

    Oh. And all the comments about flushing the I/O buffer … a moment of thought should have told you the examples were equivalent. While it has been a couple of decades since I dug into runtime libraries, I am pretty sure every runtime must flush buffers on program exit.

    Put differently…
    Did you see the output?
    Then the runtime library flushed output buffers on exit.

    Yes, loading dynamic libraries is more expensive. Often this does not matter, but sometimes it can be significant. There is or should be a savings in memory used (across multiple programs using the same libraries), and this can sometimes be significant.

    The savings from shared dynamic libraries was critical in the Win16 era, and for some time after. On present many-gigabyte machines, rather less so. (In this century, I have tended to use static libraries more often than dynamic ones.)

    The C printf() and stdio library was honed decades ago on much leaner machines, and (as you might expect) is lean and efficient. If you dig back into the USENET archives, you can find a period (late-1980s / early 90s?) where there was a bit of a public competition to see who could come up with the leanest stdio library. That code ended up in compiler runtime libraries, and I strongly suspect survives to the present (and offers examples of hyper-optimization).

    The C++ standard streams library arrived on fatter machines, and never received such attention (in part as you can use C stdio).

    Daniel’s experiment matches well with history.

    1. Tim Parker says:

      “This is a micro-benchmark that illustrates a simple point. I do not believe Daniel is going after any massive generalizations.”

      With respect, the claim was made in the article that “.. if these numbers are to be believed, there may a significant penalty due to textbook C++ code for tiny program executions, under Linux.”
      Disregarding the strict meaning of ‘may’ – which would make the whole statement a semantic null – this is (IMO) quite a massive overgeneralization. It is a micro-benchmark, and a poorly considered and written one at that, and there is effectively no meaningful generalization at all possible from it – as has been pointed out by many in these comments. That the author subsequently states that this was specifically about “hello world” is not properly reflected in the main text, even now.

      It also seems to expose a deep lack of knowledge not only of what the programs are doing and what the objects and functions are designed to do – along with their benefits and deficits – but also of C++ itself.
      I’ve seen renderers and whole micro-kernels constexpr’d – which is harder to do in C and can result in enormous performance benefits – but that’s not the point, nor is it necessarily a reason to choose one language over the other. They were particular implementations, for particular purposes, and they do demonstrate aspects of a language that could be useful in many situations, but which should not be overgeneralized from. This is the most egregious issue for me: the apparent attempt to classify language suitability on a frankly meaningless code snippet which is hardly an example of any useful real-world program – this is something we try to stamp out of even the newest of starters, and from a professor of computer science seems quite ridiculous to me. YMMV obviously.

      1. Stephen Tran says:

        Can you do a test showing that, for small programs similar to “Hello World” but not necessarily identical, C++ runs as fast as C if not faster? That would settle the issue, wouldn’t it?

        1. Tim Parker says:

          At an extreme, you could try something like this
          https://onecompiler.com/cpp/3wdmzd9js
          (or try Googling for ‘constexpr fibonacci’)
          Re-working that as C should give an indication of what can be done, but – like the article – it’s really missing the point, and I could probably equally well make a C++ version that is far worse **.

          One of the main reasons that individual, micro-benchmarks like this aren’t useful for answering questions like “is language X faster than language Y ?” is that the question is completely meaningless.

          What we can do is ask: for my particular problem space *and* my typical data sets / operating conditions, what would a good choice of language and strategy be? If, for example, you were designing an ultra-high speed / low latency peer-to-peer message passing system, you probably wouldn’t choose Python. However, if you wanted to implement a simple peer-to-peer client-server application then Python, with its interpreted nature and rich library support, would make such a thing relatively trivial. It’s exactly these sorts of evaluations that should be driven into programmers, and first year computer science students in Quebec and elsewhere, from day one.

          Using massively simplified, atypical, noddy code fragments – especially when naively implemented – is not really helpful or instructive, and mainly serves to teach people bad coding practice and poor performance analysis techniques IMO.

          ** These are poor Fibonacci number generators, so don’t use them in any performance-sensitive regime; they’re just an example 🙂

    2. rhpvorderman says:

      “The C printf() and stdio library was honed decades ago on much leaner machines”

      It is still being honed. memchr, for instance, uses SSE2 instructions on x86-64 machines. These instructions became available only long after both C and C++ gained widespread adoption. memchr beats std::find:
      https://gms.tf/stdfind-and-memchr-optimizations.html

      Glibc is much more optimized than libstdc++ simply because it is much smaller, and therefore developers can devote more time to optimization.

      The truth is that abstractions come at a cost of complexity and size which makes it harder to optimize. “Zero-cost abstractions” may be true in a few cases, but there will always be cases that are too hard or time-consuming to look into. It is a simple matter of tradeoffs.

  36. Catron says:

    You didn’t say which compiler you used. I assume it was gcc. In gcc, “printf” is one of the built-in functions. This means there is no library involved at all (neither dynamic nor static). It’s practically part of the language, and the #include is just for syntax reasons.
    I haven’t read the internals, but I assume that gcc doesn’t call a classical printf at all but optimises it at the compiler level, e.g. does the formatting at compile time and uses the ‘write’ syscall directly.
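    One easy way to check (a sketch, not from the article): compile a call with a plain string ending in a newline and look at the generated assembly; GCC typically lowers this pattern to a call to puts.

    #include <cstdio>

    int main() {
      // With gcc/g++ -O2 this is typically emitted as "call puts", since there are
      // no format specifiers and the string ends in a newline.
      std::printf("hello world\n");
      return 0;
    }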

    1. I use a straight Ubuntu 22 and the Makefile is provided (see links), so yes: gcc.

      You make a good point regarding printf.

  37. Eric Hopper says:

    This program will run faster than any C program written using the standard qsort library function:

    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    using namespace ::std;

    constexpr int ipow(int base, int exp)
    {
      if (exp == 0) {
        return 1;
      } else if (exp < 0) {
        return 0;
      } else if (exp % 2) {
        return base * ipow(base, exp - 1);
      } else {
        return ipow(base * base, exp / 2);
      }
    }

    int main()
    {
      vector<int> foo(ipow(2, 30));
      random_device rd; // Will be used to obtain a seed for the random number engine
      mt19937 gen(rd()); // Standard mersenne_twister_engine seeded with rd()
      uniform_int_distribution<int> dis;
      generate(foo.begin(), foo.end(),
               [&gen, &dis]() {
                 return dis(gen);
               });
      cerr << "Sorting.\n";
      sort(foo.begin(), foo.end());
      return is_sorted(foo.begin(), foo.end()) ? 0 : 1;
    }

    1. It is true that C++ has many advantages over C as far as algorithmic implementation goes.

      Your program allocates gigabytes of memory and sorts it. If you reduce the task to sorting 12 numbers, the answer might be different, and that’s the motivation of my blog post.

      1. Eric Hopper says:

        It isn’t that the implementation of sort is better in C++, it’s that you can’t reasonably make a version of the qsort function in C that runs faster than sort in C++.

        And this is because the sort function in C++ is a template, and the compiler essentially writes you a custom one for the data structure and comparison function you’re using in which the comparison and swap functions are inlined into sort and then subjected to aggressive optimizations.

        Making this happen in C would require macro magic of the highest order, and even then would probably be a huge pain to use correctly.
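        As an illustration (a sketch, not code from this thread), compare the two call styles:

        #include <algorithm>
        #include <cstdlib>
        #include <vector>

        // qsort-style comparator: invoked through a function pointer, so it is
        // generally not inlined into the sorting loop.
        static int cmp_int(const void* a, const void* b) {
          int x = *static_cast<const int*>(a);
          int y = *static_cast<const int*>(b);
          return (x > y) - (x < y);
        }

        int main() {
          std::vector<int> v{3, 1, 2};

          // C-style: opaque element size plus a function-pointer comparison.
          std::qsort(v.data(), v.size(), sizeof(int), cmp_int);

          // C++-style: std::sort is a template, so the comparison (operator< here)
          // can be inlined and optimized for int.
          std::sort(v.begin(), v.end());
          return 0;
        }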

        Your “Hello world” case reads like a general criticism of C++, when I strongly suspect that C++ is faster than C in most cases because of things like I just mentioned. So, it seems like a criticism that’s narrowly tailored to make a point that I don’t think is particularly accurate.

        1. Eric, you are interpreting my post to say something I do not say (C++ is slow).

          I have written and co-written several high performance projects in C++. E.g., please see https://github.com/simdjson/simdjson

  38. Sam Mason says:

    After an extended play with this, I would say that the library/flushing issues that most commenters raised aren’t anything to worry about. All the significant differences in timings seem to be due to the dynamic linker.

    C code that dynamically links to libc takes ~240µs, which goes down to ~150µs when statically linked. A fully dynamic C++ build takes ~800µs, while a fully static C++ build is only ~190µs. Across all of these, the difference between printing one “hello world” vs 1000 is only ~20µs.

    Getting good timings was the hardest thing here! Code/analysis are in:

    https://github.com/smason/lemire-hello

  39. Eric Hopper says:

    Sorry to post more than one comment. I would go back and edit my original if I could….

    The reason that C++ is taking longer here is that the runtime environment of C++ is more complex. Not a LOT more complex, but it is more complex. C++ has global constructors and destructors that need to be executed on program startup and shutdown. Additionally, the compiler needs to track which global destructors need to be called, because (in the case, for example, of local ‘static’ variables inside functions) which ones need to be called can only be determined at runtime. This requires a global data structure that’s initialized on program startup and scanned on program shutdown.

    Additionally, there will be some overhead required to set up the exception handler of last resort.

    I have a hello world written in C++ that will execute faster than C, but it requires passing lots of compiler options to turn off the compiler’s setup of the C++ runtime environment. It would be possible to duplicate this program in C, but it would be challenging, especially with the quality of error handling it’s possible to achieve using my library:

    My library: https://osdn.net/projects/posixpp/scm/hg/posixpp
    (Github mirror): https://osdn.net/projects/posixpp/scm/hg/posixpp

    Link to hello world program written using my library: https://osdn.net/projects/posixpp/scm/hg/posixpp/blobs/tip/examples/helloworld.cpp

  40. Scott says:

    I guess my response would be “duh”. With C you have very little object code and a single call to the printf function in a statically loaded library. With C++ you have substantial startup overhead loading the iostream library and all the modules it depends on. Address allocation takes time and it’s going to prepare for all the possible dynamic libraries you might load as well.

  41. Menotyou says:

    Comments are better than the article IMO.

  42. Schrodinbug says:

    Iostreams are known to have performance issues, so this isn’t earth-shattering news. I’m glad you mention std::format. libfmt, which it evolved from, has a function called fmt::print… I guarantee fmt::print("hello world\n"); will not just be as fast as printf, but faster, especially if there’s a lot of formatting to be done. This is because it can do some of the formatting work at compile time. And it’s typesafe, so no having to worry about and remember the gazillion printf variations. It’s freaking amazing. The print function didn’t make it into C++20, but I believe it’s being pushed for in C++23.
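    For example (a sketch, assuming libfmt is installed and the program is linked with -lfmt):

    #include <fmt/core.h>

    int main() {
      // fmt::print writes directly to stdout; recent libfmt versions check the
      // format string at compile time.
      fmt::print("hello world\n");
      return 0;
    }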

  43. Cal Gray says:

    The title should be “C++ streams are slower than printf”, which is a known fact, as streams favor versatility over performance. Streams are significantly different from print functions since formatting is stored as state within the stream object, which takes time to construct and destruct once over the program lifetime.

  44. sl2 says:

    How about a test where you simply run the exact same code compiled as C and as C++? That would account for the apples-to-oranges comparison.

    Also, store the time (in microseconds, or via GetTickCount()) before and after the call and output this data at the end; that would account for the startup library / runtime difference.

  45. Amin Yahyaabadi says:

    I suggest you use libfmt instead. It is safer and faster. Also, note that the newer C++ standards have better alternatives to iostreams. If you are micro-optimizing, you should consider these details.

  46. Roman Avtukhoff says:

    Because you used std::

  47. Marcos says:

    It would be more accurate if you printed several thousand lines in a loop. The execution time of printing a single line could easily be confused with loading and startup time. Then, there is also the flushing. You want to make sure that you are flushing the same number of times.
    Lastly, given the object oriented nature of C++, it would make sense to turn on optimisations.
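    For the loop version, something along these lines would do (a sketch; the iteration count is arbitrary):

    #include <iostream>

    int main() {
      // Print many lines so the per-line cost dominates the one-time startup cost.
      // No std::endl in the loop, so flushing is left to the buffering and stays comparable.
      for (int i = 0; i < 100000; ++i) {
        std::cout << "hello world\n";
      }
      return 0;
    }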

    1. The programs are compiled with optimization (-O2).

      1. Tim Parker says:

        As long as you’re not trying to measure the performance the languages can offer under GCC, that might be adequate. If you’re wanting to replicate what typical release production code would do, then it’s probably not (partly depending on the functionality being used).

  48. Jimmy Ellison says:

    You could throw in some dirty inline assembly lines involving kernel syscall to improve performance.

  49. Raffaello Bertini says:

    Benchmarking such a small time can be tricky, if only because of the various caches.

    It is not a good test either, as a 1 ms difference (in one run?) for a program doing no data processing at all has no significance, nor is it a reason to use language A rather than B.

    No one in the world would be interested in investing to save 1 ms in a program that is doing nothing, because it solves no problem (but actually it creates one 😂).

    If you want a real comparison of some small routines (the one used for this test isn’t such a routine, but it could be used as one) and you can’t profile them, the way to go is to look at the generated assembly code.

    Then, on that basis, you can do an analysis, test, and benchmark, and write some conclusions.

    Besides, focusing on a real problem would make this kind of C/C++ comparison more useful.

  50. Raffaello Bertini says:

    One small improvement on this test could be to work out the “net weight” by computing the “tare”:

    Run both programs with an empty main.

    Then run the hello world versions and compute the net execution time of the hello world itself.

    Here the tricky part is whether the include statement should be present or not.

    Besides, the expectation should be that the two empty programs take the same time to run; otherwise it implies that hello world itself isn’t faster or slower, but that there is some bootstrap overhead.

    Anyway.
    I Enjoyed the post.
    Thanks

  51. Jonathan says:

    Incorrect. This isn’t valid performance analysis. In fact, using GCC 11.2 with -Ofast, std::cout is 1.1x faster. At -O3, they are the same.

    https://quick-bench.com/q/lGltfiZ439DZGuBm1yc_GEy2TYQ

  52. Tim Parker says:

    As the post has been significantly re-written, the emphasis of the slow-down altered, and a number of the criticisms folded into the text as if original, it might be nice to address more of that in replies/updates to the comments and/or acknowledge it in the new article text as appropriate.
    This has been done in a couple of cases, but not all – and this puts the revised text at odds with the historical comments.

    The issue of relevance and suitability of the micro-benchmark as-is is not really dealt with either (e.g. if the absolute time was important you would profile, adjust, iterate – if it’s not important, it’s not important), but that’s another matter.

    1. Thanks. Unfortunately people repeatedly proposed alternative explanations, without running benchmarks themselves nor accounting for the critical point that my post makes: the speed dramatically increased after statically linking. I have added a paragraph to acknowledge these additions as you suggested. I do thank the various readers for their proposals but I am not going to answer point-by-point dozens of closely related comments.

      Regarding the relevance of the benchmark, I have explained and re-explained it at length. For long running processes, the issue has always been irrelevant, but if you have short running processes (executing in about a millisecond or less), then you may be spending most of your time loading the standard library. You may not care, of course… but it is useful to be aware. There are solutions such as static linking, but there are tradeoffs.

  53. What this shows (and really all this shows) is that the C++ library is a lot bigger than the C library. If you are not using the capability it provides, it is costing you performance.

    On the other hand, if you have significant work on a complex problem, you can get better performance because the library and the language provide facilities that would be difficult and expensive to write in C.