Daniel Lemire's blog


Web server ‘hello world’ benchmark : Go vs Node.js vs Nim vs Bun

34 thoughts on “Web server ‘hello world’ benchmark : Go vs Node.js vs Nim vs Bun”

  1. forked_franz says:

    I have already explained my point about the expected performance in simple vs. complex cases, but I also suggest running something to make sure all the frameworks are using the available CPU resources; they often don’t come with decent ergonomics and defaults for that. Furthermore, if the load generator runs on the same machine, isolate the cores of the two processes from each other, trying hard to make the server the bottleneck by constraining its resources.

  2. Alex says:

    Not a fair comparison at all. Why did you use a slow third-party library for Bun/Node.js? It seems like you did not know what you were doing. Try HyperExpress or uWebSockets.js.

    1. I have updated the numbers with uWebSockets.js. See the updated blog post.

    2. J.C. says:

      Node fanboy detected. 🤡

  3. J says:

    An interesting question is whether one can get better performance by using servers written entirely in a language like C, C++, Rust or Zig. I tried building the equivalent in C++, but it was so painful that I eventually gave up.

    Indeed, it is an interesting question. So why not check it? 🙂

    This doesn’t look painful to me:

    https://github.com/tokio-rs/axum/blob/main/examples/hello-world/src/main.rs

  4. Sandros94 says:

    Those numbers are hiding something interesting. I have never used bombardier.
    Could you clarify how long the benchmark ran for? This would let us know how many real connections/s each one is capped at, and whether any of them reached the 10/10000 mark.

    1. bombardier runs for 10 seconds by default.
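
      You can also set the run length or a fixed request count explicitly; something like this (flag names as I recall them from bombardier’s --help, so double-check with your version):

      bombardier -c 10 -d 30s http://localhost:3000/simple
      bombardier -c 10 -n 1000000 http://localhost:3000/simple

      The first runs for 30 seconds; the second stops after one million requests.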

  5. Calling Out The BS says:

    Only Nim is using a third-party library built for speed: it’s only 777 lines of code, and it doesn’t even support HTTP/2.

    For shame!

    1. Calling out the BS caller says:

      httpbeast is not a third-party library; it is written by a Nim core team member. And by the way, it is not pumping up the speed artificially: it relies on the quite speedy performance of Nim’s async machinery. Its task isn’t speed per se, but rather to provide HTTP-related functionality.

  6. Patrick says:

    I tried building the equivalent in C++, but it was so painful that I eventually gave up.

    I hear you. Just to capture some of the knowledge I have built up, I wrote a server entirely in C, which I call “sloop” (server loop):

    https://github.com/chkoreff/sloop/tree/main#readme

    It does the necessary buffering so it can print the “Content-Length” header. At some point I could change it to use chunked transfer encoding.
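
    To illustrate the idea in Go (just a sketch of the buffering approach, not sloop’s actual C code): build the body first, then emit an exact Content-Length header ahead of the payload.

    package main

    import (
        "bytes"
        "fmt"
        "log"
        "net"
    )

    func main() {
        ln, err := net.Listen("tcp", ":3000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := ln.Accept()
            if err != nil {
                continue
            }
            go func(c net.Conn) {
                defer c.Close()
                // Read (and ignore) the request head; enough for a toy benchmark server.
                head := make([]byte, 4096)
                if _, err := c.Read(head); err != nil {
                    return
                }
                // Buffer the body first so an exact Content-Length can be sent,
                // avoiding the fallback to chunked transfer encoding.
                var body bytes.Buffer
                body.WriteString("Hello World!")
                fmt.Fprintf(c, "HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n", body.Len())
                c.Write(body.Bytes())
            }(conn)
        }
    }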

    The servers I actually use now are written in Fexl, so I can just call run_server with an arbitrary Fexl program to interact with clients, e.g.:

    https://github.com/chkoreff/Fexl/blob/master/src/test/server.fxl

    That Fexl code ultimately calls this “type_start_server” routine written in C:

    https://github.com/chkoreff/Fexl/blob/master/src/type_run.c#L355

  7. Rene Kaufmann says:

    Interesting that the numbers are so low given that you use such a big server.

    On an AMD Ryzen 9 5900HX, I get the following numbers for Go 1.21.1 and Nim 2.0.0.

    go 1.21.1

    bombardier -c 10 http://localhost:3000/simple
    Reqs/sec 246226.18
    bombardier -c 1000 http://localhost:3000/simple
    Reqs/sec 451854.02

    nim 2.0.0

    bombardier -c 10 http://localhost:3000/simple
    Reqs/sec 440969.81
    bombardier -c 1000 http://localhost:3000/simple
    Reqs/sec 799546.14

    I also tested this with fasthttp for Go:

    package main

    import (
        "io"
        "log"

        "github.com/valyala/fasthttp"
    )

    func main() {
        h := requestHandler
        if err := fasthttp.ListenAndServe(":3000", h); err != nil {
            log.Fatalf("Error in ListenAndServe: %v", err)
        }
    }

    func requestHandler(ctx *fasthttp.RequestCtx) {
        io.WriteString(ctx, "Hello World")
    }

    bombardier -c 10 http://localhost:3000
    Reqs/sec 351120.08
    bombardier -c 1000 http://localhost:3000
    Reqs/sec 601480.20
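
    For reference, the plain standard-library version that presumably produced the go 1.21.1 numbers above is close to this minimal sketch (the exact code is in the blog post):

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // Standard-library counterpart of the fasthttp server above.
        http.HandleFunc("/simple", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("Hello World"))
        })
        log.Fatal(http.ListenAndServe(":3000", nil))
    }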

    1. Your processor has a significantly higher clock speed, which could be a factor.

  8. Rene Kaufmann says:

    I also tried Zig (I am not a Zig expert).

    It’s based on the simple HTTP example from https://github.com/zigzap/zap:

    const std = @import("std");
    const zap = @import("zap");

    fn on_request_minimal(r: zap.SimpleRequest) void {
        r.sendBody("Hello World!") catch return;
    }

    pub fn main() !void {
        var listener = zap.SimpleHttpListener.init(.{
            .port = 3000,
            .on_request = on_request_minimal,
            .log = false,
            .max_clients = 100000,
        });
        try listener.listen();

        std.debug.print("Listening on 0.0.0.0:3000\n", .{});

        // start worker threads
        zap.start(.{
            .threads = 16,
            .workers = 16,
        });
    }

    Results:

    bombardier -c 10 http://localhost:3000
    Reqs/sec 312406.93

    bombardier -c 1000 http://localhost:3000
    Reqs/sec 470699.26

    1. Note: zap is a Zig wrapper of facil.io (a C web framework).

      Zig only: https://github.com/karlseguin/http.zig

      Both do not have TLS support on the server.

      std.http.Server is unstable, being redesigned with every release since 0.11.0, and only std.http.Client has TLS support.

  9. Sean says:

    As always, brilliant content from Daniel.

  10. Niek says:

    I tried building the equivalent in C++, but it was so painful that I eventually gave up.

    Did you ever try Seastar (a C++ server framework)?

    1. I will be checking it out. Thanks.

  11. Alexander Yastrebov says:

    Would be nice for all servers to return exactly the same response.

    1. Indeed, but a few different bytes in the string should not impact performance.

  12. ANIL CHALIL says:

    In the meantime, it is remarkable that a high-level language like Nim achieves such performance and scale. Also, the implementation in Nim seems very idiomatic.

  13. For the C++ solution, I initially encountered many difficulties. Using Lithium turned out to be simple: the most difficult part is to ensure that you have OpenSSL and Boost installed on your system.

    Since you are using boost::context, that means you are using stackful coroutines rather than the stackless C++20 ones.

    If you want to reduce system dependencies, you could opt for standalone Asio (without Boost), which would let you pair it with co_* and awaitable (C++20, stackless only).

  14. Darryl Pye says:

    You may find the following benchmarks interesting.

    https://www.techempower.com/benchmarks/#section=data-r21&test=plaintext

    1. Thanks. Note that I link to TechEmpower early on in the blog post.

  15. Cihad says:

    Please compare with Nginx. Just curious.

    1. Can you illustrate how you’d build a small specialized web server similar to the examples above using Nginx? Suppose you already have your software, and you want to add a small HTTP server to it: how do you link to Nginx as a software library?

      1. This might be of interest: https://openresty.org/en/
        But there is another one which turns out to be in widespread use. I have lost the reference… it is used more in China.

        1. Do you have code samples on how I can use openresty to embed a web server in my application?

          If you have an Apache, IIS or Nginx server, you can build web applications on top of it but that is not what my blog post was about. My blog post was about building small web applications (in different programming languages) using existing software libraries.

          I am considering the scenario where you have a program, when you launch the program, you launch a web server.
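
          In Go terms, the scenario is something like this sketch (a hypothetical toy program; doApplicationWork is a made-up stand-in for the program’s real job):

          package main

          import (
              "log"
              "net/http"
          )

          func main() {
              // On launch, the program starts its own small web server...
              go func() {
                  http.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
                      w.Write([]byte("ok"))
                  })
                  log.Fatal(http.ListenAndServe(":3000", nil))
              }()
              // ...and then carries on with whatever the program itself is for.
              doApplicationWork()
          }

          // doApplicationWork is a hypothetical stand-in for the application’s real work.
          func doApplicationWork() {
              select {} // block forever; a real program would do its actual work here
          }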

      2. cihad says:

        Just install nginx and fill /etc/nginx/sites-enabled/default with:

        http {
            server {
                listen 3000;
                location /simple {
                    return 200 'Hello';
                }
            }
        }

        Save file.

        sudo service nginx restart
        bombardier -c 10 http://localhost:3000/simple

        1. Let me restate what I wrote in the comment you are replying to:

          If you have an Apache, IIS or Nginx server, you can build web
          applications on top of it but that is not what my blog post was about.

  16. Joe Duarte says:

    Nim compiles to either C or LLVM IR, right? I don’t see any details in the nimble code – what became of your code? Is each of these solutions building an executable?

    It’s impressive that Nim wins given that there isn’t serious optimization energy going into it yet, but I’m not yet convinced of the validity of this benchmark.

    You might also include actual web servers in the comparison, like nginx (written in C) and Microsoft’s Kestrel (maybe written in C++, possibly C#).

    I’d switch to HTTP/2 or HTTP/3, since those are dominant now on the web. For example, lemire.me defaults to HTTP/3.

    1. The use cases here are to add a web server to your application (whether it is written in JavaScript, C, C++, Nim), or to build a small specialized web server.

      Would you share your code… e.g., how do I do the equivalent of my C++ application (see code in the blog post), say in C, using nginx as a library?

      Or do you mean the reverse: you have a web server, and you integrate your code inside it (e.g., using CGI calls)? That’s a whole other paradigm, and not really comparable.

  17. So, I just read on the uWebSockets.js GitHub page that it is the default server for Bun. So I am curious as to why the Node.js version would be faster (maybe margin of error)?

    My stack is supposed to wrap the HTTP service. See the copious-world repositories. E.g., in copious-transitions, one is supposed to be able to create a subclass of lib/general-express.js and then run without changing much else. Otherwise, I am using JSON messaging on micro-services, and those are set up for TLS, UDP, etc. So, “just working” is a goal that has gone through a few renditions, but optimization paths into other languages are an eventual goal.

    So, one possibility is to work with components that are all about intrinsics, perhaps the ones found at the benchmarks site (warning about brainf** and language). Plugging in JSON and base64 libraries might be good (maybe Bun’s FFI is better than LuaJIT). Also, SHA-256 intrinsics are out there, and BLAKE3 is nice to have when they are absent, but more moot when they are present. You may see that V remains viable given the work they did on MatMul.

  18. O.M. says:

    Unfair benchmark: an obvious difference between the implementations is that only 2 of them return “Hello!” in the body, while the others each return their own variation.

    If the implementers did not even ensure this simple point of consistency across the implementations, how serious is this work?

    1. Can you explain your rationale? Why would you think that the performance would depend on the exact content of the string?