Web server ‘hello world’ benchmark : Go vs Node.js vs Nim vs Bun
34 thoughts on “Web server ‘hello world’ benchmark : Go vs Node.js vs Nim vs Bun”
I have already explained my point about the expected performance for simple vs. complex cases, but I also suggest running something to check that all the frameworks are using the available CPU resources; they often don't come with decent ergonomics and defaults for this. Furthermore, if the load generator runs on the same machine, isolate the cores of the two, trying hard to make the server the bottleneck by constraining its resources.
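For example, on Linux you can pin the server and the load generator to disjoint cores so that neither starves the other (a sketch; the binary name and core ranges are placeholders):
taskset -c 0-3 ./server &
taskset -c 4-7 bombardier -c 1000 http://localhost:3000/simple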
Not a fair comparison at all. Why did you use a slow third-party library for Bun/Node.js? It seems like you did not know what you were doing. Try HyperExpress or uWebSockets.js.
I have updated the numbers with uWebsockets.js. See updated blog post.
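For reference, a minimal uWebSockets.js handler looks roughly like this (a sketch, not necessarily the exact code behind the updated numbers; the route and port simply mirror the other examples):

// Minimal uWebSockets.js 'hello world' server.
const uWS = require('uWebSockets.js');

uWS.App()
  .get('/simple', (res, req) => {
    // Write the body and end the response.
    res.end('Hello!');
  })
  .listen(3000, (token) => {
    if (!token) {
      console.log('Failed to listen on port 3000');
    }
  });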
Node fanboy detected. 🤡
Indeed, it is an interesting question. So why not check it? 🙂
This doesn’t look painful to me:
https://github.com/tokio-rs/axum/blob/main/examples/hello-world/src/main.rs
Those numbers are hiding something interesting. I never used bombardier. Could you clarify how long the benchmark ran for? This would let us know at how many real connections/s each one is capped and whether any reached the 10/10000 marks.
bombardier runs for 10 seconds by default.
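If you want a longer or shorter run, the duration can be set explicitly, e.g.:
bombardier -d 30s -c 10 http://localhost:3000/simple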
Only Nim is using a third-party library built for speed: it's only 777 lines of code, and it doesn't even support HTTP/2.
For shame!
httpbeast is not a third-party lib; it is written by a Nim core team member. And by the way, it is not pimping up the speed but rather leveraging the quite speedy performance of Nim's async. Its task isn't speed per se but rather to provide HTTP-related functionality.
I hear you. Just to capture some of the knowledge I've built up, I wrote a server entirely in C, which I call “sloop” (server loop):
https://github.com/chkoreff/sloop/tree/main#readme
It does the necessary buffering so it can print the “Content-Length” header. At some point I could change it to chunked transfer encoding.
The servers I actually use now are written in Fexl, so I can just call run_server with an arbitrary Fexl program to interact with clients, e.g.:
https://github.com/chkoreff/Fexl/blob/master/src/test/server.fxl
That Fexl code ultimately calls this “type_start_server” routine written in C:
https://github.com/chkoreff/Fexl/blob/master/src/type_run.c#L355
Interesting that the numbers are so low given that you use such a big server.
On an AMD Ryzen 9 5900HX, I get the following numbers for Go 1.21.1 and Nim 2.0.0:
go 1.21.1
bombardier -c 10 http://localhost:3000/simple
Reqs/sec 246226.18
bombardier -c 1000 http://localhost:3000/simple
Reqs/sec 451854.02
nim 2.0.0
bombardier -c 10 http://localhost:3000/simple
Reqs/sec 440969.81
bombardier -c 1000 http://localhost:3000/simple
Reqs/sec 799546.14
I also tested this with fasthttp for Go:
package main

import (
	"io"
	"log"

	"github.com/valyala/fasthttp"
)

func main() {
	h := requestHandler
	if err := fasthttp.ListenAndServe(":3000", h); err != nil {
		log.Fatalf("Error in ListenAndServe: %v", err)
	}
}

// requestHandler writes a fixed body to every request.
func requestHandler(ctx *fasthttp.RequestCtx) {
	io.WriteString(ctx, "Hello World")
}
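To reproduce this with Go modules, something along these lines should work (the module and file names are arbitrary):
go mod init fasthttpbench
go get github.com/valyala/fasthttp
go run main.go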
bombardier -c 10 http://localhost:3000
Reqs/sec 351120.08
bombardier -c 1000 http://localhost:3000
Reqs/sec 601480.20
Your processor has a significantly higher clock speed, which could be a factor.
I also tried Zig (I am not a Zig expert). It's based on the simple HTTP example from https://github.com/zigzap/zap:
const std = @import("std");
const zap = @import("zap");

fn on_request_minimal(r: zap.SimpleRequest) void {
    r.sendBody("Hello World!") catch return;
}

pub fn main() !void {
    var listener = zap.SimpleHttpListener.init(.{
        .port = 3000,
        .on_request = on_request_minimal,
        .log = false,
        .max_clients = 100000,
    });
    try listener.listen();

    std.debug.print("Listening on 0.0.0.0:3000\n", .{});

    // start worker threads
    zap.start(.{
        .threads = 16,
        .workers = 16,
    });
}
Results:
bombardier -c 10 http://localhost:3000
Reqs/sec 312406.93
bombardier -c 1000 http://localhost:3000
Reqs/sec 470699.26
Note: zap is a Zig wrapper of facil.io (a C web framework).
Zig only: https://github.com/karlseguin/http.zig
Neither has server-side TLS support.
std.http.Server is unstable, being redesigned with each release since 0.11.0, and only std.http.Client has TLS support.
As always, brilliant content from Daniel.
Did you ever try Seastar (a C++ server framework)?
I will be checking it out. Thanks.
Would be nice for all servers to return exactly the same response.
Indeed, but a few different bytes in the string should not impact performance.
In the meantime, it is remarkable that a high-level language like Nim achieves such performance and scale. Also, the Nim implementation seems very idiomatic.
Since you are using boost::context, it means that you are using stackful coroutines instead of the stackless C++20 ones.
If you want to reduce system dependencies, you could opt for standalone Asio (without Boost), which would allow you to couple it with co_* and awaitable (C++20, stackless only).
You may find the following benchmarks interesting.
https://www.techempower.com/benchmarks/#section=data-r21&test=plaintext
Thanks. Note that I link to techempower early on in the blog post.
Please compare with Nginx. Just curious.
Can you illustrate how you’d build a small specialized web server similar to the examples above using Nginx? Suppose you already have your software and you want to add a small HTTP server to it: how do you link to Nginx as a software library?
This might be of interest: https://openresty.org/en/
But there is another one which turns out to be in widespread use. I have lost the reference… it is used more in China.
Do you have code samples on how I can use openresty to embed a web server in my application?
If you have an Apache, IIS or Nginx server, you can build web applications on top of it but that is not what my blog post was about. My blog post was about building small web applications (in different programming languages) using existing software libraries.
I am considering the scenario where you have a program, when you launch the program, you launch a web server.
Just install nginx.
/etc/nginx/sites-enabled/default
with:
server {
    listen 3000;
    location /simple {
        return 200 'Hello';
    }
}
Save file.
sudo service nginx restart
bombardier -c 10 http://localhost:3000/simple
Let me restate what I wrote in the comment you are replying to:
Nim compiles to either C or LLVM IR, right? I don’t see any details in the nimble code: what became of your code? Is each of these solutions building an executable?
It’s impressive that Nim wins given that there isn’t serious optimization energy going into it yet, but I’m not yet clear on the validity of this benchmark.
You might also include actual web servers in the comparison, like nginx (written in C) and Microsoft’s Kestrel (maybe written in C++, possibly C#).
I’d switch to HTTP/2 or HTTP/3, since those are dominant now on the web. For example, lemire.me defaults to HTTP/3.
The use cases here are to add a web server to your application (whether it is written in JavaScript, C, C++, Nim), or to build a small specialized web server.
Would you share your code… e.g., how do I do the equivalent of my C++ application (see code in the blog post), say in C, using nginx as a library?
Or do you mean the reverse… You have a web server, and you integrate your code inside it (e.g., use CGI calls). That’s a whole other paradigm, and not really comparable.
I just read on the uWebSockets.js GitHub page that it is the default server for Bun, so I am curious why Node.js would be faster (maybe margin of error)?
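For context, the Bun side of such a test can be as small as a plain Bun.serve handler; this is a minimal sketch rather than the benchmark’s exact code (the port and body are assumptions):

// Bun's built-in HTTP server; Bun uses uWebSockets internally.
Bun.serve({
  port: 3000,
  fetch(req) {
    return new Response('Hello!');
  },
});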
My stack is supposed to wrap the HTTP service. See the copious-world repositories. E.g., in copious-transitions, one is supposed to be able to create a subclass of lib/general-express.js and then run without changing much else. Otherwise, I am using JSON messaging on micro-services, and those are set up for TLS, UDP, etc. So, “just working” is a goal that has gone through a few renditions, but optimization paths into other languages are an eventual goal.
So, one possibility is to work with components that are all about intrinsics. Perhaps the ones found at benchmarks (warning about brainf** and language). So, plugging in JSON and base64 libs might be good (maybe Bun FFI is better than LuaJIT). Also, SHA-256 intrinsics are out there, and BLAKE3 is nice to have when they are not available, but it is more moot when they are. You may see that V remains viable given the work they did on MatMul.
Unfair benchmark: an obvious difference between the implementations is that only two of them return “Hello!” in the body while the others each return their own variation.
If the implementers didn’t even ensure this simple point of comparison, how serious is this work?
Can you explain your rationale? Why would you think that the performance would depend on the exact content of the string?