Downloading files faster by tweaking headers
I was given a puzzle recently. Someone was parsing JSON files downloaded over the network from a bioinformatics URI. One JSON library was twice as fast as the other.
Unless you are on a private high-speed network, the time required to parse a file will always be small compared to the time required to download it. Maybe people at Google have secret high-speed options, but most of us have to make do with speeds below 1 GB/s.
So how could it be?
One explanation might have to do with how the client (such as curl) and the web server negotiate the transmission format. Even if the actual data is JSON, what is transmitted is often in compressed form. Thankfully, you can tell your client which encodings to request, using the Accept-Encoding header. In my particular case, out of all of the encodings I tried, gzip was much faster. The reason seems clear enough: when I requested gzip, I got 82 KB back instead of 766 KB.
| command | time | size |
|---|---|---|
| curl -H 'Accept-Encoding: gzip' $URL | 0.5 s | 82 KB |
| curl -H 'Accept-Encoding: deflate' $URL | 1.0 s | 766 KB |
| curl -H 'Accept-Encoding: br' $URL | 1.0 s | 766 KB |
| curl -H 'Accept-Encoding: identity' $URL | 1.0 s | 766 KB |
| curl -H 'Accept-Encoding: compress' $URL | 1.0 s | 766 KB |
| curl -H 'Accept-Encoding: *' $URL | 1.0 s | 766 KB |
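If you want to see which encoding the server actually applied, you can inspect the response headers. Here is a minimal sketch, assuming $URL points at the file:

```
# Ask for gzip but fetch only the headers (-I): the Content-Encoding and
# Content-Length fields show whether the body is compressed and how many
# bytes would travel over the wire.
curl -sI -H 'Accept-Encoding: gzip' "$URL" | grep -iE '^(content-encoding|content-length)'
```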
Sure enough, the downloaded file takes up 766 KB, but if you gzip it, you get back 82 KB.
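You can reproduce that comparison locally by compressing the uncompressed download and counting bytes. A quick sketch, assuming the file was saved as data.json (a hypothetical name):

```
# Size of the uncompressed download, in bytes.
wc -c < data.json
# Size after gzip compression (written to stdout so the original file is untouched).
gzip -c data.json | wc -c
```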
What I find interesting is that my favorite tools (wget and curl) do not request gzip by default. At least in this instance, requesting it would make the download much faster. The curl tool takes the --compressed flag to make life easier.
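With that flag, curl asks the server for the compressed encodings it supports and transparently decompresses the response, so you never touch the header yourself. A minimal sketch (data.json is a hypothetical output name):

```
# Request a compressed transfer and decompress it before writing to disk.
curl --compressed -o data.json "$URL"
```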
Of course, the point is moot if the data is already in compressed form on the server.