Daniel Lemire's blog

, 2 min read

The memory usage of STL containers can be surprising

C++ remains one of the most popular languages today. One of the benefits of C++ is the built-in STL containers offering the standard data structures like vector, list, map, set. They are clean, well tested and well documented. If all you do is program in C++ all day, you might take STL for granted, but more recent languages like Go or JavaScript do not have anything close to STL built-in. The fact that every C++ compiler comes with a decent set of STL containers is just very convenient.

STL containers have a few downsides. Getting the very best performance out of them can be tricky in part because they introduce so much abstraction.

Another potential problem with them is their memory usage. This is a “silent” problem that will only affect you some of the time, but when it does, it may come as a complete surprise.

I was giving a talk once to a group of developers, and we were joking about the memory usage of modern data structures. I was telling the audience that using close to 32 bits per 32-bit value stored in a container was pretty good sometimes. The organizer joked that one should not be surprised to use 32 bytes per 32-bit integer in Java. Actually, I don’t think the organizer was joking… he was being serious… but the audience thought he was joking.

I wrote a blog post showing that each “Integer” in Java stored in an array uses 20 bytes and that each entry in an Integer-Integer map could use 80 bytes. Java is sometimes ridiculous in its memory usage. C++ is better, thankfully. But it is still not nearly as economical as you might expect.

I wrote a small test. Results will vary depending on your compiler, standard library, the size of the container, and so forth… You should run your own tests… Still, here are some numbers I got on my Mac:

Storage cost in bytes per 32-bit entry |

STL container |Storage | std::vector |4 | std::deque |8 | std::list |24 | std::set |32 | std::unordered_set |36 |

(My Linux box gives slightly different numbers but the conclusion is the same.)

So there is no surprise regarding std::vector. It uses 4 bytes to store each 4 byte elements. It is very efficient.

However, both std::set and std::unordered_set use nearly an order of magnitude more memory than would be strictly necessary.

The problem with the level of abstraction offered by C++ is that you can be completly unaware of how much memory you are using.