Daniel Lemire's blog


The memory usage of STL containers can be surprising

20 thoughts on “The memory usage of STL containers can be surprising”

  1. Marcus Ritt says:

    There’s also std::forward_list, which has one pointer less per node and thus consumes less memory, as expected (16 bytes per element on my Linux box).
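
    Conceptually, the per-node layouts look roughly like this (an illustrative sketch, not the actual libstdc++ node types):

    ```cpp
    // std::list<int>: doubly linked, two pointers per node.
    struct list_node {
        list_node* prev;
        list_node* next;
        int        value;   // padded: typically 24 bytes per node
    };

    // std::forward_list<int>: singly linked, one pointer per node.
    struct forward_list_node {
        forward_list_node* next;
        int                value;   // padded: typically 16 bytes per node
    };
    ```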

  2. It looks as though you are counting memory allocated without the overhead of heap structures … which could possibly add up to quite a lot.

    1. Yes, the numbers I offer are “optimistic”. I am assuming that the overhead is a constant quantity that becomes irrelevant if you have enough data.

      1. In practice, you can expect about 16 bytes of overhead per heap allocation (with the 64-bit glibc malloc, but most allocators do the same AFAIK). There is a single allocation for the vector/deque, but one per element for the others, so it is not quite negligible.

        At first I was puzzled by your results for unordered_set and deque, but they make sense.
        unordered_set uses singly-linked buckets and stores the hash: that is already 24 bytes per node, plus a table of pointers. deque (like vector) overallocates: depending on the size you will measure different overheads, but 2x isn’t surprising.
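
        For an int key, the per-element layout is roughly the following (a conceptual sketch; whether the hash is actually cached depends on the implementation and the key type):

        ```cpp
        #include <cstddef>

        // Conceptual chained-hash-table node behind std::unordered_set<int>.
        struct hash_node {
            hash_node*  next;    // 8 bytes: chaining pointer
            std::size_t hash;    // 8 bytes: cached hash value (implementation-dependent)
            int         value;   // 4 bytes, padded to 8
        };  // ~24 bytes per element, plus the bucket array of pointers,
            // plus the heap's per-allocation overhead for every node
        ```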

      2. It depends on whether the STL code does lots of small allocations, and on the heap implementation. My guess is there is at least one word of overhead per allocation.

        1. That’s a good point. I am not sure how to improve my benchmark to take this into account. Got any ideas? I can carefully track all allocations; I just don’t know how to measure their *actual* cost. Any standard way to do that?

      3. Sorry, that was an incomplete statement. You want to compute the number of allocations, and thus the average size of an allocation. The overhead is usually fixed per allocation, usually at least one (32-bit or 64-bit) word. Depends on the heap manager code. (I have not looked at the GNU library heap code in a few decades. Your runtime library may be different.)
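
        For what it is worth, one way to count the allocations and see how much memory the heap really sets aside for each one is to replace the global operator new/delete; a minimal sketch, assuming glibc, where malloc_usable_size is available:

        ```cpp
        #include <cstddef>
        #include <cstdlib>
        #include <malloc.h>   // malloc_usable_size: a glibc extension, not standard C++
        #include <new>

        static std::size_t alloc_count = 0, requested_bytes = 0, actual_bytes = 0;

        void* operator new(std::size_t n) {
            if (n == 0) n = 1;                      // malloc(0) may legally return a null pointer
            void* p = std::malloc(n);
            if (!p) throw std::bad_alloc();
            ++alloc_count;
            requested_bytes += n;
            actual_bytes += malloc_usable_size(p);  // what malloc really set aside for this block
            return p;                               // (still excludes the heap's own chunk header)
        }
        void operator delete(void* p) noexcept { std::free(p); }
        void operator delete(void* p, std::size_t) noexcept { std::free(p); }
        ```

        After filling a container, requested_bytes / alloc_count gives the average allocation size, and the gap between actual_bytes and requested_bytes is part of the per-allocation overhead.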

        Note also that the general-purpose heap manager has a compute cost on allocation and de-allocation. When the pattern of usage is not random, you can use allocation pools (few/large allocations from the general-purpose heap) and free-lists to make huge reductions in memory and CPU usage.
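
        A free-list over slab allocations can be as simple as this (a minimal, non-thread-safe sketch; the names are illustrative):

        ```cpp
        #include <cstddef>
        #include <vector>

        // Fixed-size node pool: a few large allocations from the general-purpose heap,
        // with freed slots recycled through an intrusive free-list.
        template <class T>
        class NodePool {
            union Slot { Slot* next; alignas(T) unsigned char storage[sizeof(T)]; };
            std::vector<Slot*> slabs_;                     // the few, large allocations
            Slot* free_ = nullptr;                         // head of the free-list
            static constexpr std::size_t kSlabSize = 4096;

            void grow() {
                Slot* slab = new Slot[kSlabSize];
                slabs_.push_back(slab);
                for (std::size_t i = 0; i < kSlabSize; ++i) {  // thread every slot onto the free-list
                    slab[i].next = free_;
                    free_ = &slab[i];
                }
            }

        public:
            void* allocate() {                             // O(1); touches the heap only when a slab runs out
                if (!free_) grow();
                Slot* s = free_;
                free_ = s->next;
                return s;
            }
            void deallocate(void* p) {                     // O(1); never calls back into the heap
                Slot* s = static_cast<Slot*>(p);
                s->next = free_;
                free_ = s;
            }
            ~NodePool() { for (Slot* s : slabs_) delete[] s; }
        };
        ```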

        I only use C++ when I need bare-metal performance, so these optimizations are usually relevant.

        If you do a lot of allocation/de-allocation of large numbers of small items, the cost of the heap can be a problem.

        The general purpose heap code has to be thread-safe, which adds an additional cost. Application code can often be made thread-safe at a higher level, with lower cost.

        Your theme in this post is exactly relevant. I use C++ when I need bare-metal performance. The STL is a general-purpose library that puts generality ahead of performance. Frankly, writing exactly-focused array/list/hash classes is pretty easy. So … why would I use the STL?

        Of course, some parts of even highly performant applications do not need to be especially efficient. Perhaps STL is a good fit there.

        If I had deep knowledge of the STL implementation, I would know when the STL code was as efficient as the code I might write. That knowledge would require an investment of time. If I can write an equivalent class in the same time, it is hard to justify the time to study (and periodically re-study) the STL.

        If I were not very good at understanding and implementing algorithms, using the STL written by (hopefully) better programmers would be an excellent idea. I would have a very hard time justifying that case, as I believe we are long past the time when average programmers should be using C++.

        Which leaves the STL in an odd corner.

        1. It is true in any language that most of the code you write does not need to be particularly fast or memory conscious. The bottlenecks are often very specific.

          In C++, at least, you can rewrite the problematic code so it is fast and memory conscious.

  3. davetweed says:

    Particularly with respect to the unordered_set, there’s the question of whether it’s “abstraction” or “uniformity” that’s the problem. From my (admittedly poor) understanding, the C++ specification of an unordered_set and its behaviour with respect to object storage pretty much requires a hash table with separate chaining in the buckets for an unknown general class. In that case the memory usage is maybe double what you might hope to achieve with really careful optimization. As you say, for a type like an int32_t you can use other representations to use much less memory and still get “as if” behaviour with respect to the guarantees stated in the standard. At which point it becomes a question: should the STL include such specialisations of the STL containers, or would that be better done in a different library (boost, or something different yet again)?
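
    For instance, a compact alternative for plain int32_t keys is a sorted std::vector used as a set (a sketch only; it is not a drop-in replacement, since inserts are O(n) and iterators are not stable):

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // A set of int32_t stored as a sorted vector: 4 bytes per element, one allocation.
    struct sorted_vector_set {
        std::vector<std::int32_t> v;

        bool contains(std::int32_t x) const {
            return std::binary_search(v.begin(), v.end(), x);
        }
        void insert(std::int32_t x) {
            auto it = std::lower_bound(v.begin(), v.end(), x);
            if (it == v.end() || *it != x) v.insert(it, x);
        }
    };
    ```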

    I don’t know the answer to that question, but I do know (from working at companies that have “reimplemented the STL to be more efficient”) that something which, in some corner cases, violates the guarantees in the standard that are discussed all over the place (Stack Overflow answers, etc.) is NOT a nice programming environment.

    1. My instinct would not be to reimplement STL. Rather, I would create new, specialized data structures with their own documentation and API.

  4. Of course, the biggest “problem” I see with STL containers is that people either use the wrong one for the job (and performance requirements are part of that), or they fail to use them at all (and implement their own).

    The problem in the real (non-academic) world with implementing one’s own “super efficient” version is that people in the private sector switch jobs frequently, and it is often the case that one has to take over code written by someone else. For all its flaws (which are few in my view), the STL is well documented and understood. Joe Blow’s super-set class – less so. The limitations of Joe Blow’s super-set class will have long been forgotten until they manifest as bugs when someone uses the class in a manner that only Joe knew would not be supported.

    We have been using the Boost libraries for some time, and we find they offer true (cross-platform) portability and substantially reduce bug counts, while maintaining or improving performance. I haven’t noticed memory utilization go up over our legacy code – but then our legacy code had many leaks and corner cases that never worked properly.

    1. You may have noticed that my blog post was not critical at all of the STL. I share your feeling: the STL is well known, well tested, and well documented. *Except* that, when it comes to performance-sensitive tasks, the STL can be tricky.

      I’d like to respond to your criticism of “academic work”. I don’t think it is particularly common for academics to write their own data structure code. That’s probably a lot more common in industry.

      Academic software is slow and broken, but that’s not because the authors shy away from things like STL.

  5. Indeed, the memory usage of std::unordered_set can be a serious issue for big-data problems. In that case, consider using the sparse_hash_set from sparsepp, which is faster than std::unordered_set and can fit 2 to 3 times more integers in the same memory. https://github.com/greg7mdp/sparsepp
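
    Usage mirrors std::unordered_set (a small sketch based on the sparsepp README linked above):

    ```cpp
    #include <cstdint>
    #include <sparsepp/spp.h>

    int main() {
        spp::sparse_hash_set<std::uint32_t> s;
        for (std::uint32_t i = 0; i < 1024; ++i)
            s.insert(3 * i);
        return s.count(9) ? 0 : 1;   // same member functions as std::unordered_set
    }
    ```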

  6. Bender says:

    Hi,
    The article is good, but it misses some important points that would make it much better:
    the overhead is constant per element, so if you had sets of shorts it would look even worse for the pointer-based structures, whereas with a struct Point holding two doubles the relative overhead would be smaller.

    Also, a vector has zero overhead only if you know its size in advance and call reserve. A naive push_back loop (sometimes unavoidable when you do not know how many elements you will get) can cause a lot of waste: the growth factor for gcc is 2, so in theory almost 50% of the memory can be wasted.
    https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md
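
    To make the waste visible (a small sketch; the exact capacities depend on the library’s growth policy):

    ```cpp
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> a, b;
        for (int i = 0; i < 1000; ++i) a.push_back(i);   // doubling growth: capacity ends up at 1024 on libstdc++
        b.reserve(1000);                                  // a single allocation up front
        for (int i = 0; i < 1000; ++i) b.push_back(i);
        std::printf("a: size=%zu capacity=%zu\n", a.size(), a.capacity());
        std::printf("b: size=%zu capacity=%zu\n", b.size(), b.capacity());
        a.shrink_to_fit();                                // non-binding request to drop the excess capacity
        std::printf("a after shrink_to_fit: capacity=%zu\n", a.capacity());
    }
    ```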

    1. I should point out that this was *measured* memory usage (minus the object overhead).

      Regarding std::vector, one can use shrink_to_fit (something I did not do here).

  7. Lucas Clemente Vella says:

    I just want to point out that the memory usage of std::set is not linear in its size, and the overhead might be smaller for larger datasets.

    1. I use 1024 elements; for smaller sets, it is indeed likely that the memory usage per element would be greater.

      I do expect the memory usage to be linear, for large enough sets (I am guessing that 1024 is “large enough”).

  8. Frank Seidl says:

    What are the costs per entry when entries are larger? I imagine that the additional memory from a linked list comes from pointers, which stay small as the size of the entries increases. If this is the case, std::list should become more efficient when the entries themselves are large. Meanwhile a hash table might allocate enough space to store 2-4 times as many entries as it actually contains. So I would expect std::unordered_set to keep a ratio similar to 36:4 even for large entries.

    1. I agree that a linked list probably becomes more efficient as the values get large. Note, however, that if your values are large, it becomes inefficient to copy them so you want to be using memory pointers.
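
      For example (a sketch), storing pointers keeps container reshuffling cheap even when the values are large:

      ```cpp
      #include <memory>
      #include <vector>

      struct Big { double payload[64]; };              // a 512-byte value

      std::vector<Big> by_value;                       // growth copies/moves 512 bytes per element
      std::vector<std::unique_ptr<Big>> by_pointer;    // growth moves 8-byte pointers instead
      ```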

  9. alexpanter says:

    Hi,
    dunno if this thread is still alive – very interesting read, in particular the comment section!

    It also reminded me that video games often have very particular memory requirements, and their developers prefer injecting their own allocators wherever they can. For this reason they rarely use the STL in AAA games AFAIK, but rather create their own STL-like library, e.g. EASTL.
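
    The standard “injection point” is the container’s allocator template parameter; here is a minimal C++17 allocator sketch (the name GameAlloc is hypothetical, and EASTL’s own allocator interface is different):

    ```cpp
    #include <cstdlib>
    #include <new>
    #include <vector>

    template <class T>
    struct GameAlloc {
        using value_type = T;
        GameAlloc() = default;
        template <class U> GameAlloc(const GameAlloc<U>&) noexcept {}

        T* allocate(std::size_t n) {
            // a real game allocator would carve this out of a preallocated arena
            if (void* p = std::malloc(n * sizeof(T))) return static_cast<T*>(p);
            throw std::bad_alloc();
        }
        void deallocate(T* p, std::size_t) noexcept { std::free(p); }
    };
    template <class T, class U> bool operator==(const GameAlloc<T>&, const GameAlloc<U>&) noexcept { return true; }
    template <class T, class U> bool operator!=(const GameAlloc<T>&, const GameAlloc<U>&) noexcept { return false; }

    using GameVector = std::vector<int, GameAlloc<int>>;   // the allocator is injected here
    ```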

    Andrei Alexandrescu gave an interesting talk about std::allocator in 2015, which fits into this narrative. But will that require us to rewrite the STL at some point? I don’t know, maybe someone else does.