Daniel Lemire's blog


Appending to an std::string character-by-character: how does the capacity grow?

10 thoughts on “Appending to an std::string character-by-character: how does the capacity grow?”

  1. Oleksandr says:

    In MSVC, the capacity grows by 1.5x, unlike the 2x in GCC or LLVM (a quick way to observe this is sketched below). This is a feature of the C++ standard library implementation, not of the compiler itself. I’m just curious whether such an exponential growth policy is part of the standard or just a widely accepted convention.

    You can read more in my article on the memory layout of std::string: https://tastycode.dev/blog/tasty-cpp-memory-layout-of-std-string
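
    A minimal way to observe this directly (just a sketch; the exact capacity sequence depends on the standard library and its short string optimization) is to append characters one at a time and print the capacity whenever it changes:

    #include <iostream>
    #include <string>

    int main() {
        std::string s;
        auto last = s.capacity();
        std::cout << "initial capacity: " << last << '\n';
        for (int i = 0; i < 2000; ++i) {
            s.push_back('a');
            if (s.capacity() != last) {
                last = s.capacity();
                std::cout << "size " << s.size() << " -> capacity " << last << '\n';
            }
        }
    }

    Compiled with GCC, LLVM, and MSVC, the printed sequence makes the 2x versus 1.5x policies visible.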

    1. Arseny Kapoulkine says:

      I believe the standard doesn’t guarantee any particular behavior; methods of string generally don’t seem to specify complexity, even for push_back: https://cplusplus.com/reference/string/string/push_back/ (“generally amortized constant” suggests exponential growth but “generally” also says it’s not mandatory :D)

      FWIW conventional wisdom also says 1.5x is better than 2x because 2x growth allows no opportunity to reuse memory during reallocation, whereas 1.5x allows peak memory consumption to be lower, since subsequent reallocations can reuse the space freed by earlier, smaller allocations under the right circumstances.
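
      To see why: with 2x growth starting at 16 bytes, the blocks freed so far sum to 16 + 32 + … + N/2 = N - 16, which is always smaller than the next request of 2N, while with 1.5x the freed space eventually overtakes the next request. Here is a tiny sketch of that toy model (not any library’s actual behavior; it assumes the freed blocks are adjacent and can be coalesced):

      #include <cstdio>

      int main() {
          // toy model: the container grows by `ratio`; after each reallocation the old
          // block is freed, so `freed` accumulates the sizes of all earlier blocks
          // (the current block stays live while its contents are copied into the new one)
          const double ratios[] = {2.0, 1.5};
          for (double ratio : ratios) {
              double block = 16, freed = 0;
              std::printf("ratio %.1f:\n", ratio);
              for (int i = 0; i < 10; ++i) {
                  double next = block * ratio;
                  std::printf("  request %.0f, freed so far %.0f%s\n", next, freed,
                              freed >= next ? " (fits into the old space)" : "");
                  freed += block;
                  block = next;
              }
          }
      }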

      1. Todd Lehman says:

        I seem to remember reading somewhere once that any exponential growth less than the golden ratio (1.6180339…) has good reallocation properties. Does that sound right?

        1. Yeah, I believe that’s correct. In practice 1.5x is great because not only does it avoid the risk that a small over-allocation due to allocator padding pushes you right over the limit implied by the golden ratio, but it also makes the size adjustments cheap to compute (which is a little relevant for small reallocations with a fast allocator).

          Maybe a final note is that one thing most (or all) standard STL implementations get “wrong” is that they use the naive exponential formula, which results in a very slow start: e.g. with 1.5x you get 1 => 2 => 3 => 4 => 6 => 9 => 13. This is not a big problem for strings, as all implementations now use a short string optimization, so the capacity starts from 16-24 and grows from there. But for std::vector (or maybe strings in languages other than C++ that don’t use a short string optimization, like Rust?) you’ll probably want to make the beginning of the curve more aggressive, for example start with a small number of elements that fills a cache line or so, or use a 2x growth factor until 100-odd elements and then switch to 1.5x. This problem also goes away if the user code carefully reserves the container space, which would help a little in the benchmark in this article as well.
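
          As an illustration, here is a hypothetical growth function along those lines (just a sketch, not what any particular implementation does): start with roughly a cache line worth of elements, double up to 128 elements, then switch to 1.5x.

          #include <cstdio>
          #include <cstddef>

          // hypothetical policy: start with ~one cache line worth of elements,
          // double until 128 elements, then grow by 1.5x
          static std::size_t grow(std::size_t capacity, std::size_t element_size) {
              const std::size_t initial = element_size >= 64 ? 1 : 64 / element_size;
              if (capacity == 0) return initial;
              if (capacity < 128) return capacity * 2;
              return capacity + capacity / 2;
          }

          int main() {
              std::size_t cap = 0;
              for (int i = 0; i < 10; ++i) {
                  cap = grow(cap, sizeof(int));
                  std::printf("%zu ", cap); // 16 32 64 128 192 288 432 648 972 1458
              }
              std::printf("\n");
          }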

      2. jerch says:

        FWIW conventional wisdom also says 1.5x is better than 2x because 2x growth allows no opportunity to reuse memory during reallocation

        Isn’t that only true under fairly specific reallocation circumstances, such as the allocator model used, whether the reallocation can avoid moving the contents (in which case a higher factor is slightly better, needing fewer reallocations), or whether the allocator allows partial reuse of still-held segments during realloc-moves? The latter is kinda mandatory to get to the “Golden Ratio” rule; without it you’ll always run short on older coalesced segments.

        Maybe the fact that some implementations stick with 2x here indicates that the std allocators are not that well prepared to benefit from that.

          1. allocator allows partial reuse of still-held segments during realloc-moves

          Note that the C++ STL never uses realloc: all reallocations by all containers go through the new/delete interface, so they explicitly allocate a new segment, explicitly copy the data into it, and then free the old one (which is unfortunate in certain circumstances, of course).

          The latter is kinda mandatory to get to the “Golden Ratio” rule; without it you’ll always run short on older coalesced segments.

          I don’t think that’s true. With a naive first-fit allocation model and an allocator that immediately coalesces freed blocks with their neighbors, 1.5x growth produces requests of progressively larger size, but the sum of the freed blocks grows faster than the new request, so pretty quickly a new request starts to fit into the gap at the beginning. With 1.7x growth this no longer happens: there is always less free memory in front of your current block than your next block needs, so the blocks only ever move forward in memory.

          Here’s a simple Python program that simulates this:

          # container implementation details: we assume that the growth is exponential
          # and that short string buffer handles strings of smaller size
          ratio = 1.5
          block = 16

          # allocator implementation details: we assume that we're using first fit allocator
          # that magically stores block metadata externally and has no alignment restrictions
          # for simplicity we just keep track of a single live allocation
          allocoffset = 0
          allocsize = 0

          for i in range(20):
              block = int(block * ratio)
              if allocoffset >= block:
                  print(f"round {i}: allocated range [{allocoffset}..{allocoffset+allocsize}), allocating {block} from the beginning")
                  allocoffset = 0
                  allocsize = block
              else:
                  print(f"round {i}: allocated range [{allocoffset}..{allocoffset+allocsize}), allocating {block} by advancing range")
                  allocoffset += allocsize
                  allocsize = block
          If you run it with ratio 1.7 you’ll see that the new allocation never fits into the space left by freeing the previous allocations, whereas with ratios smaller than the golden ratio, such as 1.6 or 1.5, we are periodically able to fit the allocation at the beginning; this reduces peak memory consumption over a series of allocations. Of course, the practical implications will depend on the details of allocator behavior and may not match this simple model.

          1. The same source is posted as a Gist to preserve indentation: https://gist.github.com/zeux/44dd37660e426d4c28a8f004a25c1605

  2. Woongbin says:

    I think something interesting to cover would be what the worst case looks like in these scenarios as well (is it nanoseconds, microseconds, or more?). In some applications, like video games, I can imagine that bad worst-case performance could be problematic.

  3. Patrick says:

    It’s interesting to read this because I independently arrived at a “50%” technique in my own buffer module here:

    https://github.com/chkoreff/Fexl/blob/master/src/buf.c#L33

    The only difference is that I double the size up to a point, and thereafter increase it by 50%.

    I suppose I could use 50% every time, but I preferred going from 64 bytes to 128 bytes on the first growth, instead of 96 bytes. So I kept the doubling up to a point, which I set at 2^20 bytes.
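
    A minimal sketch of that growth rule as described (not the actual buf.c code): double up to 2^20 bytes, then grow by 50%.

    #include <cstdio>
    #include <cstddef>

    static std::size_t next_size(std::size_t size) {
        // double up to 2^20 bytes, then increase by 50%
        return size < (std::size_t(1) << 20) ? size * 2 : size + size / 2;
    }

    int main() {
        std::size_t size = 64;
        while (size < (std::size_t(1) << 23)) {
            std::printf("%zu\n", size);
            size = next_size(size);
        }
    }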

  4. Nathan Myers says:

    What would often be optimal, and is not usually done, is to allocate a^n - b bytes, for some base a (maybe 1.5), with n increasing incrementally, and b a property of the allocator related to the size of the header the allocator reserves for housekeeping.
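
    A sketch of what that could look like, with made-up numbers (a hypothetical header size b = 16 and base a = 1.5): the request is chosen so that request plus header lands exactly on the a^n progression rather than spilling just past it.

    #include <cstdio>
    #include <cstddef>

    int main() {
        // made-up numbers: growth base a, and an allocator that prepends a b-byte header
        // to every block; requesting a^n - b means the underlying block (request + header)
        // is exactly a^n instead of spilling just past it
        const double a = 1.5;
        const std::size_t b = 16;
        double block = 64;
        for (int i = 0; i < 10; ++i) {
            std::size_t request = static_cast<std::size_t>(block) - b;
            std::printf("request %zu bytes -> underlying block of %zu bytes\n", request, request + b);
            block *= a;
        }
    }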