Daniel Lemire's blog


Will calling “free” or “delete” in C/C++ release the memory to the system?

15 thoughts on “Will calling “free” or “delete” in C/C++ release the memory to the system?”

  1. Jeffrey W. Baker says:

    This seems … expected? People who care about this are going to override operator new (with tcmalloc or whatever suits their needs). People who don’t override evidently don’t care. People who want to know how much space it takes to allocate their data structure would do well to use nallocx.
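
    For instance, a minimal sketch assuming jemalloc (tcmalloc exposes the same extension through its own headers); nallocx asks the allocator how much it would really hand out for a given request, without allocating anything:

        // jemalloc extension; build with e.g. g++ demo.cpp -ljemalloc.
        // Depending on the build, the symbol may carry a je_ prefix.
        #include <jemalloc/jemalloc.h>
        #include <cstdio>

        int main() {
            // Report the real size the allocator would return for a 100-byte request.
            std::printf("request of 100 bytes -> real size %zu bytes\n", nallocx(100, 0));
            return 0;
        }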

    1. This seems … expected?

      People who know about tcmalloc will find my blog post unsurprising, but they are not the market for this post.

  2. Albert says:

    This behavior is not specific to tcmalloc: any heap allocator is at liberty to pre-allocate; returning memory to the system after free is even rarer. So the answer to the question in the title is NO, and an implication is that heap memory commonly gets over-committed to save on the number of system calls, as your analysis demonstrates.

    Notably, malloc is not even a system call; brk is. So when a user process calls malloc, there may be no expectation whatsoever that the allocator turns around and calls sbrk. Things can get weirder still: the allocator may choose to mmap a page far away from the program break instead. This technique is commonly used when a large chunk of memory is requested.

    IMO, your conclusion that

    … there are ways to force the memory to be released to the
    system, but you should not expect that it will do so by default.

    is spot on. You are absolutely correct that the memory usage cannot be computed from sizeof alone, even when page alignment is taken into consideration.
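
    A minimal sketch of that difference (assuming a glibc-style allocator with its default 128 kB mmap threshold); running it under strace -e brk,mmap,munmap shows the small request being served from the brk-managed heap and the large one going through mmap/munmap:

        #include <cstdlib>
        #include <cstring>

        int main() {
            // Small request: typically carved out of the brk-managed heap.
            char *small = static_cast<char *>(std::malloc(100));

            // Large request, well above the mmap threshold: typically served by a
            // dedicated anonymous mmap, far away from the program break.
            const std::size_t big = 8 * 1024 * 1024;
            char *large = static_cast<char *>(std::malloc(big));
            std::memset(large, 1, big);  // touch the pages so they are actually committed

            std::free(large);  // an mmap-backed chunk is usually munmap'ed right away
            std::free(small);  // a brk-backed chunk is usually kept for later reuse
            return 0;
        }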

  3. Wilco says:

    GLIBC has several tunables which allow you to decide how much memory is overallocated, at which size mmap will be used, and how quickly freed memory is returned to the system.

    If every malloc/free required a system call, programs would run 1000x slower! GLIBC even checks whether the current process is single-threaded and bypasses atomic instructions if so. It is much faster to check this flag on each call than to always use atomics, even when they are uncontended.
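
    For example, a minimal sketch assuming glibc's mallopt and malloc_trim extensions; the same knob can also be set from outside the program through the GLIBC_TUNABLES environment variable (glibc.malloc.trim_threshold):

        #include <malloc.h>   // glibc-specific: mallopt, malloc_trim, M_TRIM_THRESHOLD
        #include <cstdlib>
        #include <cstring>

        int main() {
            // Ask glibc to hand freed memory back to the system as eagerly as possible.
            mallopt(M_TRIM_THRESHOLD, 0);

            const std::size_t n = 32 * 1024 * 1024;
            char *p = static_cast<char *>(std::malloc(n));
            std::memset(p, 1, n);
            std::free(p);

            // Explicitly release whatever free heap memory the allocator can give back.
            malloc_trim(0);
            return 0;
        }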

    1. Thanks for the great link.

  4. primepie says:

    I don’t think it makes sense to talk about this topic without discussing the malloc library being used. C++ has nothing to do with anything happening here. The same goes for the previous allocation-related posts.

    1. Of course, the specific results will depend on many different factors, but my point here is that you cannot be certain that the memory will be returned to the system. This is a general statement that I can make without specifying the details of my system.

      1. primepie says:

        Agreed. Maybe highlight how many C++ memory operations have nothing to do with the C++ language per se but rather are highly influenced by the malloc library and OS features. This way the reader learns directly what is influenced by the language and what is influenced by the environment.

        P.S. Your compression/optimization posts + your papers are amazing!

        1. In my view, this is part of the C++ language, in the sense that the C++ specification does not require that memory be given back to the system. So if we ever have this expectation, we are making an unwarranted inference.

          I’d go so far as to say that when teaching C++ programming one should explicitly state that “free” does not necessarily release the memory to the system and that new and malloc may claim much more memory from the system than the code suggests.

          This is similar to how people who learn Java should know about JIT compilation and garbage collection.
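
          As a classroom-style illustration, here is a Linux-only sketch (it reads /proc/self/statm, and the numbers it prints depend entirely on the allocator in use) that makes the point observable:

              #include <unistd.h>    // sysconf (POSIX)
              #include <fstream>
              #include <iostream>
              #include <vector>

              // Resident set size in kB, read from /proc/self/statm (Linux-specific).
              static long rss_kb() {
                  long pages_total = 0, pages_resident = 0;
                  std::ifstream statm("/proc/self/statm");
                  statm >> pages_total >> pages_resident;
                  return pages_resident * sysconf(_SC_PAGESIZE) / 1024;
              }

              int main() {
                  std::cout << "at start:     " << rss_kb() << " kB\n";
                  // Roughly 30 MB spread over many small allocations, so they are
                  // served from the heap rather than by individual mmap calls.
                  std::vector<char *> blocks;
                  for (int i = 0; i < 30000; i++) {
                      blocks.push_back(new char[1024]);
                      blocks.back()[0] = 1;   // touch the block so its page counts
                  }
                  std::cout << "after new:    " << rss_kb() << " kB\n";
                  for (char *p : blocks) delete[] p;
                  // Whether this last number drops back down is entirely up to the allocator.
                  std::cout << "after delete: " << rss_kb() << " kB\n";
                  return 0;
              }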

  5. Nathan Myers says:

    There are usually very sound reasons not to release memory back to the OS, particularly in a multithreaded program. Each such release causes a “TLB shootdown”, in which threads on other cores are blocked while the cores’ translation lookaside buffers (caches of page mappings) are cleared, with further stalls as their entries are re-filled.

    This is another reason to prefer single-threaded processes and less-coupled forms of parallelism, which are less subject to such shootdowns.

    Besides the TLB potholes, releasing memory means that the next time it is requested, the OS is obliged to zero it before the process gets to see it again. Furthermore, each page will be marked read-only, causing a trap the first time it is written to, and only then zeroed, lazily.

    As a result, freeing memory to the OS should only be essayed with the support of a great deal of measurement of the consequences.
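
    A small Linux-specific sketch of that zero-fill cost, using mmap and madvise directly rather than malloc/free:

        #include <sys/mman.h>  // mmap, madvise, munmap
        #include <cstddef>
        #include <cstdio>
        #include <cstring>

        int main() {
            const std::size_t len = 1 << 20;
            char *p = static_cast<char *>(mmap(nullptr, len, PROT_READ | PROT_WRITE,
                                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
            std::memset(p, 42, len);         // dirty the pages
            madvise(p, len, MADV_DONTNEED);  // hand the pages back to the kernel
            // The next access faults in fresh zero-filled pages: the old contents
            // are gone and the kernel pays for the zeroing again.
            std::printf("%d\n", p[0]);       // prints 0
            munmap(p, len);
            return 0;
        }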

  6. jld says:

    If the memory is not necessarily released, why bother with the hassle (and the danger of dangling references) of an explicit free, and not just use the Boehm-Demers-Weiser garbage collector?
    I have been using it for more than 10 years with no trouble.
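
    For the curious, a minimal sketch assuming the collector's C interface is installed as <gc.h> and linked with -lgc:

        #include <gc.h>      // Boehm-Demers-Weiser collector
        #include <cstdio>

        int main() {
            GC_INIT();
            for (int i = 0; i < 1000000; i++) {
                // No explicit free: blocks that become unreachable are reclaimed
                // by the collector during later allocations.
                char *p = static_cast<char *>(GC_MALLOC(1024));
                p[0] = static_cast<char>(i);
            }
            std::printf("collector heap size: %zu bytes\n", GC_get_heap_size());
            return 0;
        }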

    1. You are right that whether you use a garbage collector or not, you typically do not have tight control over how much RAM your program is using.

  7. PSS says:

    What happens if you allocate, free, allocate, and then free the memory again?
    I would be interested in the results for your test program that allocated/freed 30,000 kB. If it does this twice, does it end up using 31,000 kB or 61,000 kB?

    1. My program actually does precisely what you suggest. I run through a loop to make sure I get stable results.

    2. Wilco says:

      Any freed memory is used by subsequent allocations. So the memory is reused within the same application. It’s just not aggressively returned to the system.

      It is possible for freed memory to become fragmented. For example allocate 101 blocks of 32 bytes, do some work, then free all except for one randomly chosen block. There are 3200 bytes of free memory which can be reused. However if you now try to allocate a single block of 3200 bytes, it won’t fit, so more memory is needed.

      Most programs only use a few different block sizes, making such fragmentation rare in long-running processes.
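
      The scenario above, transcribed into a small sketch; whether the allocator actually grows the heap for the last request depends on its internals, but this sets up the situation described:

          #include <cstdlib>

          int main() {
              // Allocate 101 small blocks; on a typical allocator they end up
              // next to each other on the heap.
              char *blocks[101];
              for (int i = 0; i < 101; i++)
                  blocks[i] = static_cast<char *>(std::malloc(32));

              // Free every block except the middle one: about 3200 bytes become
              // free, but split into two runs around the surviving block.
              for (int i = 0; i < 101; i++)
                  if (i != 50) std::free(blocks[i]);

              // This request cannot be satisfied from the fragmented free space,
              // so the allocator typically has to grow the heap instead of reusing it.
              char *big = static_cast<char *>(std::malloc(3200));

              std::free(big);
              std::free(blocks[50]);
              return 0;
          }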