A Survey on Memory Management Approaches
Tracing, Reference Counting, Substructural Types, Regions...
How to manage memory? The simplest possible solution (from the perspective of a language designer) would be to do nothing at all and leave it to programmers to explicitly acquire and release memory pages from the operating system. That’s a bit too cumbersome for most day-to-day programming, so even the lowest-level languages like C provide functions like malloc and free to manage memory on the heap.
A function like malloc may appear simple at first glance, but most modern malloc implementations, like jemalloc or tcmalloc, are actually incredibly sophisticated. From avoiding memory fragmentation to scalable concurrency, there’s a lot of hardcore engineering work going on under the hood.
But while malloc implementations automatically manage memory pages, they’re considered a form of “manual memory management” because it is still up to the programmer to use malloc and free correctly. Managing memory manually, with unstructured usage of malloc and free, is highly error prone, with double-free and use-after-free bugs possibly resulting in severe security vulnerabilities.
Less severe, but also problematic, are memory leaks resulting from forgetting to free some allocations. A long running program with recurring memory leaks will keep increasing its memory usage until the process has to be terminated.
All three issues stem from “free”. If you forget to call it, you get a leak. If you call it more than once, you get a double-free. If you call it while some part of the program is still using that memory, you get a use-after-free. Ouch 😬.
This post is a survey on the various approaches that exist to avoid these issues, from semi-automatic solutions that greatly reduce the memory management effort to fully-automatic solutions that remove the need for “freeing memory” entirely.
I’ll cover the following in this blog post:
Regions (aka Arenas, aka Bump Allocators)
Memory Pools
Scope-Based Resource Management (aka RAII)
Reference Counting
Tracing Garbage Collection
Substructural Type Systems
Lifetime Annotations
Regions (aka Arenas, aka Bump Allocators)
I mentioned previously that modern malloc and free implementations are highly sophisticated, so it may come as a surprise to you that one of the best ways to manage memory in C is to avoid malloc and use the simplest allocator imaginable instead: a bump allocator (also known as a region, or arena).
A bump allocator is called that because it just “bumps” a pointer forward on each allocation. The simplest bump allocator does little more than the following:
typedef struct bump_allocator {
    char* pointer; // next free byte
    char* end;     // one past the last usable byte
} bump_allocator;

void* alloc(bump_allocator* a, size_t size) {
    if (a->pointer + size > a->end) {
        return NULL; // out of space
    }
    void* result = a->pointer;
    a->pointer += size;
    return result;
}
Instead of returning null, the bump allocator may also request additional pages from the operating system, switching the pointer over to the new location.
If the bump allocator is only tracking a single pointer, how does the program free individual allocations? Well, the secret is that it doesn’t. You can only reset the allocator’s pointer back to the beginning1.
The standard malloc has to be as general as possible to cover all imaginable use cases. The bump allocator on the other hand can only be used to allocate blocks of memory that share a lifetime. Thankfully having many blocks of memory sharing a lifetime is rather common.
Consider a web server: each request can use its own bump allocator, allocating freely, followed by a single free when the request is complete. Or a video game: any data that only lasts for a frame can be allocated freely and then freed in bulk.
While a bump allocator cannot handle all memory allocation patterns, the patterns it can handle become almost trivial and therefore substantially less error prone. Going from 1 free per allocation to 1 free per frame makes a huge difference! It’s a lot easier to get that single free right. As a bonus, bump allocation is also ridiculously efficient, so you get a massive performance improvement as the cherry on top.
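As a sketch of the idea (hypothetical region type and function names, a fixed backing buffer instead of OS pages, and no alignment handling):

```cpp
#include <cstddef>

// A hypothetical region backed by a fixed buffer instead of OS pages.
struct region {
    char* start;
    char* pointer; // next free byte, "bumped" on every allocation
    char* end;
};

void* region_alloc(region* r, std::size_t size) {
    if (r->pointer + size > r->end) return nullptr; // out of space
    void* result = r->pointer;
    r->pointer += size;
    return result;
}

// The single "free" for every allocation made since the last reset.
void region_reset(region* r) { r->pointer = r->start; }
```

Per request or per frame, the program calls region_alloc as often as it likes, then makes a single region_reset call when the work is done.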
The Zig and Odin programming languages are designed with the usage of bump allocators for temporary allocations in mind.
Having said that, while double-frees are extremely unlikely with bump allocation, use-after-free can still happen. It’s possible to defend against use-after-free by combining regions with the various approaches below. For example, C3 uses scoped regions for temporary allocations.
Despite their simplicity, going in depth on regions would require a blog post of its own, as each combination with the other approaches creates a unique approach to memory management.
Memory Pools
While a bump allocator manages many allocations of different sizes that all share the same lifetime, a pool allocator manages many allocations of the same size but with different lifetimes. The basic idea of a pool allocator is an array of fixed-size slots, plus bookkeeping that tracks whether the slot at a particular index is in use and how many times it has been reused.
A pool allocator does not return a raw pointer to each allocation, but instead it returns a handle with a mix of a generation ID and the index into the array where the allocation resides:
struct PoolHandle {
    uint32_t generation;
    uint32_t index;
};
The number of bits dedicated to the generation or the index can be changed as needed. The generation ID is used to track when a particular index in the pool is freed and/or reused. If the generation ID of a handle and the generation ID of the current allocation at that index don’t match, you have detected a use-after-free. Double-free and leaks are not possible with pools, beyond the “leak” of keeping a certain number of slots in the pool always available even if they are not in use. You can theoretically still have use-after-free and double-free bugs with the pool pointer itself, but since it is only 1 pointer for many allocations, the likelihood of error is greatly reduced.
Pools are somewhat inconvenient to use because accessing their data requires both a pointer to the pool and the respective handle: you cannot reach the data without access to the pool itself. You also need different pools for different datatypes, or at least for different allocation sizes. Much like bump allocators, they cover certain common memory usage patterns, but not all.
Note that unlike bump allocation, which is semi-automatic (1 free for many allocations), pool allocation is manual: data is explicitly allocated and freed. The advantage comes from the added safety.
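A sketch of the scheme described above (hypothetical names, a fixed number of int-sized slots, and a simple linear scan for a free slot):

```cpp
#include <cstdint>

const uint32_t POOL_SIZE = 64;

struct pool_handle {
    uint32_t generation;
    uint32_t index;
};

// A hypothetical pool of fixed-size slots (here each slot holds one int).
struct pool {
    int      data[POOL_SIZE] = {};
    uint32_t generation[POOL_SIZE] = {}; // bumped every time a slot is freed
    bool     in_use[POOL_SIZE] = {};
};

pool_handle pool_alloc(pool* p) {
    for (uint32_t i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return {p->generation[i], i};
        }
    }
    return {0, POOL_SIZE}; // out of slots: sentinel index
}

// Returns nullptr when the handle is stale: a detected use-after-free.
int* pool_get(pool* p, pool_handle h) {
    if (h.index >= POOL_SIZE || !p->in_use[h.index] ||
        p->generation[h.index] != h.generation)
        return nullptr;
    return &p->data[h.index];
}

void pool_free(pool* p, pool_handle h) {
    if (pool_get(p, h) != nullptr) { // a stale handle makes this a no-op
        p->in_use[h.index] = false;
        p->generation[h.index]++;    // invalidate all outstanding handles
    }
}
```

A stale handle is caught by the generation check in pool_get, so both use-after-free and double-free degrade into detectable, harmless errors.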
The Vale language extends the idea of generational handles to generational references, safeguarding every memory allocation with the same idea.
The two approaches above are handled by the programmer and are language independent: you can use bump allocation in any language with raw pointers, and pool allocation in every language with mutable arrays. The remaining approaches require the compiler to get involved.
Scope-Based Resource Management (aka RAII)
One of the main improvements C++ brought to the table over C was scope-based resource management, typically called by the awful name RAII2 (Resource Acquisition is Initialization), as that was the name originally given to the approach by Bjarne Stroustrup, the creator of C++. He’s apologized for the name ever since3.
The core idea of scope-based resource management comes from the realization that nobody ever complains about managing memory on the stack, only on the heap. Variables on the stack are automatically pushed and popped alongside the scopes in which they reside, so everything just works.
So why not tether heap memory to the stack variable that “owns”4 it, allocating it when the variable is initialized, and freeing it when the variable’s scope ends? Well, that’s exactly how it works in C++:
void example() {
    std::string s("Hello World");
    std::vector<std::string> v;
    // v's destructor is implicitly called here
    // s's destructor is implicitly called here
}
In the case of containers, like vectors, memory is freed recursively: first every element of the vector, then the vector itself. In C++ this is handled by giving every type a constructor member function and a destructor member function (sometimes even automatically generated by the compiler), and implicitly inserting calls to the type’s destructor at the end of the scope.
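A minimal sketch of the mechanism, using a hypothetical Block type and a global counter just to make the implicit destructor calls observable:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

int live_blocks = 0; // hypothetical counter to observe acquisition/release

// A minimal RAII owner: acquire in the constructor, release in the destructor.
struct Block {
    void* data;
    explicit Block(std::size_t size) : data(std::malloc(size)) { ++live_blocks; }
    ~Block() { std::free(data); --live_blocks; }
    Block(const Block&) = delete;            // forbid copies: one owner,
    Block& operator=(const Block&) = delete; // exactly one free
};

void example() {
    Block a(64);
    Block b(128);
    assert(live_blocks == 2);
} // ~b and ~a run implicitly here, in reverse construction order
```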
This approach of tying resources to stack variables is useful for more than just memory, so many languages provide scoped lifetimes (e.g. Python’s with statement) or a defer statement to execute some cleanup code at the end of the scope or function, even if they use a different approach to memory management.
If this was all that C++ supported, it would be a memory safe language (not counting features it inherited from C of course), but things are a bit more complicated.
One major issue with this approach is the pervasive copying. Just as normal stack allocation copies data freely, so must this emulation of the stack with heap memory. Copying small integers on the stack is no big deal, but copying large arrays is:
void example() {
vector v1{...}; // some huge vector
vector v2 = v1; // copy :(
}
To work around this issue, C++ has references and move semantics. References are just like pointers except they can’t be null (plus some other confusing and mostly irrelevant constraints). Move semantics use a special kind of reference (an rvalue reference) to indicate that the previous location of the value won’t be used anymore:
void example() {
    vector v1{...};
    vector v2 = std::move(v1); // no copy :)
    // v1 is in a valid but unspecified state after the move
    std::cout << v1.at(0) << "\n"; // *boom*
}
The std::move function doesn’t actually move anything; it is just a cast to an rvalue (&&) reference. The actual “move” occurs in the vector’s move assignment operator:
vector<T>& operator=(vector<T>&& other)
The operator is responsible for “consuming” the data from “other” (v1 above) and storing it in the vector (v2 above). After the assignment, “other” (meaning v1) is left in a safe but “unspecified” state, and should not be used anymore.
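A sketch of how such a move assignment operator is typically written, using a hypothetical Buffer type in place of vector:

```cpp
#include <cstddef>
#include <utility>

struct Buffer {
    int* data = nullptr;
    std::size_t size = 0;
    Buffer() = default;
    explicit Buffer(std::size_t n) : data(new int[n]), size(n) {}
    ~Buffer() { delete[] data; }
    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data;        // release whatever we currently own
            data = other.data;    // steal other's allocation...
            size = other.size;
            other.data = nullptr; // ...and leave other safe to destroy,
            other.size = 0;       // but empty: the "unspecified" state
        }
        return *this;
    }
};
```

Leaving the source empty is what keeps the later implicit destructor call on it from double-freeing the stolen allocation.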
C++ does not provide any safety mechanisms for references and only minimal safety for moved-from values. You can’t get a double-free (because the compiler inserts the destructor for you, though the destructor could be buggy) but you can easily have use-after-free bugs with references.
Rust took this idea (including references and moves), and made it 100% safe. We’ll cover the approaches Rust uses to ensure safety later. For now, the important thing to keep in mind is that Scope-Based Resource Management (without references on top) enforces a tree-shaped structure on memory.
C++, Rust, and Ada all use Scope-Based Resource Management.
Reference Counting
The idea of reference counting is simple and effective. When you allocate some data on the heap, a “reference count” is stored alongside it. When a new pointer to that data is created, the reference count is increased. When a pointer’s lifetime ends (e.g. it was stored in a variable that’s now out of scope), the reference count is decreased. When the reference count reaches 0, the data gets freed.
Reference Counting is the simplest form of garbage collection and combines very well with Scope-Based Resource Management, since the latter can handle the increments and decrements of the reference count in the constructors and destructors of the pointers. Because of this languages like C++ and Rust can provide reference counted pointers as a library (std::shared_ptr in C++ and Rc and Arc in Rust).
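The core mechanism fits in a few lines; here’s a sketch with hypothetical rc_* names, single-threaded and therefore with plain non-atomic counts:

```cpp
#include <cstddef>
#include <cstdlib>

struct rc_header {
    std::size_t count; // the reference count, stored alongside the data
    // the user's data lives immediately after this header
};

void* rc_alloc(std::size_t size) {
    rc_header* h = (rc_header*)std::malloc(sizeof(rc_header) + size);
    if (h == nullptr) return nullptr;
    h->count = 1; // the creating pointer is the first reference
    return h + 1; // hand out a pointer to the data, not the header
}

void* rc_retain(void* data) {
    ((rc_header*)data - 1)->count++; // a new pointer to the data exists
    return data;
}

void rc_release(void* data) {
    rc_header* h = (rc_header*)data - 1;
    if (--h->count == 0)
        std::free(h); // last reference gone: header and data are freed
}
```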
There are a few issues with reference counting:
The first is that there are a lot of increment and decrement operations happening, and they need to hit main memory to update the reference count, which negatively affects cache usage. This can be mitigated by detecting “moves” which are an increment immediately followed by a decrement, and just skipping them.
The second is that in a multithreaded environment, these increment and decrement operations need to be atomic, which further slows things down. This can be mitigated by detecting when reference counted pointers cross thread boundaries.
The last is that reference counting cannot handle cycles. If A stores a reference to B and B stores a reference to A, the reference count of both is 1 and they’ll never get freed, causing a leak. This must be worked around by either forbidding cycles (constraint on the memory layout), using weak references to break cycles (manual labor and error prone), or applying some form of cycle collection (i.e. tracing).
Python, Swift and Nim all use reference counting. Python has cycle collection, Swift does not, and Nim’s is optional. Without a workaround for the cycle issue, reference counting imposes a DAG-like structure on memory.
Tracing Garbage Collection
Of the approaches we’ve seen, one requires that all allocations share a lifetime, one is still manual but protects against use-after-free and double-free, one enforces a tree-like memory layout, and one enforces a DAG-like memory layout.
What if we don’t want to worry about memory management at all5? That’s where tracing garbage collection comes in. Unlike every other approach covered so far, tracing garbage collection can handle arbitrary graphs of references.
The concept is simple: First, determine all root pointers, i.e., pointers stored in registers, stack variables and global variables that point to the heap. Mark every allocation pointed to by a root pointer. For every pointer stored in a marked allocation, recursively mark those allocations. Once every reachable allocation is marked, free all others as they are unreachable and therefore garbage data.
There are different ways to implement the above idea. The most direct is a mark-and-sweep garbage collector which marks and then sweeps (frees) exactly as described above. But another way to achieve the same idea is with a copying semi-space collector. The basic idea is as follows:
Set up two bump allocators, the “from” space and the “to” space. Allocate everything in the from space until it fills up. Once full, go through all the pointers as in a mark-and-sweep collection, but also copy each marked allocation to the “to” space. Once finished, “sweep” by simply resetting the “from” allocator pointer. The “to” space becomes the new “from” space and the old “from” space becomes the new “to” space.
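A toy mark-and-sweep collector, with hypothetical names, explicit roots, and an intrusive list of every allocation, might look like:

```cpp
struct object {
    bool marked = false;
    object* children[2] = {}; // outgoing pointers stored in this allocation
    object* next = nullptr;   // intrusive list of every allocation
};

object* all_objects = nullptr;

object* gc_alloc() {
    object* o = new object();
    o->next = all_objects;
    all_objects = o;
    return o;
}

// Mark phase: recursively mark everything reachable from a root.
void mark(object* o) {
    if (o == nullptr || o->marked) return;
    o->marked = true;
    for (object* child : o->children) mark(child);
}

// Sweep phase: free every allocation that was never marked.
void sweep() {
    object** link = &all_objects;
    while (*link != nullptr) {
        object* o = *link;
        if (!o->marked) {
            *link = o->next;   // unlink the unreachable object...
            delete o;          // ...and free it: it was garbage
        } else {
            o->marked = false; // reset the mark for the next collection
            link = &o->next;
        }
    }
}

void gc(object** roots, int nroots) {
    for (int i = 0; i < nroots; i++) mark(roots[i]);
    sweep();
}
```

Note that a cycle of objects is collected just fine here, as long as nothing in the cycle is reachable from a root.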
Neat stuff. So if we have these algorithms that can handle arbitrary graph-like structures, why did I even bother going through the other more limited approaches above? Everything has tradeoffs:
The first issue is that it is mostly unpredictable when a tracing garbage collector will get triggered. The algorithm is not cheap so you don’t want to run it often.
The second is that, to make matters worse, when the garbage collector is triggered, it needs to stop the whole program while it does its job. This is terrible for applications that need to remain highly responsive, like video games.
The third issue is that to work around the above issues, modern tracing garbage collectors are extremely complex, and any solution to reduce the length of the pauses inevitably requires doing more work, and therefore lowers throughput.
The fourth and final issue is that a lot of infrastructure is necessary for the tracing garbage collector to work (how does it know about all the pointers in the program? how does it stop every thread in the program?). I go over that in my article The Hidden Costs of Tracing Garbage Collection.
Java, JavaScript, C#, Go, Haskell, etc. all use tracing garbage collection; it is by far the most popular approach despite the drawbacks.
Substructural Type Systems
I mentioned before that I’d cover the approaches used by Rust to ensure complete safety for scope-based resource management with moves and references. First, let’s have a look at “moving”.
“Moving” in C++ happens when a move constructor or move assignment operator is executed; it is otherwise not tracked by the type system. The moved-from data must be set by the move operation to some safe but unspecified value, such that the destructor can later check for that value to prevent a double-free.
Rust tracks ownership of values at the type system level with affine types, a kind of substructural type system. A variable of an affine type must be used at most once.
fn example() {
    let x = vec![...];
    let y = x; // move
    println!("{:?}", x); // compile error: x was moved
    // destructor of y implicitly called here
}
The “x” variable was “used up” in the second line of the function, and as such cannot be used anymore. It’s not so much that Rust tracks “moves” as “uses”. If a variable is never used, Rust will implicitly insert a destructor call for it at the end of the scope, but from the type system’s perspective the destructor does not matter.
As an alternative to affine types, there are also linear types that enforce each variable is used exactly once. In a linear type system the programmer would be expected to explicitly call a destructor function (which is just any function that consumes the value and does not return a new one).
Using affine or linear types can get quite cumbersome, as any transformation requires storing the result in a new variable:
let x = f();
let x1 = g(x);
let x2 = h(x1);
...
To make substructural types a bit more palatable, there’s the idea of “borrowing”. In Rust that is handled with references, which complicates things, so the example below uses an explicit made-up “borrow” keyword:
fn increment(borrow x: Vec<i64>) {
// increment all values of x
}
let y = vec![...];
increment(y);
Is theoretically equivalent to:
fn increment(x: Vec<i64>) -> Vec<i64> {
let x1 = x;
// increment all values of x1
return x1;
}
let y = vec![...];
let y = increment(y); // new variable also called y
Though of course the compiler can just do an in-place mutation.
Rust, Clean, Mercury and Austral all use substructural types, but only Rust and Austral use them for memory management. An added benefit of substructural types is that they also prevent data races at compile time.
Lifetime Annotations
Finally we get to how references can be made safe. The easiest solution is to just remove them from the language entirely, as ParaSail and Hylo do. You can get surprisingly far without first-class references.
But what if we really want references plus safety, what can be done? Rust solves this issue with lifetime annotations. The idea of tracking lifetimes came from attempts at making region-based memory management safe to use, but in Rust they’re generalized to work with any allocation.
The basic idea is simple: when you create a reference to some allocation, add an extra tag to the type of the reference with the lifetime of the allocation, for example:
let x = vec![...];
let y = &x;
The types in the example above are inferred, but if we write them out, we see that there’s an extra type parameter to the reference, a lifetime annotation:
let x: Vec<i64> = vec![...];
let y: &'a Vec<i64> = &x;
That little ‘a is tracking the lifetime of the borrow of x. It doesn’t actually store anything; it’s just a type variable that gets unified in the type system. For example, a function like:
fn get<'a>(array: &'a [Stuff], index: usize) -> &'a Stuff {
...
}
The type variable ‘a is tracking the lifetime of the slice, and records in the type of the returned Stuff reference that it has the same lifetime. This is achieved through unification, the same algorithm used to handle generic type variables.
Rust’s syntax is not the easiest to follow when it comes to lifetimes. The function above wouldn’t normally even have that ‘a explicitly written out, since Rust’s lifetime elision rules can figure out the annotations on their own for simple cases like this one, with only one reference involved.
So instead we can look at Austral since its references are always explicit:
let buf: ByteBuffer := allocateBuffer(100, 'a');
let len: Index := length(&buf);
destroyBuffer(buf);
The above looks very similar to Rust and works the same except for the explicit destructor. But if we look at the length function, things become a bit clearer:
generic [R: Region]
function length(buf: &[ByteBuffer, R]): Index is
return !(buf->size);
end;
The length function is generic over a region called R, and every reference in Austral must have an associated region6. Inside it we use the path operator (->) to get a reference to the size field of the buffer (of type &[Index, R]), which we then convert into an Index with the dereference operator (!).
But what if we want to store a reference to the ByteBuffer in another variable?
let bufref: &[ByteBuffer, ?] := &buf;
What do we fill in for the “?” there? In Rust, an implicit “region” is created (so to say) to fill that in, but in Austral we have to explicitly borrow buf:
borrow buf as bufref in R1 do
-- bufref is of type &[ByteBuffer, R1]
end borrow;
The region is only valid in that scope; if we try to store the reference outside of the scope, we get an error:
let outerref: &[ByteBuffer, R1]; -- error: unknown region R1
borrow buf as bufref in R1 do
outerref := bufref; -- impossible since they cannot have the same type
end borrow;
What Rust is doing is effectively the same as the above, only with a lot more syntactic sugar to avoid needing to explicitly write out the lifetimes.
Conclusion
Oof, that was a long article, but hopefully I covered all the main approaches to memory management. If there are any that I missed, do let me know in the comments. I’ve heard of ASAP, which is a sort of compile-time inlined tracing garbage collector, but unfortunately the one implementation of the idea showed mediocre performance (which makes sense: a tracing GC is not something you want to run often, even in little spread-out bits).
Personally, I think if we just gave up on first class references we’d have a really good memory management story that doesn’t require the massive sledgehammer that is a tracing garbage collector or nasty lifetime annotations.
Some bump allocators may allow you to “undo” the last allocation, or “pop” allocations like a stack, or let you record “snapshots” you can revert to, but the core idea remains to free everything in one go.
RAII is a much catchier name than SBRM, so it continues to be used as a shorthand, despite the actual words being effectively gibberish as a description of the approach.
It’s ok Bjarne, the idea is what counts, not what it’s called. And it’s a pretty neat idea.
This approach is also known as ownership semantics, because it’s tracking what memory owns what other memory, where “owner” means “responsible for freeing”.
No such thing, there are always considerations, but the closest we can get.
These are not the same concept of regions as in region-based memory management, where a region exists at runtime as an allocator. In Austral these are the same as lifetimes in Rust, a purely compile-time construct.