The Hidden Cost of Tracing Garbage Collection

You know about the pauses, but do you know about the rest?

Nov 13, 2023

Many programming languages provide some form of automatic memory management, also known as garbage collection(GC). Usually this means Tracing Garbage Collection, though there are alternatives with various tradeoffs.

Most programmers are aware that tracing garbage collectors can have unpredictable pauses, or that they tend to result in higher memory usage. This post is not about that. This post is about the supporting infrastructure needed by a tracing garbage collector to do its job, and the costs of that infrastructure.

If you’re making a programming language, particularly a lower-level one, these costs are important to keep in mind.

How a tracing garbage collector works

Feel free to skip this part if you already know how they work, I won’t be going into detail in this post, this is just about the basic idea.

When you allocate memory, you need to deallocate it at some point (unless the program is short lived). While buffer overflows are by far the worst offender when it comes to security vulnerabilities (pretty much a C-exclusive issue), “use-after-free” and “double-free” bugs are also very serious and affect any language that expects the programmer to manually manage their memory.

The job of a garbage collector is to handle deallocation for you. You can just worry about the problem you're trying to solve and trust the garbage collector do its job1.

A tracing garbage collector works by computing the set of reachable objects2, and freeing everything that isn't reachable. Anything not reachable is no longer accessible and therefore garbage data (hence the name garbage collector).

All root objects are considered reachable. An object is a root if there is a pointer to it stored on the stack, a global variable, or a CPU register. For every reachable object, if it stores a pointer to another object, that object is also reachable. A tracing garbage collector will mark every reachable object, then free every unmarked object. There are variations but they all find reachable objects and free the rest somehow.

What is the problem?

Why doesn't every programming language use a tracing garbage collector if they handle memory management so well? As stated in the intro, the main issue is unpredictability: tracing garbage collectors trigger at seemingly random times, causing latency spikes, which are unacceptable for certain kinds of applications.

Most languages with a tracing garbage collector, however, have knobs or APIs one can use to control the behavior of the garbage collector, and there are coding patterns one can use to minimize the work the collector has to do and therefore the amount and severity of the pauses.

So the pauses, by themselves, are not a good enough reason to not have a tracing garbage collector available. An application could just pause the collector and explicitly invoke it when convenient to avoid the pauses. The real issue is the infrastructure required to support the collector.

You can't just plop a garbage collector into an arbitrary language and call it a day. Depending on what features are expected of the collector, the language needs to do some work. To be more concrete, there are two main aspects to consider: how does the collector know about every pointer in the program, and how does the collector prevent the program from trampling over its work in a multithreaded environment.

Precise vs Conservative Garbage Collection

When garbage collection is triggered, the collector needs to somehow know about all the pointers in the program, so it can trace them to compute the set of reachable objects. Tracing Garbage Collectors can be classified as precise if they can accurately compute the set of reachable objects, or conservative if they can only compute an approximation. It's also possible for a garbage collector to be conservative for some pointers and precise for others.

Precision has a price, the information on what is a pointer and what isn't a pointer must be stored somewhere.

A conservative collector does not need any extra information. It will treat every word of memory on the stack, registers, and global variables as a potential pointer to the heap. If a number just so happens to coincide with the address of a valid heap allocation, it is assumed to be one, causing a memory leak (though hopefully a temporary one). That's the price of being conservative.

Conservative collectors like the Boehm Garbage Collector are often used with uncooperative languages like C and C++, because these languages not only provide no information on what is a valid pointer to the collector, but they also allow for direct pointer manipulation and arithmetic. A conservative collector is the only way to safely manage all memory automatically in an uncooperative language3.

Precise collectors need some support from the programming language. One way to do this is for the language to store pointers and other numbers differently, for example by using a marker bit. All pointers are stored as even numbers (least-significant bit 0) while all integers are stored as odd numbers (least-significant bit 1). For pointers this makes no difference as they will never be byte-aligned in a language that uses marker bits. For integers they lose a usable bit (going from 64 bits to 63 bits + 1 marker bit) and require additional bit manipulation with every arithmetic operation. This approach is used by OCaml.

The alternative requires storing some additional metadata about the types and functions in the program. For types, this will typically be a table with the offsets of the object's fields that contain pointers into the heap. For functions, this will usually be some sort of stack map that stores which arguments and local variables are pointers. To find the tables, each allocation needs to store a little header with the address of its corresponding type information. For the stack map, there are many possible implementations, one of the simplest being a parallel shadow stack that only stores pointers and gets updated at function entry and exit points.

Compacting collectors, meaning collectors that move objects around to reduce fragmentation and improve memory locality, must be precise in order to update the program's pointers to the new memory locations. Compaction is not safe to use with conservative garbage collection, the two features are incompatible.

So the options are: extra metadata with some stack maintenance and no raw pointers, marker bits with worse arithmetic, or potential memory leaks and no compaction.

Stop! Collector Time

Tracing the set of reachable objects is not a cheap operation. As the number of objects in the program grows, so does the time it takes to scan them. It would also be disastrous if another thread were to change pointers around while the garbage collector is doing its job. Because of this the garbage collector needs ensure all threads are stopped before it starts working4.

A tracing garbage collector that stops all threads, performs a full collection and then resumes the program is called a "stop-the-world" collector because the whole application is frozen while the collector is working. Most garbage collectors in use are incremental in the sense that they don't need to perform a full collection when triggered, they can resume the program to keep it responsive and continue the collection at a later point in time.

The need to "stop-the-world", at least for short periods of time, is a well-known cost of tracing garbage collection. The hidden cost relates to how the garbage collector actually stops the program. For a single-threaded program it's trivial, the program is "paused" simply because the garbage collector's code is executing instead. Where things get interesting is with multithreaded code.

How does the garbage collector stop all threads? Using operating system APIs to forcefully pause threads is a pretty bad idea. Instead, what is usually done is that the compiler inserts various safepoints into the code. Safepoints can be implemented as a check for some global garbage collector lock, or a write to a memory page that gets "poisoned" by the garbage collector, triggering a signal handler for the page-fault that the collector can then use to safely pause the thread. In the case of conservative collectors, they may need to wrap the thread creation APIs to record how many threads there are and to set the right signal handler masks.

Safepoints are typically introduced before every function call and inside each loop. That's extra code that needs to execute for the sole purpose of allowing the garbage collector to pause the program.

Incremental collectors (and Generational collectors) also need to maintain certain invariants to ensure the correctness of the collection algorithm. I won't go into those invariants here, but the important thing to note is that the program cannot be allowed to break them before the collector has finished its job.

To ensure this, cooperative compilers will insert a "write-barrier" on every pointer write, which is used to inform the garbage collector of the change, just in case it happens to break an invariant. Incremental conservative collectors will use page faults for this purpose as well.

So the options are: either your program is single-threaded and the collector is allowed to stop the program for an extended period of time, or the compiler needs to insert safepoints and write-barriers to ensure the collector can safely do its job, or the collector needs to abuse signal handlers and page faults.

Interoperability

I also haven't mentioned the issue of cross-language interoperability. Because the garbage collector needs cooperation from the language to function, the garbage collector needs to be informed whenever managed data crosses language boundaries. This is usually done by “pinning” and “unpinning” the relevant objects, which is a form of manual memory management.

It's also awkward to use code written in garbage-collected languages from other languages, as the host language may need to link with and initialize the other language runtimes (and runtimes with tracing garbage collection tend to be big). If the collectors abuse signal-handlers to pause threads, they may also trample over the host language’s usage of them.

This means that languages with a tracing garbage collector aren’t very good at making their code available to other languages, unless those languages share the same runtime (e.g. the various languages that run on the Java Virtual Machine). FFI is a major use case for languages like Rust and C++, so a garbage collector is something they’d rather avoid.

Conclusion

Beyond the well-known issues of unpredictable pauses and higher memory usage, tracing garbage collectors need a lot of supporting infrastructure for everything but the most naive of implementations. Either that or they need to be conservative collectors that go out of their way to work with uncooperative languages. This additional infrastructure is not an acceptable cost for lower-level code, even disregarding the unpredictability of garbage collection.

Because of this, there will always be programming languages that do not use tracing garbage collection. State-of-the-art tracing GCs simply require too much infrastructure that those languages are unwilling to provide.

Let me know in the comments if I got something wrong. I’ll cover alternatives to tracing garbage collector in a future post.

Even tracing garbage collection cannot avoid memory management concerns entirely. The program may (accidentally) hold on to a reference to some memory for longer than it needs to, causing a memory "leak" of sorts.

Note that object here refers to a piece of allocated memory on the heap, and not an object from Object-Oriented Programming.

Even a conservative collector can't help you if you scramble the value of a pointer, for example with XOR linked lists.

Feel free to go down the concurrent garbage collection rabbit hole, I won’t be covering that here. Just remember there is no free lunch, there is a lot of complexity involved in letting the world keep running.

Burning the Midnight Coffee

Discussion about this post