Mainstream programming languages use one or more of the following approaches (if any) to achieve memory safety:
Tracing Garbage Collection (GC for short)
Reference Counting (RC for short)
Linear/Affine Types (RAII +
std::unique_ptr
is similar)“Borrow Checking” (i.e., Static Lifetime Analysis)1
Each solution has different tradeoffs: GC requires a hefty and complex runtime to be both global and efficient; RC can’t handle cycles and has some performance overhead to track the reference counts; Linear/Affine Types/RAII impose a tree-like structure to memory; Borrow Checking is insufficient by itself and imposes quite a few constraints that can be hard to understand (so called “fighting the borrow checker”)
But there is actually one more approach2 that could be used: Pointer Tagging. Often discussed in the context of hardware solutions like CHERI, pointer tagging can also be implemented in software with a reasonable performance cost.
The basic idea is to add some extra tag bits to a pointer such that we can determine if the memory address it points to is still valid. If the tag bits in the pointer and the tag bits of the allocation don’t match, then the data at that memory address has been freed and overwritten, and we have therefore caught a Use-After-Free.
But how do we produce and manage these tag bits? Read on.
Generational Handles
The idea of pointer tagging is similar to a common approach to track slot reuse in object pools: generational handles.
When you add an object to a pool, you get back a handle: an index to the slot where that object is stored. You can use this handle to grab the corresponding object and work with it. But what happens if the object has been freed and a new object was added to the same slot in the pool? How do you know if the handle remains valid?
The simple solution is to track the “generation” of the object in the slot. Whenever an object in a slot is deleted, that slot’s generation counter is incremented, so the next object placed at that slot has a higher generation count.
Instead of returning just an index as a handle, the pool returns an index + generation pair. When accessing a slot using a handle, its generation value is compared to the generation value of the slot. If they don’t match, the handle is no longer valid.
If generation counts are allowed to wrap around, then “generation collisions” become possible, but such collisions are extremely unlikely given enough bits for the generation counter.
To give some perspective, if you do nothing but increment a 64 bit number every clock cycle on a 3 GHz CPU, it will take 97 years to overflow.
Generational References
Generational References, a term coined by the developer of the Vale programming language, extend the idea of Generational Handles to arbitrary pointers. They’re a form of pointer tagging inspired by generational handles, hence the name.
An implementation similar to the approach used for generational handles requires a custom memory allocator since we need to manage the generation counts separately from the allocations. But there’s another really clever solution that was implemented in Vale that avoids this issue, remaining compatible with malloc, and it is the one we’ll use: random tags.
tag_ptr<T>
We’ll implement Tagged Pointers / Generational References in C++ using a smart pointer class we’ll call tag_ptr
. These smart pointers don’t actually do any memory management themselves, like std::unique_ptr
or std::shared_ptr
, they only provide memory safety.
The idea is as follows: A simple thread-local3 random number generator is used to produce a stream of 64 bit numbers to use as tags.
A function make_tagged<T>
allocates space for a T plus an 8 byte header. A random number is generated and placed in the header as the tag. It then returns a tag_ptr<T>
which stores a pair of a raw pointer to the allocated T and the tag.
When tag_ptr<T>
is dereferenced it compares its tag with the tag in the allocation. One of 4 things can happen:
The memory was freed and returned to the OS, causing a segmentation fault.
The tags match and the pointer is valid.
The tags don’t match, so we caught a Use-After-Free and abort the program.
The tags match but this is actually a collision. For whatever reason the exact same value as the tag was written where the old header should be.
The last situation is extremely unlikely. There’s around a 1/(2^64) chance of collision which is an absurdly small number (around 0.000000000000000005%).
C++ Implementation
First, we need the random number generator. We’ll use xorshift* with 64 bits of state and 64 bits of output because it produces values that are well distributed, it is very efficient, and it never outputs 0.4
That last bit, never outputting 0, is typically a downside for a random number generator, but it is good for us, since 0 is by far the most common value for memory to have and we can clear a tag in an allocation header by simply setting it to 0.
Here is xorshift* (adapted from Wikipedia)5:
#include <cstdint>
using namespace std;
uint64_t generate_random_tag() {
thread_local static uint64_t x = 1;
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
return x * 0x2545F4914F6CDD1DULL;
}
To tag data we’ll use the following struct:
template<class T>
struct tagged {
uint64_t tag;
T data;
template<class... Args>
tagged(Args&&... args)
: tag(generate_random_tag())
, data(std::forward<Args>(args)...) {}
~tagged() {
tag = 0;
}
};
This struct forwards arguments to the constructor of whatever type it is tagging and generates and stores a random tag. All the destructor does is zero out the tag.
This struct could be used directly (to enable tagging of stack allocated objects for example) but it’s mainly meant to be used by the following function:
template<class T, class... Args>
tag_ptr<T> make_tagged(Args&&... args) {
tagged<T>* alloc = new tagged<T>(std::forward<Args>(args)...);
return tag_ptr(*alloc);
}
This function is analogous to std::make_unique
/std::make_shared
, in that it allocates the data on the heap and returns it wrapped in a smart pointer. In this case a tag_ptr
, our tagged pointer / generational reference implementation:
#include <cstdlib>
template<class T>
class tag_ptr {
T* ptr;
uint64_t tag;
bool is_valid() {
char* raw_data = reinterpret_cast<char*>(ptr) - sizeof(uint64_t);
uint64_t source_tag = *reinterpret_cast<uint64_t*>(raw_data);
return tag == source_tag
}
public:
tag_ptr() = default;
explicit tag_ptr(tagged<T>& t)
: ptr(&t.data), tag(t.tag) {}
T& operator*() {
if (!is_valid()) {
abort();
}
return *ptr;
}
void destroy() {
if (!ptr) {
return;
}
if (!is_valid()) {
abort();
}
char* raw_data = reinterpret_cast<char*>(ptr) - sizeof(uint64_t);
tagged<T>* tagged_data = reinterpret_cast<tagged<T>*>(raw_data);
delete tagged_data;
}
};
Our tag_ptr
class stores a raw pointer to the tagged data and a copy of the tag. The dereference operator compares the stored tag to (what should be) the tag of the allocation and it aborts the program if they don’t match.
Note: There is no destructor since the tag_ptr
does not own memory! A destroy method must be explicitly called, which means memory leaks are possible.
A possible solution to avoid memory leaks is to allow tagged<T>
to exist independently from tag_ptr<T>
, and to have tagged<T>
do the memory management using all the standard RAII stuff.
If you choose this route, get rid of the destroy
method and the make_tagged
functions, and add a convenience method to tagged<T>
to produce tag_ptrs
:
template<class T>
struct tagged {
...
tag_ptr<T> get_ref() {
return tag_ptr(*this);
}
};
Tradeoffs of tag_ptr<T>
Compared to other smart pointers, tag_ptr<T>
has advantages and disadvantages.
The main advantage (and it is a pretty major one), is that it is a “trivial type” to use C++ standard terminology. That means it can be bitwise copied freely, e.g., using memcpy
. Its copy constructor has no custom logic.
The second advantage is that they have no problem dealing with arbitrary graphs, unlike std::shared_ptr
(which imposes a DAG-like structure) or std::unique_ptr
(which imposes a tree-like structure).
The major disadvantage is the space cost. Each copy of a tag_ptr
is twice the size of a regular pointer. You’ll want to keep the amount of tag_ptrs
to a minimum. The 8 byte header isn’t very substantial unless you’re heap allocating lots of small structs.
The second disadvantage is the performance cost. Allocations are slightly slower because of the random number generation, while dereferences require an extra branch to check for the tags. The branch should be predicted correctly every time so the cost is pretty minimal. You should also never pass tag_ptrs
as function arguments unless the function might delete the data, you should take a const reference to the data directly instead (so the check happens only once for that call).
Something that is neither an advantage nor disadvantage is that tag_ptrs
don’t manage memory. They’re more like a runtime equivalent of lifetime analysis. While this means you need to use another technique to manage memory, it also means they can be used with other techniques!
They work pretty well with memory arenas for example, the only constraint is that you can’t just reset the arena pointer, you must clear the memory with 0.
Weak Tagged Pointers
One thing that is a bit unfortunate about our tag_ptr
implementation is that we can’t safely check if they’re valid before dereferencing them.
If the pointer is valid or the data has been overridden, it’ll work. But if the page was released back to the operating system at some point or its read permission removed, trying to compare the tags will cause a segmentation fault.
With the standard allocator there isn’t much we can do about this6, but if we’re in control of the allocator there’s a simple trick that can be done.
Instead of unmapping a page when it is no longer in use, you can tell the kernel you don’t need it anymore. On Linux you can use the following:
madvise(page_address, page_size, MADV_DONTNEED);
This releases the page and any attempt to read the page will result in a zero-fill-on-demand page. The address range of the page is still considered “in use”, not a big deal with a 64 bit address space but something to consider. The allocator should store the page in a freelist and reuse it at some point instead of just calling mmap
.
Using MADV_FREE
instead of MADV_DONTNEED
means the page is only released when there is memory pressure.
The windows equivalent of MADV_FREE
is MEM_RESET
on VirtualAlloc
. To replicate the behavior of MADV_DONTNEED
requires decommiting and re-commiting the page.
With this in place, we can safely call the is_valid
method on tag_ptr
to check if the data it points to remains valid before dereferencing it.
Weak tagged pointers also allow us to throw an exception or return a default value instead of segfaulting when dereferencing, if that’s relevant.
Conclusion
Tagged Pointers are a pretty simple approach to achieve memory safety7 with a relatively minor performance impact. They have advantages and disadvantages compared to shared pointers and unique pointers, but their ability to deal with arbitrary graphs makes them highly complementary.
Using a custom memory allocator that marks pages as not needed instead of unmapping them allows tagged pointers to become weak tagged pointers, meaning you can check their validity before accessing them.
Weak tagged pointers are a much more efficient alternative to regular weak pointers (the sort often used in tandem with shared pointers).
The next time someone complaints about your use of a “memory unsafe” language, just ask them if they’re ok with a few % performance loss in exchange for safety. If calling .clone() in Rust or using Rc is considered acceptable, so should this.
It’s not just Rust, C# also uses this to make Span<T> safe.
Well, two, if we count Raphaël Proust’s rather bonkers compile time garbage collection approach. In practice performance isn’t where it needs to be but the idea is really clever.
We’ll ignore multithreading challenges in this blog post for simplicity, but it’s not too hard to design a thread-safe variant.
If you are worried about actual cyber attacks, using xorshift* with 128 bits of state will do the job. However, this does mean 0 is now a possible output, so you need to take that into account when clearing out tags.
You may want to seed the generator instead of always using 1 as the seed.
Attempts could be made with a signal handler to change the protections of the page when a SIGSEGV occurs, but whether that is safe to do is highly operating system dependent and very much not recommended.
Assuming you’ve separately handled bounds checks of course. Just write your own std::span and std::vector alternatives and index with [] like a normal person.
Generational references are great! (I especially like that they're a pretty close reification of temporal memory safety: pointer/allocation + generation number vs pointer/allocation + abstract allocation id!) However, I think you missed their largest downside: unlike the more mainstream tools for guaranteeing temporal memory safety, they don't guarantee that live references are valid — they guarantee safety, but not liveness! Tracing GC'd and RC'd systems guarantee liveness trivially by construction, and substructural types and borrow checking more complicatedly guarantee liveness through type-system-like tools; generational references guarantee no use-after-free by just making all dereferencing fallible.
Again, generational references are cool, and enforcing memory safety with assertion-shaped things should be fine for anyone who uses runtime bounds-checked arrays over dependently-typed arrays or techniques like https://okmij.org/ftp/Computation/lightweight-static-guarantees.html . This is just the major thing to keep in mind when considering them, in a world where the established alternatives can't fail in valid programs.