Memory Unsafety is an Attitude Problem
No amount of technical improvements can make up for jagoffs.
On February 26th the White House released a press statement calling for software developers to take measures to stop introducing so many security vulnerabilities.
We, as a nation, have the ability – and the responsibility – to reduce the attack surface in cyberspace and prevent entire classes of security bugs from entering the digital ecosystem but that means we need to tackle the hard problem of moving to memory safe programming languages.
— National Cyber Director Harry Coker.
I’m honestly surprised it took this long for the government to get involved, considering how long memory safety issues and the resulting security vulnerabilities have been making a mess of things.
Some of the most infamous cyber events in history – the Morris worm of 1988, the Slammer worm of 2003, the Heartbleed vulnerability in 2014, the Trident exploit of 2016, the Blastpass exploit of 2023 – were headline-grabbing cyberattacks that caused real-world damage to the systems that society relies on every day. Underlying all of them is a common root cause: memory safety vulnerabilities.
— Anjana Rajan, Assistant National Cyber Director for Technology Security.
The online discourse quickly devolved into “just rewrite it in Rust bro!”1, which I find a bit unfortunate, because it is a relevant report and memory safe languages are just its main recommendation; other possibilities are given too. I urge everyone to actually read the report and not just immediately jump to conclusions.
One of the most important statements in the report, I feel, is the following:
However, even if every known vulnerability were to be fixed, the prevalence of undiscovered vulnerabilities across the software ecosystem would still present additional risk. A proactive approach that focuses on eliminating entire classes of vulnerabilities reduces the potential attack surface and results in more reliable code, less downtime, and more predictable systems.
Extensive testing is not going to cut it; we need to build our software and hardware systems such that certain subsets of security vulnerabilities simply cannot happen.
But ultimately it is we (meaning software developers at large) who need to do this. It’s not better tools that will do it; it is we. The report suggests tools that can help us achieve this goal, but it is all for nothing if the tools are misused or ignored.
It all comes down to convenience
Humans are lazy creatures; we tend to take the path of least resistance. Most memory safe languages (including every language recommended in the report) have escape hatches that allow you to create as many memory unsafe horrors as you want. But that way of doing things is less convenient than the memory safe way of doing things in those languages, and as such is rarely an issue.
You can have 🔥 blazing 🚀 fast memory vulnerabilities in 100% safe rust 🦀.2
Here, have a double free in C# without even using the unsafe keyword:
using System.Runtime.InteropServices;

static void Main() {
    // Allocate unmanaged memory, then free it twice: a classic double free.
    IntPtr ptr = Marshal.AllocHGlobal(4096);
    Marshal.FreeHGlobal(ptr);
    Marshal.FreeHGlobal(ptr);
}
Nobody in their right mind would do this, but they can.
So it’s not so much the availability of memory unsafe features that is the problem, but how convenient they are compared to safe alternatives. The safe route can sometimes get painful in Rust due to the borrow checker, but Rust’s community has a very strong culture of avoiding unsafe code unless absolutely necessary, going so far as pestering library developers about it on GitHub.
That culture is what keeps Rust developers from taking the path of least resistance and reaching for unsafe3 after losing a single “fight” to the borrow checker.
C and C++ are not “memory unsafe” languages so much as languages that lack built-in memory safe ways of doing the same jobs more conveniently.
There’s not really much of a reason for this to be the case, beyond too few people pushing for it versus the number of people who not only don’t care about security but are actively hostile toward even small safety-related improvements.
Do you even C bro?
All the way back in the prehistoric times of 2009, Walter Bright, creator of the D programming language and the Digital Mars C and C++ compilers, wrote an excellent article titled C’s Biggest Mistake.
That mistake, he argues, was conflating arrays with pointers, or more specifically arrays decaying into pointers when passed to functions, losing their size information in the process. And oh boy was he right.
Buffer Overflows are one of the most common exploits out there, and C and C++ are pretty much the only languages actually vulnerable to them in any serious capacity. C++ has taken steps to mitigate this problem with the introduction of std::string_view (C++17) and std::span (C++20), but these should have really been part of C++114.
Better late than never, I suppose… Except that std::span’s index operator ([]) is unsafe, and its .at() method (which has been in std::vector and std::string since forever) is only coming in C++26. How did the entire committee forget the .at() method? It boggles the mind. So many talks about safety at CppCon, and then they do this.
What about C then? Well, all you really need to protect against buffer overflows is something like the following:
#include <stdlib.h>

typedef struct slice_i {
    int* data;
    size_t len;
} slice_i;

// Note: evaluates i more than once, which is part of why it's hideous.
#define idx(s, i) ((i) >= 0 && (size_t)(i) < (s).len ? (s).data : (abort(), (s).data))[(i)]
Just copy paste slice_i around and change the _i and the type of the pointer as needed. Wouldn’t it be nice if you didn’t have to write these structs out by hand or use that hideous macro? If only a very smart gentleman had suggested a minor syntactical addition to C all the way back in 2009 that could be used to make slices:
int a[..]
Simple, no? If you try to index one of these things you get bounds checking. If you need extra performance you can always just cast it to a pointer. So what did C devs think of Walter’s suggestion? Let’s check out a Reddit thread from 2020:
I would have said overloading the 'break' keyword.
All other complaints about C are just, "Why do we need to breath oxygen?" It's just part of the landscape.
— which_spartacus
Ah yes, needing break in switch statements is definitely not just part of the landscape.
... *rolls eyes*
Fat pointers are pointless. If you want a fat pointer.. *gasp* make a struct of an integer and a pointer!
— okovko
The point flew so hard over this person’s head that Walter got a home run.
Another problem that can only be solved by writing good code.
— p0k3t0
It was so simple all along.
have to disagree. Since C has no implicit bounds checking (for performance reasons), there's no point in having the compiler know the size of an array / pointer. If you, the programmer, need that information, you can just pass the length explicitly.
— BioHackedGamerGirl
If only there was a convenient way to choose between having bounds checks or no bounds checks depending on the performance requirements of a particular piece of code. Many Buffer Overflows happen on non-performance-critical codepaths.
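That per-codepath choice is easy to sketch with a fat pointer; all the names below are my own illustrative inventions, not from any standard:

```c
#include <assert.h>
#include <stddef.h>

// A fat pointer carries its length, so each codepath gets to choose:
// checked access where safety matters, raw access where profiling
// says the check hurts.
typedef struct { const int* data; size_t len; } span_i;

// Cold path (e.g. parsing untrusted input): always bounds-checked.
int get_checked(span_i s, size_t i) {
    assert(i < s.len);  // abort on out-of-bounds instead of overflowing
    return s.data[i];
}

// Hot path: cast down to a raw pointer and let the optimizer rip.
long sum_unchecked(span_i s) {
    const int* p = s.data;
    long total = 0;
    for (size_t i = 0; i < s.len; i++) total += p[i];
    return total;
}
```

The length travels with the pointer either way; whether you consult it on every access is a per-call-site decision, not a language-wide one.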
Next up ... Assembly Language's Biggest Mistake!
— nahnah2017
I think you get the point. 11 years after that article came out, this is the attitude. What do you think will happen if these developers are forced to use Rust? They’ll just use unsafe everywhere, completely defeating the point.
These people are why we’ve had like 5 attempts at a “strxcpy” function over the years instead of proper string handling functions in the standard library.
Even Linux kernel developers, who you’d think would mainly consist of the absolute top tier of C developers and be highly security conscious, apparently don’t know how to implement a string buffer.5
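For reference, the string buffer every language with mutable strings converges on is just capacity + length + data. A minimal sketch, with names of my own choosing:

```c
#include <stdlib.h>
#include <string.h>

// Minimal growable string buffer: the classic capacity + length + data
// triple. Illustrative sketch, not any real codebase's implementation.
typedef struct {
    char*  data;
    size_t len;  // bytes used (excluding the NUL terminator)
    size_t cap;  // bytes allocated
} strbuf;

int strbuf_append(strbuf* sb, const char* s, size_t n) {
    if (sb->len + n + 1 > sb->cap) {
        // Grow geometrically so repeated appends stay amortized O(1).
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < sb->len + n + 1) cap *= 2;
        char* p = realloc(sb->data, cap);
        if (!p) return -1;  // report failure instead of silently truncating
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, s, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}
```

No strcpy, no strlcpy, no guessing whether the destination is big enough: the buffer knows its own capacity, so overflow and truncation bugs have nowhere to hide.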
What about Use-After-Free?
Safety and security are a spectrum, one where reaching 100% is sadly next to impossible. Preventing buffer overflows in C and C++ is almost trivial and would have avoided ~20000 known vulnerabilities.
Before worrying about trickier memory management concerns, don’t you think that should be priorities 1 through 20?6
Sadly, after that White House report, we’re already seeing discussions about adding substructural typing a la Rust to C. One of the silliest cases of putting the cart before the horse in recent years.
I’m not saying adding an owner pointer annotation to C wouldn’t be a good thing (I’d love that, actually); I’m saying that’s not what you should be worrying about for now. If you can’t even add a slice type to the language, how do you expect substructural typing to get accepted by the security-unconscious jagoffs exemplified above?
There are also other effective ways of defending against Use-After-Free. Hardened memory allocators like SlimGuard are a thing. They leave a lot of performance on the table to also protect against buffer overflows, which they wouldn’t need to do if buffer overflows weren’t such a major issue in the first place!
Would hardened allocators like SlimGuard be more popular if the performance penalty was reduced from no longer needing overflow protection? Perhaps.
Safe Arenas and Pools
Thing is, if you’re coding in C, you shouldn’t be mallocing and freeing that much in the first place. You should be using Arenas to group multiple lifetimes together and free everything in one go. The fewer frees in your codebase, the fewer chances of a use-after-free. Is it 100% safe and applicable in all cases? No, but would you rather have ~20000 vulnerabilities or ~2000?
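The core of an arena fits in a handful of lines. A minimal bump-allocator sketch, malloc-backed for brevity (a hardened one would reserve pages directly from the OS); all names are illustrative:

```c
#include <stdlib.h>

// A bump arena: one allocation up front, many cheap sub-allocations,
// exactly one free for all of them.
typedef struct {
    char*  base;
    size_t used;
    size_t cap;
} arena;

arena arena_new(size_t cap) {
    return (arena){ malloc(cap), 0, cap };
}

void* arena_alloc(arena* a, size_t size) {
    size = (size + 15) & ~(size_t)15;         // keep 16-byte alignment
    if (a->used + size > a->cap) return NULL; // out of space; real code might chain blocks
    void* p = a->base + a->used;
    a->used += size;
    return p;
}

// One free for every lifetime in the group: far fewer chances to
// double-free or use-after-free than with per-object malloc/free.
void arena_release(arena* a) {
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```

Individual objects are never freed, so the "free" half of use-after-free simply doesn't exist until the whole group is done.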
You can make Arenas nearly 100% safe by taking advantage of memory page protection features and only reusing old pages after exhausting the entire 64-bit address space. The chance of an attacker being able to exploit one of those page reuses is basically 0%.
On macOS, for example, you can do:
madvise(arena_ptr, size, MADV_FREE_REUSABLE);
mprotect(arena_ptr, size, PROT_NONE);
On Linux you’d probably use MADV_REMOVE, and on Windows you’d use:
VirtualFree(arena_ptr, size, MEM_DECOMMIT);
Unlike with malloc the performance penalty of doing this on an arena is minuscule.
You can also use memory pools with generational handles if you need to allocate and free various objects of the same type with varying lifetimes in a 100% safe manner. Is it applicable in all cases? No, but would you rather have ~2000 vulnerabilities or ~200?
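A generational pool can be sketched briefly too. A handle stores an (index, generation) pair; freeing a slot bumps its generation, so stale handles are detected instead of silently dereferencing recycled memory. Everything below is an illustrative sketch, not any particular library:

```c
#include <stdint.h>
#include <string.h>

#define POOL_CAP 64

typedef struct { uint32_t index, gen; } handle;

typedef struct {
    int      items[POOL_CAP];
    uint32_t gens[POOL_CAP];   // current generation of each slot
    uint8_t  alive[POOL_CAP];
} pool;

// Naive linear scan for a free slot; returns index == POOL_CAP when full.
handle pool_alloc(pool* p) {
    for (uint32_t i = 0; i < POOL_CAP; i++) {
        if (!p->alive[i]) {
            p->alive[i] = 1;
            return (handle){ i, p->gens[i] };
        }
    }
    return (handle){ POOL_CAP, 0 };
}

// Returns NULL for stale or invalid handles: the "use" in use-after-free
// becomes a visible error rather than silent corruption.
int* pool_get(pool* p, handle h) {
    if (h.index >= POOL_CAP) return NULL;
    if (!p->alive[h.index] || p->gens[h.index] != h.gen) return NULL;
    return &p->items[h.index];
}

void pool_free(pool* p, handle h) {
    if (h.index < POOL_CAP && p->alive[h.index] && p->gens[h.index] == h.gen) {
        p->alive[h.index] = 0;
        p->gens[h.index]++;   // invalidate every outstanding handle to this slot
    }
}
```

Handles are checked at the point of use, so a dangling handle costs you a NULL return instead of a CVE.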
You see the point? Why are arenas and pools not part of the standard library if they would already help so much with the problem?
If you had slices, hardened arenas and generational pools, you could make C almost as “safe” as Rust7 through linter rules forbidding various unsafe features like pointer arithmetic, “decayed” arrays and direct usage of malloc/free without first setting #pragma unsafe or whatever.
Memory Safety != Security
PHP is a memory safe language. For many years PHP-based websites were a security nightmare due to SQL injection vulnerabilities because developers were concatenating SQL strings with unsanitized user input.
No amount of language safety features can protect against that; what can is providing safe alternatives that ensure the SQL query is built in such a way that user input cannot possibly affect it. That’s what every language (including PHP) has these days.
Java, another memory safe language, was a security nightmare in the Java Applet days. Recently there was the Log4Shell vulnerability, which involved a combination of very bad design decisions resulting in (quoting Wikipedia):
The vulnerability's disclosure received strong reactions from cybersecurity experts. Cybersecurity company Tenable said the exploit was "the single biggest, most critical vulnerability ever,"[18] Ars Technica called it "arguably the most severe vulnerability ever"[19] and The Washington Post said that descriptions by security professionals "border on the apocalyptic."[8]
Nothing can protect against a logging framework connecting to remote servers based on a format string beyond a culture of everyone and their mother screaming their lungs out against the addition of such a feature.
That’s not to say memory safety isn’t important; it is. But lacking security is ultimately an attitude problem. If developers don’t care about it, forcing a memory safe language onto them will not accomplish anything. They’ll screw up some other way.
First make them care, then make it easy for them to do the right thing.
It’s certainly not a bad idea to use Rust if it’s a good fit for your project; the issue is the rewrite part. Complete rewrites (especially those in a whole new language you lack experience with) aren’t cheap and can easily introduce new problems of their own.
This works at the time of writing; I’m pretty sure it will be patched in the compiler eventually. Still, you could just use unsafe to introduce all the vulnerabilities you could possibly want.
As in, the keyword that allows dereferencing raw pointers in Rust. It’s the easiest way to dodge the borrow checker.
C++ has long since had std::vector and std::string which cover some, but not all, use cases.
If they know how to implement them, why are they using garbage unsafe functions to copy string bytes around? Every language with mutable strings implements them the same way: capacity + length + data. Seq_buf was close, but it has a pointless tracing field; just get rid of that. Why is there still ongoing discussion? This article is from October 26, 2023…
Using arenas and pools is a common approach in Rust to work around the tree-like structure imposed onto memory by the borrow checker.
I’m not suggesting anything crazy here.