Avoid Null Objects for Error Handling
They have their uses, but can easily lead to a mess if used naively.
There’s this idea in lower-level programming circles that the best way to “handle” errors is to just return a harmless default value when any operation fails, and then just let the program do a whole lot of nothing with those values. The value is typically “zero”, but that’s mostly because the languages being used (like C) make zero-initialized structs the cheapest and most convenient default. This is often called Zero Is Initialization (ZII) as a pun on Resource Acquisition Is Initialization (RAII) from C++.
Note that a default constructor achieves the same purpose as ZII[1] (albeit less efficiently). The idea of returning a safe default on error is the bit I’m concerned with here; the zeroing itself is not relevant.
I think this is a silly way to handle errors, mostly stemming from C-language deficiencies. Zero being a useful default is a good idea, especially in C; that’s not what I take issue with. My problem is specifically with returning a default, “harmless” value (that just happens to be “zero” in this case) as the output of every fallible function, and continuing to do useless work with that value.
In Object-Oriented languages this would be an instance of the NullObjectPattern. Same idea, different coat of paint. I don’t like it.
Dealing with errors doesn’t require checking for them everywhere, nor does it require doing useless computations with a “harmless default” after an error occurs, even in C. There are situations where “null objects” are the most convenient solution to the problem but, much like exceptions, using them everywhere just leads to a mess.
Zero being Useful != Errors being Useful
To make things a bit clearer I’ll use an example from the Our Machinery blog, which is where I first saw the idea of applying ZII to error handling. A lot of the article is about how much nicer using 0 as a default is in C, which I 100% agree with, so let us skip to the part I have a beef with. The example uses the following function:
uint32_t find_bone(const char *name);
This function returns the index of a data structure with information on a particular human bone, stored in a global array, given that bone’s name. So passing in “femur” would give you the index where the information on femur is stored.
If I pass in the name “fenur”, which is not a valid name for a human bone, the function has to do something about it. The suggestion in the blog is to return index 0, which happens to contain a special “no bone” case. That “no bone” is a “null object” to use OOP terminology. That means I can do:
uint32_t femur_id = find_bone("fenur");
float femur_length = bone_length(femur_id);
Getting the bone length will always work no matter what the output of find_bone is. In this case it’ll return 0, which is the length of “no bone”. The program doesn’t crash, control flow is kept simple, job’s done, right? Not really, no.
Garbage In, Garbage Out
Let’s assume that find_bone does nothing but return 0 (i.e., “no bone”) when a bone is missing, because if it does anything else (like record the error somewhere) then that’s the actual error handling part. We’ll cover that case later. For now, assume find_bone does nothing besides return index 0, aka “no bone”, in case of an error.
This special “no bone” (a null object) is a valid bone in the sense that every operation on bones works on “no bone”. I can get the length of a “no bone”, I can get the position of a “no bone”, I can get the name of a “no bone”, I can render a “no bone”, whatever.
But I wanted the femur. The find_bone function “handled” the error by returning garbage, and the rest of the program kept on going like nothing happened.
Garbage! From King Garbage! Of the Garbage Dynasty! Stupid dog, always bringing garbage into the house.
— Eustace Bagge
If I get the length of a “no bone”, which is not the length of a femur, that length is also garbage. Any calculation involving that length is also garbage. And so on and so forth. Some people think that because “no bone” is a valid bone, this somehow is not garbage data. I don’t understand their reasoning. NaNs are valid floats, but they’re unwanted garbage nearly all of the time.
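To make the propagation concrete, here is a minimal sketch of the null-object version. The bone table, the lengths in it, and the leg_length helper are assumptions made up for illustration; only find_bone and bone_length come from the example above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bone table; index 0 is the "no bone" null object. */
typedef struct { const char *name; float length_cm; } Bone;

static const Bone bones[] = {
    { "",      0.0f  },  /* "no bone" */
    { "femur", 48.0f },  /* illustrative numbers, not real data */
    { "tibia", 43.0f },
};

#define BONE_COUNT (sizeof bones / sizeof bones[0])

uint32_t find_bone(const char *name) {
    for (uint32_t i = 1; i < BONE_COUNT; i++) {
        if (strcmp(bones[i].name, name) == 0) { return i; }
    }
    return 0; /* unknown name: silently return "no bone" */
}

float bone_length(uint32_t id) { return bones[id].length_cm; }

/* Hypothetical derived value: wrong as soon as either lookup failed,
   yet nothing in the return value says so. */
float leg_length(const char *upper, const char *lower) {
    return bone_length(find_bone(upper)) + bone_length(find_bone(lower));
}
```

With this table, leg_length("fenur", "tibia") happily returns the tibia length alone. It is a perfectly valid float and a perfectly wrong leg.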
If we assume that the job of the application was to render a human skeleton, then the resulting display would have oddly short and deformed legs. Some people consider this an acceptable outcome. They’ll talk about displaying a giant red ERROR[2] texture or similar instead, but it’s the same thing really.
You didn’t handle the error, you ignored it. Your program “works” in the presence of an error by producing results the user didn’t ask for. Sometimes a half-working state can be beneficial, a lot of the time it is not.
Deal with Errors, but Later.
While some people will unironically claim “null objects” as the be-all-end-all solution to error handling, most people are using them as part of a larger strategy. For an example of the latter see this article by Ryan Fleury: The Easiest Way To Handle Errors Is To Not Have Them[3].
I mentioned before that the find_bone function could have recorded the error somehow. Instead of just returning “no bone”, find_bone will additionally store the information that “fenur” doesn’t exist in some error log.
Some other part of the program will later go through this log and display the error to the user. This allows you to write your whole program without adding error checks everywhere, and still deal with the error. The result is a program with much simpler control flow, which in turn makes the code a lot easier to follow.
For C in particular, because all the code always executes, all the resource management code executes too, which is extremely valuable[4].
I can definitely accept that reasoning, but that is a workaround for a language deficiency. What you wanted to do is abort the “task” that caused the error, reverting the program to the point right before the task started.
The “work” the program did meanwhile didn’t really achieve anything useful for the user. Best case scenario it amounted to a lot of “NOP”-ing around, worst case there is now garbage state all over the place that the user will have to clean up somehow.
Avoiding the latter takes discipline. It’s not enough to just return “null objects” everywhere and pray it all works out, that results in a bad user experience.
My suggestion if you do this is to record the error right where it occurs, let the null object propagate until the end of the function, then check if an error occurred. Don’t let it propagate further than that. You can extend “a function” to a larger task but make sure no null objects escape that task by getting stored somewhere.
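A minimal sketch of that discipline, assuming a hypothetical ErrorLog passed to find_bone and a made-up compute_leg_length task (neither is from the original example): the null object flows through the body of the task, but the task checks the log before publishing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRORS 16

/* Hypothetical error log: remembers which names failed to resolve. */
typedef struct {
    const char *missing[MAX_ERRORS];
    int count; /* total errors seen; may exceed MAX_ERRORS */
} ErrorLog;

static const char *const bone_names[] = { "", "femur", "tibia" }; /* 0 = "no bone" */
static const float bone_lengths[]     = { 0.0f, 48.0f, 43.0f };   /* illustrative */
#define BONE_COUNT (sizeof bone_names / sizeof bone_names[0])

uint32_t find_bone(ErrorLog *log, const char *name) {
    for (uint32_t i = 1; i < BONE_COUNT; i++) {
        if (strcmp(bone_names[i], name) == 0) { return i; }
    }
    /* Record the error right where it occurs, then return the null object. */
    if (log->count < MAX_ERRORS) { log->missing[log->count] = name; }
    log->count++;
    return 0;
}

/* The task. Null objects are used inside it, but never escape it. */
bool compute_leg_length(ErrorLog *log, const char *upper, const char *lower,
                        float *out) {
    int errors_before = log->count;
    float total = bone_lengths[find_bone(log, upper)]
                + bone_lengths[find_bone(log, lower)];
    if (log->count != errors_before) { return false; } /* *out left untouched */
    *out = total;
    return true;
}
```

The body of the task has no branches for the error case, which is the benefit people want from null objects. The single check at the end is what stops the garbage from being stored.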
Additionally, make sure you add checks or asserts for these null objects in every IO function. You don’t want to be sending them over the network and such. Remember, we’re talking about the case where they’re “errors”, not just a nice default.
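As a rough illustration of such a guard, here is a sketch of a serializer that refuses to write the null object. The function write_bone_id and the NO_BONE constant are hypothetical names, and the sketch returns false where a real codebase might prefer an assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NO_BONE 0u /* assumed: index 0 is the "no bone" null object */

/* Hypothetical IO boundary: copies a bone id into an outgoing buffer. */
bool write_bone_id(uint8_t *buf, size_t cap, uint32_t bone_id) {
    if (bone_id == NO_BONE) { return false; }   /* an error must not leave the process */
    if (cap < sizeof bone_id) { return false; } /* not enough room */
    memcpy(buf, &bone_id, sizeof bone_id);
    return true;
}
```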
What are “errors” to begin with?
There are two major sources of “errors”: bugs (unexpected errors) and partial functions (expected errors). You can consider program state as an additional input to a function for the latter case, to keep things simple.
I have another article on the fault (bug) → error → failure pipeline[5], but the basic idea is that a bug is one or more erroneous lines in the program. You can’t do anything about a bug after you’ve already launched the program. Those lines, when executed, will cause an error, which is some invalid data or state. You can’t do anything about that directly either, since you didn’t expect it. That data or state will propagate and transform until it manifests as some user-visible glitch or triggers some assertion in the code, resulting in a failure.
You can’t do anything about glitches (the user will have to report them). The only thing you can do with a failure is a generic “crash handler” that attempts to restore the program to a previous valid state and records as much information as it can to help debug the source of the failure (i.e., to figure out the actual bug that led to it)[6].
Unexpected Errors → Crash Handler. Moving on.
Partial Functions
A partial function is a function that is only defined for a subset of its input set. Division is the classic example: its result is undefined when 0 is given as the divisor.
On a computer we can’t just leave things “undefined” like that, we need to decide what happens when 0 is passed in. CPUs do different things for integer and floating point division, and as programmers we too have different options when dealing with any partial function:
1. We can restrict the function’s input set, for example by only allowing non-zero integers as input, and relying on the type system to prevent compilation if 0 is passed in[7]. Non-nullable pointers are a great example of this approach.
2. We can check if the input is within the subset for which the function is defined as a precondition or assertion, triggering a “panic” if a value outside of that subset is given. We’re treating a particular input as being a bug in this case, meaning the actual error handling will be done as with any other bug, with a crash handler.
3. We can expand the output set of the function and map the inputs for which the function is not normally defined to this larger set. This is your Optional<T> or Result<T, E> style solution, or multiple return values as in Go[8].
4. We can map inputs for which the function is not normally defined to one or more existing members of its output set, preferably unused ones. Division by 0 could return 0 for example[9]; I think Elm does this.
Continuing with division as an example, these are the options in pseudo-code:
1. int div(int a, nz_int b) { ... }
2. int div(int a, int b) { requires(b != 0); ... }
3. int? div(int a, int b) { if (b == 0) return NONE; else ... }
4. int div(int a, int b) { if (b == 0) return 0; else ... }
All options are valid, there is no option that’s universally better, but some options are better than others in different situations.
Depending on the language you’re using some options might be rather cumbersome. Options 1 and 3 are pretty terrible in C for example. I guess that’s a good reason to go for 4, but that doesn’t make it a good thing, it just makes C bad.
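For the curious, this is roughly what the four options could look like in actual C. The names (nz_int_make, opt_int, div_restricted and friends) are mine, picked for this sketch. Note that Option 1 degrades into a runtime check at the one place an nz_int is built, since C has no way to reject a zero at compile time.

```c
#include <assert.h>
#include <stdbool.h>

/* INT_MIN / -1 is also undefined in C; ignored here to keep the sketch short. */

/* Option 1: restrict the input set. The check moves into the only
   function that is supposed to build an nz_int. */
typedef struct { int value; } nz_int;

bool nz_int_make(int v, nz_int *out) {
    if (v == 0) { return false; }
    out->value = v;
    return true;
}

int div_restricted(int a, nz_int b) { return a / b.value; }

/* Option 2: precondition. A zero divisor is treated as a bug. */
int div_asserted(int a, int b) {
    assert(b != 0);
    return a / b;
}

/* Option 3: expand the output set with a "has value" flag. */
typedef struct { bool has_value; int value; } opt_int;

opt_int div_optional(int a, int b) {
    if (b == 0) { return (opt_int){ .has_value = false, .value = 0 }; }
    return (opt_int){ .has_value = true, .value = a / b };
}

/* Option 4: map the undefined input onto an existing output. */
int div_total(int a, int b) {
    if (b == 0) { return 0; }
    return a / b;
}
```

Nothing stops a caller from reading .value without looking at .has_value, or from writing (nz_int){0} by hand, which is the sense in which 1 and 3 are cumbersome in C.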
Error Handling the Whinesalot Way
My suggestion is to do the following: Pick Option 1 (restrict the input set) if it’s easy enough to do. The typechecker is your friend, use it. If that just isn’t practical, pick between Options 2 (assertion) and 3 (extended output set) depending on the case. Division by 0 is invariably a bug (so 2), but I might pass a non-existent key to a hash-table to check if it is stored there (so 3). Option 4 should only be used for languages that can’t do 3 properly (like C), and it should be used as if it were 3. Meaning “no bone” is an error case and it should be handled immediately.
What do I mean by dealing with an error “immediately”? Does that mean showing an error message right away? No, it means aborting the erroneous task.
First, should find_bone follow option 2 (assertion) or 3 (expanded output set)? It depends. Where are its inputs coming from? Is it only called in one place? Or in multiple places for different purposes?
Consider for example that it’s used to get data for some fixed set of bones in a help page. The bones are hardcoded and written by the programmer, so if one is missing it’s almost certainly a bug. In this situation it’s more convenient for find_bone to do option 2, since needing to check if a bone I know is there is actually there is just noise.
But what if the name of the bone is given in a search box by the user? They can make a typo, or pass in a real bone name that just so happens to not be stored in the application’s “bone database”. In this situation it’s more convenient for find_bone to do option 3, so I’m reminded that I need to handle the situation of the bone being missing and provide the user with an appropriate message (thanks typechecker!).
You see this duality in many APIs, for example Python has both dict[key], which throws an exception if the key is missing (option 2) and dict.get(key), which returns None if the key is missing (option 3).
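The same pair is easy enough to provide for the bone lookup. The sketch below assumes a small hardcoded name table and a hypothetical try_find_bone for option 3, with find_bone layered on top of it for option 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static const char *const bone_names[] = { "femur", "tibia", "patella" };
#define BONE_COUNT (sizeof bone_names / sizeof bone_names[0])

/* Option 3: the caller has to look at the result before trusting *out_id. */
bool try_find_bone(const char *name, uint32_t *out_id) {
    for (uint32_t i = 0; i < BONE_COUNT; i++) {
        if (strcmp(bone_names[i], name) == 0) {
            *out_id = i;
            return true;
        }
    }
    return false; /* *out_id is left untouched */
}

/* Option 2: a missing bone is a bug in the caller, so assert. */
uint32_t find_bone(const char *name) {
    uint32_t id = 0;
    bool found = try_find_bone(name, &id);
    assert(found && "unknown bone name");
    (void)found; /* silences the unused warning when NDEBUG is defined */
    return id;
}
```

The help page would call find_bone, the search box would call try_find_bone. There is no “no bone” entry in the table because neither caller needs one.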
Since Option 2 is always dealt with the same way, what matters is how to program with partial functions that use Option 3.
I (know/don’t care/don’t know) if I’m right
Sometimes an API only provides Option 3, but you “know” you’re passing in a valid value. In that case use whatever solution the language provides to assert this. You can use .unwrap() in Rust for example, or try! in Swift. I’m not a fan of these noisy methods/operators; I’d much rather have two separate functions, one that uses Option 2 (meaning crash on wrong input) and one for Option 3, which returns an error value.
Sometimes if something is missing, you don’t really care, as it can be safely replaced by a default. An example would be filling in a default avatar if a user doesn’t provide one. Hopefully you’re using a language that makes this easy to do, like C#:
var avatar = user.GetAvatar() ?? new DefaultAvatar();
Note that the default avatar might be a generated image, rather than the same image for every user. In other cases what makes a good “default” might be different, even for the same function, when used in different parts of the program. Having “zero” be the default is not necessarily the answer here.
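C has no ?? operator, but the same shape is a one-liner. In this sketch the User struct and avatar_or are assumptions for illustration; the point is that the fallback is an argument chosen by each call site instead of one global “zero” value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *avatar_path; /* NULL when the user did not provide one */
} User;

/* Returns the user's avatar, or whatever fallback this call site considers sensible. */
const char *avatar_or(const User *user, const char *fallback) {
    return user->avatar_path != NULL ? user->avatar_path : fallback;
}
```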
The big red “ERROR” texture can also apply in this case, but only if that is an acceptable default texture when a texture is missing. It is not ok as a result if the texture was actually necessary for the task, as in the bone example.
The last case is where you are hoping the function succeeds, but you can’t know if it will a priori. This is the interesting one, sorry it took so long to get here!
Error Values vs Null Objects
If a function may return an error result, and that result cannot simply be replaced with a useful default (not a useless “null object”), then you need to do the following:
Abort the “task” being attempted, whatever that may be. A “task” may or may not be a larger unit than a single function. This doesn’t mean crashing the program, it means returning to the last working state.
Inform the user of the error, either immediately, or by recording it in some error set that will later be displayed to the user in bulk.
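One cheap way to get “return to the last working state” is to do the work on a copy and only commit it when the whole task succeeded. The sketch below is an assumption-heavy toy (a three-bone Skeleton and a made-up scale_skeleton task), and copying only works when the state is small enough to copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int lengths_mm[3]; } Skeleton;

/* Hypothetical task: scale every bone. A bone shrinking to nothing is an error. */
bool scale_skeleton(Skeleton *s, int percent, const char **error_msg) {
    Skeleton draft = *s; /* work on a copy so the original stays valid */
    for (int i = 0; i < 3; i++) {
        draft.lengths_mm[i] = draft.lengths_mm[i] * percent / 100;
        if (draft.lengths_mm[i] <= 0) {
            *error_msg = "scaling would produce an empty bone"; /* inform the user */
            return false; /* abort: *s is exactly as it was before the task */
        }
    }
    *s = draft; /* commit only after every step succeeded */
    return true;
}
```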
As an example, consider a type checker, which is actually one of the best fit problems for the “null object” solution[10], since we typically want to collect multiple errors rather than just stopping at the first one (at least in an IDE setting).
To make the difference clear between my suggested error handling approach and the “null object” approach, consider typechecking the add operator with implicit conversions between floats and integers. We start with the “null object” version:
Type top(Ctx* c, Type a, Type b) {
    // needed to prevent recording a mismatched type error for ERRORs
    if (a == ERROR || b == ERROR) { return ERROR; }
    if (a == b) { return a; }
    if (a == INT && b == FLOAT) { return b; }
    if (a == FLOAT && b == INT) { return a; }
    record_mismatched_type_error(c, a, b);
    return ERROR;
}

Type typecheck_add(Ctx* c, ASTNode* l, ASTNode* r) {
    Type left_t = typecheck(c, l);
    Type right_t = typecheck(c, r);
    Type result_t = top(c, left_t, right_t);
    // this check is needed to avoid redundant error messages
    if (result_t == ERROR) { return ERROR; }
    if (!(result_t == INT || result_t == FLOAT)) {
        record_expected_type_error(c, result_t, "a numeric type");
    }
    return result_t;
}
Note how we must check anyway for ERROR to ensure we don’t keep recording error messages that ERROR is not a numeric type or whatever. That check could be done in the various “record” functions as well (and should be there instead), but it has to happen somewhere, and we need to remember to do so.
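Moving the check into a “record” function could look something like the sketch below. The Ctx here only counts errors and check_numeric is a made-up caller, both assumptions for the sake of a small example.

```c
#include <assert.h>

typedef enum { ERROR = 0, INT, FLOAT, STRING } Type;
typedef struct { int error_count; } Ctx; /* assumed: just counts recorded errors */

/* The guard lives here, so individual callers cannot forget it. */
void record_expected_type_error(Ctx *c, Type actual, const char *expected) {
    (void)expected; /* a real implementation would format a message with it */
    if (actual == ERROR) { return; } /* already reported where it occurred */
    c->error_count++;
}

/* Hypothetical caller with no ERROR check of its own. */
Type check_numeric(Ctx *c, Type t) {
    if (!(t == INT || t == FLOAT)) {
        record_expected_type_error(c, t, "a numeric type");
        return ERROR;
    }
    return t;
}
```

Passing ERROR through check_numeric records nothing new, while passing STRING records exactly one error.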
Either way, if we naively use null objects without any checks, the compiler output will be full of useless errors! With error values, we’d do the following:
OptionalType top(Ctx* c, Type a, Type b) {
    if (a == b) { return some(a); }
    if (a == INT && b == FLOAT) { return some(b); }
    if (a == FLOAT && b == INT) { return some(a); }
    record_mismatched_type_error(c, a, b);
    return error();
}

OptionalType typecheck_add(Ctx* c, ASTNode* l, ASTNode* r) {
    OptionalType left_t = typecheck(c, l);
    OptionalType right_t = typecheck(c, r);
    return_if_error(left_t, right_t);
    OptionalType result_t = top(c, left_t.value, right_t.value);
    return_if_error(result_t);
    if (!(result_t.value == INT || result_t.value == FLOAT)) {
        record_expected_type_error(c, result_t.value, "a numeric type");
    }
    return result_t;
}
In this case we never do computation with OptionalType: all functions take Type as input, and OptionalType is only ever used as a return type. Because of this we need to turn OptionalType into Type before using it. The typechecker makes sure we never forget to do this. If not for needing to record multiple errors, I wouldn’t even bother with OptionalType, I’d just exit() the moment I hit an error.
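The helpers used above (some, error, return_if_error) are not spelled out, so here is one way they might be defined in C99. This is a sketch under my own assumptions: an OptionalType with an ok flag, a Ctx that only counts errors, and an add_types function that takes already-typechecked operands so no AST is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { INT, FLOAT, STRING } Type; /* note: no ERROR member needed */
typedef struct { int error_count; } Ctx;
typedef struct { bool ok; Type value; } OptionalType;

static OptionalType some(Type t) { return (OptionalType){ .ok = true, .value = t }; }
static OptionalType error(void)  { return (OptionalType){ .ok = false, .value = INT }; }

/* Returns error() from the enclosing function if any argument is an error. */
#define return_if_error(...)                                              \
    do {                                                                  \
        OptionalType opts_[] = { __VA_ARGS__ };                           \
        for (size_t i_ = 0; i_ < sizeof opts_ / sizeof opts_[0]; i_++) {  \
            if (!opts_[i_].ok) { return error(); }                        \
        }                                                                 \
    } while (0)

OptionalType top(Ctx *c, Type a, Type b) {
    if (a == b) { return some(a); }
    if (a == INT && b == FLOAT) { return some(b); }
    if (a == FLOAT && b == INT) { return some(a); }
    c->error_count++; /* stands in for record_mismatched_type_error */
    return error();
}

OptionalType add_types(Ctx *c, OptionalType left_t, OptionalType right_t) {
    return_if_error(left_t, right_t);
    OptionalType result_t = top(c, left_t.value, right_t.value);
    return_if_error(result_t);
    if (!(result_t.value == INT || result_t.value == FLOAT)) {
        c->error_count++; /* stands in for record_expected_type_error */
    }
    return result_t;
}
```

C will not stop anyone from reading .value on an error, so the “typechecker makes sure” part is only as strong as the convention that .value is read after a return_if_error.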
Additionally, no unnecessary computation happens. The top function is never called if either the left or the right node failed to typecheck. In this small example it makes no difference, but it can have an impact in a larger application. Imagine instead of a typechecker we were writing a JSON parser, what use is there to keep going?
The important point is that we got additional type safety for little additional effort, and most of it is due to C not helping whatsoever. Consider Rust instead:
fn top(c: &Ctx, a: Type, b: Type) -> Option<Type> {
    if a == b { return Some(a); }
    if a == INT && b == FLOAT { return Some(b); }
    if a == FLOAT && b == INT { return Some(a); }
    record_mismatched_type_error(c, a, b);
    return None;
}

fn typecheck_add(c: &Ctx, l: &ASTNode, r: &ASTNode) -> Option<Type> {
    // typecheck both sides first so errors on either side get recorded
    let left_t = typecheck(c, l);
    let right_t = typecheck(c, r);
    let result_t = top(c, left_t?, right_t?)?;
    if !(result_t == INT || result_t == FLOAT) {
        record_expected_type_error(c, result_t, "a numeric type");
    }
    return Some(result_t);
}
Other than some extra noise with Some and ?, this is just as simple as the “null object” case where the check for ERROR is hidden in the record functions (except here we don’t even need to do so). We’ll never end up with a “mismatched types: ERROR and INT”, we’ve structured the code in a way that makes it impossible! We cannot forget to handle the error, and neither can the intern.
If the language allowed implicit conversions between Type and Option<Type>, then we could get rid of those noisy and unnecessary Some(x), and if it implicitly added the “?” operator when attempting to convert Option<Type> to Type, then the code would be completely clean[11].
The “null object” solution is not some novel discovery, it’s a workaround for poor error handling in your language of choice. If you use it, use it properly.
[1] Default constructors can handle situations where zero is not a useful value. Pointers are a good example, you rarely want null, you want the pointer to point to the default of whatever type the pointer points to.
[2] This is an example of the above footnote. A giant red error texture is not zero. A pointer to such a texture is not zero. You can only take “zero” = “default” so far.
[3] I have many issues with the title of this article. Errors don’t disappear because your program can continue to function in their presence. Just because adding “42” + 0 in Javascript gives “420” instead of throwing an exception doesn’t make the program any better (I’d say it makes it worse because garbage data is worse than no data). That said, the issue is with the title of the article, not the article itself, which is much more nuanced.
[4] Thanks for reminding me Boostibot.
[5] Don’t start arguing about the names! They don’t matter! The idea matters!
[6] Some people think you can handle all errors this way, they’re crazy and you shouldn’t listen to them. It’s meant as a last line of defense, not as the only line of defense.
[7] Dependent Types and Refinement Types can encode this sort of constraint in the type system but they’re more trouble than they are worth in my opinion.
[8] Exceptions straddle the line between options 2 and 3, which is why they’re a bad error handling mechanism. Panics as in Go or Rust are like exceptions but exclusively meant for option 2, which is why they’re not an issue in those languages.
[9] I find it funny that some people recoil at this idea but have no issue with “no bone”. They’re the same thing to me.
[10] I actually do this for my toy compiler, sometimes it’s simply the best option, treat everything I said in this article as being prefaced with “in general”. Just because I don’t find null objects a good default choice for error handling, doesn’t mean they aren’t useful!
[11] Noting this idea down for my own little programming language 🧐.