Golang and the Nil Bug in the Compiler

How does Golang treat nils at the machine-code level?

July 20, 2024

Recently, my team faced the same ol’ billion-dollar problem in our codebase, and unfortunately not for the first (nor the last) time. Here’s the story. The following code panics. Why? We’re clearly passing a nil through the call chain, so is the panic due to a bug in Go?

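// Run it on https://go.dev/play/p/PAxw_51EEgB
package main

import (
	"fmt"
	"reflect"
)

// Minimal definitions assumed for this snippet.
type MyService interface{ DoSomething() }
type MyImpl struct{}

func (m *MyImpl) DoSomething() {}

func innocent() *MyImpl   { return nil        }
func naive()    MyService { return innocent() }

func main() {
	v := naive()

	fmt.Printf(
		"type=%v, value=%v, isNil=%v\n",
		reflect.TypeOf(v),
		v,
		v == nil, // a multi-line call needs this trailing comma
	)
	// Prints: type=*main.MyImpl, value=<nil>, isNil=false

	if v != nil {
		panic("but that's impossible?!")
	}
}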

Let’s dig deeper. In Golang, an interface reference may turn into a fat pointer. It looks something like this:

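// Hypothetical; not real code from Go's compiler or runtime.
// It is internal to Go and we usually don't see it in our code.
// The real runtime stores an interface as a two-word value
// (runtime.iface for non-empty interfaces, runtime.eface for
// interface{}); this article models an interface as a pointer
// to the struct below, where a nil *FatPointer stands for an
// interface with no dynamic type at all.
type FatPointer struct {
	// The type definition of ActualValue.
	UnderlyingType *StructType

	// Plain old C-style pointer to a value, most likely on the heap.
	ActualValue unsafe.Pointer
}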

You can verify that by disassembling the final binary (lines 359 & 361). In our panicking code, when we write v := naive(), v implicitly becomes a *FatPointer, not the *MyImpl one would normally expect. That by itself is not an issue. There’s nothing inherently wrong with a fat pointer as long as it’s transparent to us, and fat pointers are actually useful (more on this later). But the way Go implements the equality check breaks that transparency and leaks the fat pointer’s existence to the developer. When we wrote if v == nil {...} we definitely meant if v.ActualValue == nil {...}, but Go instead checked whether the *FatPointer itself was nil. And it was not: it carried UnderlyingType *MyImpl. Compare ==’s behavior to that of method calls: == is applied to the fat pointer as-is, while a method call goes through it to reach ActualValue:

// Pseudocode, not valid Go.
v := naive()
// Translates to:
v := &FatPointer{
    UnderlyingType: *MyImpl,
    ActualValue:    unsafe.Pointer(nil),
}

// Directly translates to v == nil in machine code,
// and according to the lines above, it's false.
v == nil

v.DoSomething()
// Translates to (the real runtime looks the method up in a
// precomputed table rather than by name):
typ := v.UnderlyingType
fn := typ.GetFunction("DoSomething")
fn(v.ActualValue)
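Both halves of that pseudocode can be observed in real Go. In the sketch below, MyService, MyImpl, and the string-returning DoSomething are assumed definitions (the article doesn’t show them): the comparison sees the whole fat pointer and reports non-nil, while the method call is dispatched to (*MyImpl).DoSomething with a nil receiver.

```go
package main

import "fmt"

// Assumed definitions: the article does not show MyService or MyImpl,
// so DoSomething here is made up for illustration.
type MyService interface{ DoSomething() string }

type MyImpl struct{}

// DoSomething tolerates a nil receiver, so the call below does not panic.
func (m *MyImpl) DoSomething() string {
	if m == nil {
		return "called with a nil *MyImpl receiver"
	}
	return "called with a non-nil *MyImpl receiver"
}

func innocent() *MyImpl { return nil }
func naive() MyService  { return innocent() }

// describe reports what `v == nil` sees and, if the check passes,
// what the method call dispatched to.
func describe(v MyService) (isNil bool, msg string) {
	isNil = v == nil
	if !isNil {
		msg = v.DoSomething()
	}
	return isNil, msg
}

func main() {
	isNil, msg := describe(naive())
	fmt.Println(isNil) // false: the interface carries the dynamic type *MyImpl
	fmt.Println(msg)   // called with a nil *MyImpl receiver
}
```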

Imagine DoSomething() doesn’t expect a nil receiver: a simple v != nil check wouldn’t save the day, and we’d see panics or, worse, subtle bugs at runtime. So how do we find out whether v.ActualValue is nil? We need a reflective call for that¹:

// Pseudocode, not valid Go.
v := naive()
// Translates to:
v := &FatPointer{
    UnderlyingType: *MyImpl,
    ActualValue:    unsafe.Pointer(nil),
}

reflect.ValueOf(v).IsNil()
// Translates to:
v.ActualValue == nil // which is true
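A runnable version of that reflective check, again with assumed definitions for MyService and MyImpl; holdsNilPointer is a hypothetical helper name. It only makes sense when v is non-nil and wraps a pointer, otherwise IsNil() panics.

```go
package main

import (
	"fmt"
	"reflect"
)

// Assumed definitions, as in the first example.
type MyService interface{ DoSomething() }

type MyImpl struct{}

func (m *MyImpl) DoSomething() {}

func innocent() *MyImpl { return nil }
func naive() MyService  { return innocent() }

// holdsNilPointer reports whether the value stored in v is a nil pointer.
// It assumes v itself is non-nil and wraps a pointer; otherwise IsNil panics.
func holdsNilPointer(v MyService) bool {
	return reflect.ValueOf(v).IsNil()
}

func main() {
	v := naive()
	fmt.Println(v == nil)           // false
	fmt.Println(holdsNilPointer(v)) // true
}
```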

But can we always make that reflective call? Not so fast! A *FatPointer is itself a reference, and can be nil too:

// Does turn into a fat pointer.
func naive0() MyService { return innocent() }

// Doesn't. Actually, it can't, even if it wanted to.
// What should UnderlyingType be then?
func naive1() MyService { return nil }

func main() {
    reflect.ValueOf(naive0()).IsNil() // true: the fat pointer wraps a nil *MyImpl.
    reflect.ValueOf(naive1()).IsNil() // panics.
}

This is because reflect.ValueOf() turns a nil *FatPointer into the zero reflect.Value, and calling IsNil() on the zero Value panics. We can guard against it with one extra check:

v := naive()
isNil := v == nil || reflect.ValueOf(v).IsNil()
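Wrapped in a function, the guard handles both naive0() and naive1() without panicking, because || short-circuits before IsNil() can see the zero reflect.Value. A sketch; isNilService is a hypothetical name, and it assumes every MyService implementation is a pointer type.

```go
package main

import (
	"fmt"
	"reflect"
)

// Assumed definitions, as in the first example.
type MyService interface{ DoSomething() }

type MyImpl struct{}

func (m *MyImpl) DoSomething() {}

func naive0() MyService {
	var p *MyImpl
	return p // fat pointer around a nil *MyImpl
}

func naive1() MyService { return nil } // nil fat pointer

// isNilService treats both a nil interface and an interface wrapping a
// nil pointer as nil. The || short-circuits, so IsNil() is never called
// on the zero reflect.Value. It assumes every MyService implementation
// is a pointer type; a struct value would make IsNil() panic.
func isNilService(v MyService) bool {
	return v == nil || reflect.ValueOf(v).IsNil()
}

func main() {
	fmt.Println(isNilService(naive0()))   // true
	fmt.Println(isNilService(naive1()))   // true
	fmt.Println(isNilService(&MyImpl{})) // false
}
```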

So far we have limited our discussion to MyService, an interface implemented by a pointer type, but v may just as well be an any (interface{}) holding a chan, a map, a slice, a func, and so on. I won’t go into full detail, but in summary, to cover them all we need yet one more check: we take care not to call IsNil() on a kind that can never be nil, such as a struct, an int, or an array, or it panics.

import "reflect"

// I don't recommend using the isNil() defined here if you
// know enough about v's type: if v has a concrete type, simply
// compare it to nil; if v is guaranteed to be a fat pointer
// around a pointer, use:
// v == nil || reflect.ValueOf(v).IsNil().

func isNil(v interface{}) bool {
	if v == nil {
		return true
	}

	switch reflect.TypeOf(v).Kind() {

	// The kinds for which reflect.Value.IsNil is defined.
	// Any other kind, e.g. reflect.Array, makes IsNil panic.
	case reflect.Ptr,
		reflect.Map,
		reflect.Chan,
		reflect.Slice,
		reflect.Func,
		reflect.UnsafePointer:
		return reflect.ValueOf(v).IsNil()

	default:
		return false
	}
}

Alternatives

There’s a motto from Python that I quite like:

$ python3 -c 'import this' | awk 'NR == 4'
Explicit is better than implicit.

In Go’s case, the language could have asked developers whether they want their interface reference to turn into a fat pointer, marking it as such in the function’s signature. Would it make sense? Probably not. Go is garbage-collected, and such a level of detail does not fit into the language:

func f0()                   *MyType `pointer:fat` { return nil }
func f1(f func() MyService) MyService             { return f() }
func main() {
   v := f1(f0)
   // At this point, is v a fat pointer as f0 requested, or was
   // that changed as f1 didn't care to annotate the return type?
}

It would be even worse if the annotation were part of the function’s type: f0 would then be incompatible with f1’s parameter. We’d have the same function coloring problem as in languages with sync/async functions, but for on-heap/on-stack values instead, defeating the purpose of a GC. Perhaps we could strive for consistency instead and change the equality operator’s implementation to match method resolution: a method call on a fat pointer is delegated to the ActualValue contained in it, and likewise, a comparison to nil would happen against the ActualValue, not the fat pointer itself. Is it helpful? Not necessarily:

// Hypothetical Go in which `==` on a fat pointer compares
// ActualValue. rollADice, feelingGoodToday and
// reflect.IsFatPointerItselfNil are made up for illustration.

type MyBehavior interface{ NilSafeBehavior() }

type MyType struct{}

func (m *MyType) NilSafeBehavior() {
	fmt.Println("I run, regardless of my receiver being nil or not")
}

func mkMyType() *MyType { return nil }

func decision() MyBehavior {
	switch rollADice() {
	// Returns a
	// &FatPointer{UnderlyingType: *MyType, ActualValue: nil}.
	case feelingGoodToday:
		return mkMyType()

	// ... returns nil, really. Or
	// &FatPointer{UnderlyingType: nil, ActualValue: nil},
	// if you must.
	default:
		return nil
	}
}

func main() {
	v := decision()
	if v != nil {
		panic("can't happen in this hypothetical Go implementation")
	}

	// When `==` on a fat pointer is always delegated to the
	// ActualValue, then at this point, how would I know whether v
	// is absolutely nil and v.NilSafeBehavior() panics, or v
	// has an UnderlyingType and v.NilSafeBehavior() works?
	// Reflection could help:
	if reflect.IsFatPointerItselfNil(v) {
		fmt.Println("Nothing to run for today, better luck next time")
	} else {
		v.NilSafeBehavior()
	}
}

Again, we have to fall back on reflection, maybe even more with this hypothetical implementation.
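For contrast, in today’s Go the plain v != nil comparison is exactly what tells these two cases apart, because it inspects the fat pointer rather than ActualValue, and a pointer-receiver method runs fine on a nil receiver. A minimal sketch, assuming the dice roll is replaced by a bool parameter and NilSafeBehavior returns its message instead of printing it, so the outcome is deterministic and checkable:

```go
package main

import "fmt"

type MyBehavior interface{ NilSafeBehavior() string }

type MyType struct{}

// NilSafeBehavior has a pointer receiver and never dereferences it,
// so it runs even when the receiver is nil.
func (m *MyType) NilSafeBehavior() string {
	return "I run, regardless of my receiver being nil or not"
}

func mkMyType() *MyType { return nil }

// decision replaces the article's dice roll with a bool parameter
// (an assumption made to keep the example deterministic).
func decision(feelingGoodToday bool) MyBehavior {
	if feelingGoodToday {
		return mkMyType() // fat pointer around a nil *MyType
	}
	return nil // no dynamic type at all
}

// run calls the method only when the fat pointer itself is non-nil.
func run(v MyBehavior) string {
	if v == nil {
		return "Nothing to run for today, better luck next time"
	}
	return v.NilSafeBehavior()
}

func main() {
	fmt.Println(run(decision(true)))  // I run, regardless of my receiver being nil or not
	fmt.Println(run(decision(false))) // Nothing to run for today, better luck next time
}
```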

As you can see, all the implementations (explicit markers, nil-equality comparing against ActualValue, nil-equality comparing against the fat pointer itself, and a few more that I can think of²) are ugly! Language design is hard and full of trade-offs³. In any case, with Go as it is today, we need to sprinkle reflective IsNil() calls all over our code.

Closing Thoughts

A few colleagues pointed me to the Golang best practices, which say something along the lines of the following:

Try to accept an interface, not a struct.

Never return an interface, return concrete types.

They suggested that by following these principles we would not face the nil problem discussed above. But I see a big problem with this advice: asking developers to be careful has never worked out well. Either the compiler catches the problem before it finds its way into the binary, or we’re doomed. With nil in Go, it’s very easy to miss the problem across the multiple layers of indirection that are common in bigger codebases, when multiple teams work on the same codebase, or both! Even a linter cannot enforce these principles; here’s a contrived example:

// https://go.dev/play/p/q3ZFrnNI_Lf

package main

import "fmt"

type MyFace interface{ DoSomethingNonNil() }
type MyImpl struct{}
func (m *MyImpl) DoSomethingNonNil() { if m == nil { panic("I don't like nils") } }

func take(v MyFace) { if v != nil { v.DoSomethingNonNil() } }

func give(isNil bool) *MyImpl {
	if isNil { return nil } else { return &MyImpl{} }
}

func dumbMiddleware(isNil bool) { take(give(isNil)) }

func smartMiddleware(isNil bool) {
	if v := give(isNil); v == nil {
		take(nil)
	} else {
		take(v)
	}
}

func main() {
	// f := smartMiddleware
	f := dumbMiddleware
	f(false)
	fmt.Println()
	f(true) // panics: take receives a fat pointer around a nil *MyImpl
}

As suggested, take follows the advice by accepting an interface; give follows the advice by returning a concrete type; yet, the code panics. The fix is this eyesore:

// My brain immediately asks why don't we just
// inline `v` to `take(give(isNil))`? And I have
// to remind it of the semi-hidden FatPointer.
if v := give(isNil); v == nil {
    take(nil)
} else {
    take(v)
}
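One way to hide the eyesore is a small conversion helper that maps a nil *MyImpl to a nil MyFace, so call sites can be inlined again. A sketch, where toMyFace is a hypothetical name (not part of any library) and take is changed to report whether it called the method so the outcome can be checked:

```go
package main

import "fmt"

type MyFace interface{ DoSomethingNonNil() }

type MyImpl struct{}

func (m *MyImpl) DoSomethingNonNil() {
	if m == nil {
		panic("I don't like nils")
	}
}

// take reports whether it called the method (changed for illustration).
func take(v MyFace) bool {
	if v != nil {
		v.DoSomethingNonNil()
		return true
	}
	return false
}

func give(isNil bool) *MyImpl {
	if isNil {
		return nil
	}
	return &MyImpl{}
}

// toMyFace converts a possibly-nil *MyImpl into a MyFace, returning a nil
// interface instead of a fat pointer around a nil *MyImpl.
func toMyFace(p *MyImpl) MyFace {
	if p == nil {
		return nil
	}
	return p
}

func main() {
	fmt.Println(take(toMyFace(give(false)))) // true: the method was called
	fmt.Println(take(toMyFace(give(true))))  // false: nothing to call, no panic
}
```

The catch is that every concrete type needs its own helper, and nothing forces callers to use it, so it moves the "be careful" burden rather than removing it.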

In real life it’s worse. Dealing with gRPC servers, OpenAPI-generated clients and such, we cannot necessarily enforce these principles and have to work around the problem with reflection. The situation is exacerbated when everyone returns error from their functions, completely ignoring the advice (Go’s standard library included):

// https://go.dev/play/p/CiKxaAo6r5J

package main

import (
	"fmt"
	"os"
)

// err is a pointer to an error, so this compares the pointer,
// not the error it points to; &err below is never nil.
func check(err *error) {
	if err != nil {
		panic("and now my friend, we're doomed; I have to blow up the rocket")
	}
}

func main() {
	_, err := fmt.Fprintf(
		os.Stdout,
		"signaling the rocket's trajectory monitoring "+
			"subsystem, making sure we have not deviated "+
			"from the path\n",
	)
	check(&err)
}

Of course, we can add one more item to our advice (and linters): never pass an error by reference. But that just takes us deeper down the rabbit hole. To me, type-level behaviors and nil references don’t go hand in hand in a garbage-collected language.
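The same trap shows up with error even without passing it by reference: a function that follows “return concrete types” and returns a nil *MyError produces a non-nil error as soon as its result is forwarded through the error interface. The Go FAQ covers this case under “Why is my nil error value not equal to nil?”. A minimal sketch, where MyError and the function names are hypothetical:

```go
package main

import "fmt"

// MyError is a hypothetical concrete error type.
type MyError struct{ msg string }

func (e *MyError) Error() string { return e.msg }

// mayFail follows "return concrete types" and returns *MyError.
func mayFail(fail bool) *MyError {
	if fail {
		return &MyError{msg: "failed"}
	}
	return nil
}

// forward passes the result on as an error: a nil *MyError becomes a
// fat pointer with UnderlyingType *MyError, which is not == nil.
func forward(fail bool) error { return mayFail(fail) }

// forwardSafely returns a literal nil when there is nothing to report.
func forwardSafely(fail bool) error {
	if err := mayFail(fail); err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println(forward(false) != nil)       // true, although nothing failed
	fmt.Println(forwardSafely(false) != nil) // false
}
```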

Thanks for reading!

¹ I would argue that using runtime reflection means either the language has a design flaw or we’re doing it wrong. For a library developer, it’s probably the former, and for an application developer, it’s probably the latter. A bold claim indeed, but I believe it leads to safer languages.
² For instance, Go could opt for sum types and require nullable references to be explicitly marked as such. Whether that fits into the rest of the language, or helps at all, is another story though.
³ For me, the best trade-off here would’ve been disallowing method calls on a nil receiver, eliminating the problem altogether. But that would change the current programming style in Go considerably.