Wednesday, November 18, 2009

Letting the compiler find the bugs

Let's start by looking at a bit of contrived C/C++ code. See if you can spot the (hopefully obvious) bug.
#include <stdio.h>

int main()
{
    float fieldLength = 120; // yards
    float fieldWidth  = 48.77; // meters
    float yardsInAMile = 1760; // yards per mile

    printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLength * 2) + (fieldWidth * 2))));

    return 0;
}

It probably shouldn't take you too long to realize that the above code mixes units: it adds yards to meters. While bugs like this are obvious in trivial code, they can be difficult to track down in more complex software. What can we do to prevent bugs like this from creeping into code? One solution is to use coding standards to reduce ambiguity. For example, you could decide to use metric units for all data in your project. You could also use 'typedefs' and variable naming conventions to make errors even more obvious. Consider the following:

#include <stdio.h>
typedef float yards;
typedef float meters;

int main()
{
    yards fieldLengthYards = 120; // yards
    meters fieldWidthMeters  = 48.77; // meters
    float yardsInAMile = 1760; // yards per mile

    printf("You must run around the perimeter of an American football field %f times to run a marathon\n", 
            26.2 * (yardsInAMile / ((fieldLengthYards * 2) + (fieldWidthMeters * 2))));

    return 0;
}

Including the units in the variable names and types makes the programmer's intentions clear, and it makes bugs easier to spot in code audits. For even more safety you could define Meters and Yards classes that encapsulate the floats and prevent mixing of types. The advantage of defining new classes is that the compiler can now ensure that you don't mix types. The disadvantage is that you'll end up writing a class for each unit, with lots of overloaded operators. Writing a class for every unit feels too burdensome for all but the most critical code (nukes, airplane firmware, surgery robots, etc.).

The problem with the above 'typedef' solution is that C/C++'s 'typedef' really only defines an alias for the type, so the compiler treats 'meters' exactly the same way it treats 'yards'. What we want is for the type-checker to treat 'meters' and 'yards' as distinct types while the generated code still treats them both as floats. Google's new systems language Go lets you do just that. Let's reproduce the original bug in Go.

package main

import "fmt"

func main() {
    var fieldLength float64 = 120   // yards
    var fieldWidth float64 = 48.77  // meters
    var yardsInAMile float64 = 1760 // yards per mile

    fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n",
            26.2 * (yardsInAMile / ((fieldLength * 2) + (fieldWidth * 2))))
}

This has the same bug as the C/C++ code above. The code should be readable to a C/C++ developer; the only really "weird" thing is that the type follows the variable name. Now let's use Go's strong typing to let the compiler catch the bug for us.

package main

import "fmt"

type yards float64
type meters float64

func main() {
        var fieldLength yards = 120
        var fieldWidth meters = 48.77
        var yardsInAMile float64 = 1760 // yards per mile

        fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n",
                26.2*(yardsInAMile/((fieldLength*2)+(fieldWidth*2))))
}

Notice that all we did was define two new types, 'yards' and 'meters', both of which act like floats but are treated as distinct types by the type-checker. When we compile, we get the following error:
invalid operation: fieldLength * 2 + fieldWidth * 2 (type yards + meters)

The most important part of the error is at the end, where it tells us we're trying to add 'yards' to 'meters'. The type-checker found the bug for us! So how do we fix it? We need some conversion routines. So let's add some methods to the types and fix the bug.

package main

import "fmt"

type yards float64
type meters float64
type miles float64

// Explicit conversions between unit types.
func (m meters) toYards() yards { return yards(m * 1.0936133) }
func (y yards) toMiles() miles  { return miles(y / 1760.0) }

func main() {
        var fieldLength yards = 120
        var fieldWidth meters = 48.77

        // 26.2 marathon miles divided by the perimeter expressed in miles.
        fmt.Printf("You must run around the perimeter of an American football field %f times to run a marathon\n",
                26.2/((fieldLength*2)+(fieldWidth.toYards()*2)).toMiles())
}

With the corrected program we see that you only have to run around the field 133 times instead of 136.6 times!
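If you want to check those two numbers yourself, here is a small sketch that redoes the arithmetic both ways with plain floats. The constant and function names (yardsPerMeter, yardsPerMile, marathonMiles, laps) are my own for illustration, and float64 is assumed as the numeric type.

```go
package main

import "fmt"

// Illustrative names, not part of the programs above.
const (
	yardsPerMeter = 1.0936133 // same factor used by toYards
	yardsPerMile  = 1760.0
	marathonMiles = 26.2
)

// laps returns how many times a perimeter given in yards
// fits into a marathon.
func laps(perimeterYards float64) float64 {
	return marathonMiles * yardsPerMile / perimeterYards
}

func main() {
	lengthYards := 120.0
	widthMeters := 48.77

	// The bug: a width in meters is added to a length in yards.
	mixed := laps(2*lengthYards + 2*widthMeters)
	// The fix: convert the width to yards before adding.
	fixed := laps(2*lengthYards + 2*widthMeters*yardsPerMeter)

	fmt.Printf("mixed units: %.1f laps\n", mixed)
	fmt.Printf("yards only:  %.1f laps\n", fixed)
	fmt.Printf("difference:  %.1f laps\n", mixed-fixed)
}
```

Run as written, this should report about 136.6 laps for the mixed-unit perimeter and about 133.0 laps once everything is in yards.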

Go is neither the first nor the only programming language that allows you to encode units as types, but it's close enough to C/C++ for a good comparison. So what's the runtime overhead of the change? Well, there are a couple of method calls (toYards(), toMiles()) where the original version did the conversions inline. The type checking itself happens at compile time because Go is statically typed, so it adds no runtime cost. Personally I'd much rather wait for my program to call a couple of functions than run around an American football field 3.6 more times.
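To make the "checked at compile time, nothing extra at runtime" point a bit more concrete, here is a rough sketch. The perimeter helper is hypothetical, float64 is assumed as the underlying type, and the unsafe.Sizeof comparison only shows that a 'yards' value occupies the same space as a plain float64; it says nothing about how the compiler handles the method calls.

```go
package main

import (
	"fmt"
	"unsafe"
)

type yards float64
type meters float64

func (m meters) toYards() yards { return yards(m * 1.0936133) }

// perimeter is a hypothetical helper that accepts only yards.
func perimeter(length, width yards) yards {
	return 2*length + 2*width
}

func main() {
	var length yards = 120
	var width meters = 48.77

	// perimeter(length, width) would be rejected by the compiler,
	// because width is meters, not yards.
	p := perimeter(length, width.toYards())
	fmt.Printf("perimeter: %.2f yards\n", p)

	// A named type has the same size as its underlying type.
	fmt.Println(unsafe.Sizeof(length) == unsafe.Sizeof(float64(0)))
}
```

The unit information lives only in the type-checker: mixing units is refused before the program ever runs, while the values themselves are stored as ordinary floating-point numbers.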

By carefully using the type system in Go (or in any other language with a good one), you can spend more time writing code and less time tracking down bugs.

1 comment:

  1. Great post. Hope to see more about Go in your blog!

    Cheers.
