Friday, August 21, 2009

Getting to the Bottom of Nothing At All

Except when I'm being a total smart ass or self indulgent emo telling the Ruby community to f' off, most of what I write talks about practical stuff, often with a small dose of theory. But this time it's about theory with a small dose of practice.

Last time I talked about the way programmers in popular C derived languages C++, Java, and C# think of a void function as being one that returns nothing. I contrasted this with the functional view that such functions return something, they just return the unit value (), something that doesn't carry any information. To expand on this I want to talk about actually returning nothing.

In any Turing complete language it is possible to write a function that really truly never returns anything, not even a made-up singleton value like (). In C style languages the simplest such function might contain an infinite loop like

while(true);
For my purposes a better way to explore the idea is with infinite recursion. Pretend your favorite C style language does tail call optimization[1] so we can ignore stack overflow issues and examine the following
X foo(){ return foo(); }

The question is what should the X be? And the answer is that if you plug that line of code into C, C++, C#, or Java then X can be just about any type you want[2].

If that doesn't give you pause then consider this: your type system appears to be lying to you. The function is promising an X but has no hope of delivering. And its quite happy to promise you any X. If you write "String foo(){ return foo(); }" then your type system considers the expression "foo();" to have type String so that you can write "String myFoo = foo();". But guess what, foo doesn't return a String and myFoo will never get a String. Is the type system lying?

The answer is no, of course not. It's just proving something a bit weaker than you might think at first. Instead of proving that "foo() will compute something that is a String" it's proving that "foo() won't compute something that isn't a String." If your "Law of the Excluded Middle" sense just tweaked, then you're on the right page. You'd think there are only two cases to consider: either the function definitely computes a String or it definitely doesn't. But there's this third case where it doesn't compute anything, and since the type checker can't reliably detect non-termination (c.f. Turing Halting Problem) it has to do something a bit odd. First, it assumes that the foo() declaration is correct and does return a String. Then it goes looking for a contradiction such as returning an int. Inside the function the compiler sees that the return is the result of the expression foo(). But the compiler has already assumed that foo() returns a String. There's no contradiction between declared and result so all is happy. The word "tautology" should come to mind right about now. By assuming X is true the type checker has proved that X is true. This weakening is a practical consequence of the Turing Halting Problem and there are at least two good ways to think about it. But first some more examples of the phenomenon.

Exceptions and Exits and ...

I very casually suggested that we ignore stack overflow issues in the infinite recursion above. But exceptions are an important part of this story because they are, in some interesting way, related to non-termination. Consider this function (or its equivalent in your favorite language)
X foo(){ throw new RuntimeException(); }
Once again, X can be any type. And once again foo() does not in fact compute an X.

Clearly these two definitions of foo are different things. Non-termination means we're stuck whereas an exception means we might be able to recover and try something else, or at least notify that there's a problem and shutdown cleanly. None-the-less they both behave similarly. First, they hijack the normal flow of control in such a way that it never returns back to the caller. And second they are both examples of a kind of weakening in the type system since any type can be substituted for X. Formally they are both examples of functions that diverge (functions that return normally are said to converge).

The Type

Here's a third example of a diverging function in Java. Translate as necessary for your language (in Java System.exit(n) stops the program immediately and returns n to the calling process).
X foo(){ System.exit(1); return null; }

Yet another case where foo() won't compute anything, it diverges. In fact, the return statement after the exit call is dead code. However, this is example is slightly different than the other examples because we had to create the dead code for the compiler to be happy[3] and, in fact for a different "X" like int the return value might have to be changed. That bit of dead code is closely related to the heart of this article.

Java can detect some kinds of dead code. If you write "X foo() { throw new RuntimeException(); return null; };" then Java recognizes that the return is unreachable and complains. In my System.exit(1) example Java didn't recognize that the call to exit() will never return so it required a following return statement. Obviously "throw" is a keyword and can get special attention that a mere library function can't but it would be useful to be able to let Java know that, like throw, exit() diverges.

One of the best ways to tell a compiler how a function behaves is by using types and, in type theory, there's a type that expresses just what we need. The type is called bottom (often written ⊥), and while there are different ways to look at the bottom type I'm going to go with a subtyping based view that should be accessible to C++, C#, and Java programmers.

If a language has subtyping and a function says it will return a type "X" then the function is allowed to return a "Y" instead as long as Y is a subtype of X. In my example of a function that just throws an exception the return type could be anything at all. So if we wanted System.exit(1) to indicate that it diverges the same way throw does, then its return type should be a subtype of all types. And indeed, that's exactly what bottom is.[4] bottom is a subtype of String, and int, and File, and List<Foo>, and every other type. Usefully, conventional type hierarchies are written with supertypes above subtypes, which makes a convenient mnemonic for "bottom" which needs to go below everything on such a hierarchy.

Now, if you're used to OO thinking then you expect a value with a certain subtype to in some sense be substitutable everywhere that a supertype is expected. But how can any one object behave like a String, an int, a File, etc? Remember that bottom indicates divergence: an expression with type bottom can never compute a value. So if exit()'s return type was bottom it would be totally safe to write "String foo() { return System.exit(1); }" while another bit of code could have "int bar() {return System.exit(1); }."

Making it Useful, A Divergence to Scala Divergence

Occasionally it might be useful to indicate that a function diverges. Examples are functions like System.exit(1), or functions that always throw an exception, perhaps after logging something or doing some useful calculation to create the exception. But interestingly out of all the statically typed languages that have any following outside of pure research only Scala has an explicit bottom type, which it calls Nothing. The reason Scala has a bottom type is tied to its ability to express variance in type parameters.

For some reason a lot of programmers run screaming into the night when you say "covariance" or "contravariance." It's silly. I won't get into all the details of variance, but I will say that in Scala the declaration "class List[+T] {...}" means that List[Subtype] is a subtype of List[Supertype]. No screaming necessary. And List[+T] brings me to one extremely practical use of bottom - what type should an empty List have?

Well, an empty List can have type List[String] or List[Foo] or List[int]. T can be whatever. And what's a subtype of whatever for all values of whatever? You guessed it: bottom (Nothing). Indeed, Scala has one constant called Nil whose type is List[Nothing]: a subtype of List[String], List[int], and List[whatever]. It all ties up in a bow when you consider that a List[T] has a method called head which returns the first element of the list as type T. But an emtpy list has no first value, it must throw an exception. And sure enough, head in List[Nothing] has type Nothing.

C# 4.0 is supposed to be getting definition site variance similar to Scala's but using the clever mnemonic keywords "in" and "out". I haven't heard anything yet on whether it will also add a bottom type, but it would make a lot of sense.

Java has usage site variance using wildcards. You can say "List<? extends Supertype> x" to indicate that x can have a List<Supertype> or a List<Subtype>. The bottom type would be useful in Java, too, although not as compelling since wildcards are so verbose people rarely use them even when they would make sense. Plus, Java folk tend to mutate everything and List[Nothing] in part makes sense in Scala because Scala Lists are immutable.

C++ does not have any simple way to express this kind of variance so the bottom type is even less compelling in C++ than it is in Java.

Back On Track

Haskell and languages in the ML family don't have an explicit bottom type. Their type systems don't have subtyping so adding bottom as a subtype would confuse things. None-the-less, they do have a nice way to express bottom that can be teleported back to Java, C#, and C++ (but not C). Recall that bottom is a subtype of all types. Another way of saying that is that if a function returns type bottom then for all types A, the function returns something compatible with A so why not express that directly? In Haskell the "error" function takes a string and throws an exception.
Prelude> :type error
error :: [Char] -> a
In Haskell, a lower case identifier in type position is always a type parameter, and [Char] means "list of Char", aka String. So for all types a, if "error" doesn't diverge then it will take a String and return an a. That can pretty much be expressed directly in Java
public static <A> A error(String message) { throw new RuntimeException(message); }

or C++

class message_exception : public std::exception {
  public:
    explicit message_exception(const std::string& message) : message_(message) {}
    virtual ~message_exception() throw() {};

virtual const char * what() const throw() { return message_.c_str(); };

private: const std::string message_; }; template <typename A> A error(const std::string& message) {throw message_exception(message); }

And for either language, usage would be something like

int divide(int x, int y) {
  if (y == 0) {
    return error<int>("divide by zero"); // drop the "<int>" in Java
  } else {
    return x / y;
  }
}

Haskell also has a function called "undefined" that simply throws an exception with the message "undefined." It's useful when you want get started writing some code without fully specifying it.

Prelude> :type undefined
undefined :: a

The function isn't as interesting as the type, which promises that for any type a "undefined" can compute an a or diverge. Since "undefined" can't possible just produce a value of any arbitrary type, it has no choice but to diverge. The same idea can be added to Java

public static <A> A undefined() {return error("undefined"); }
or C++
template <typename A>
A undefined() { return error<A>("undefined"); }

In either language it might be used as

string IDontKnowHowToComputeThis(int input) { 
  return undefined<string>(); // again, make appropriate changes for Java
} 

Given the requirement to write the "return" keyword in C#, Java, and C++ I'm not sure how practical something like a generified error function really is as compared to having it return an exception and making the user write 'throw error("blah")', nor whether "undefined" is that much more useful than just throwing an UndefinedException, but this does illustrate the relationship between the bottom type and functions that, in Haskell terms, compute "forall a.a" or in C#/Java/C++ terms return the type parameter A without taking an A as an argument.

Also, as always care should be taken when transliterating from one language to another. Java would allow a function like "undefined" to return a null instead of diverging. C++ would allow it to return anything all and would only fail to compile if it were used in an incompatible way. That contrasts with languages like Haskell and ML in which the only way to implement "undefined :: a" is to make it diverge in some form or fashion[5].

The Bottom Value

I've spent some time talking about the bottom type as having no values. But it does have expressions like "undefined()" and that leads to a rather philosophical notion of how to think of bottom as having a value. Sorta. Skip this section if you don't like total gray beard philosophizing. If you're brave, then stick with me and imagine a subset of your favorite C derived language that does not allow side effects. No mutation, no IO, and no exceptions. In this strange world functions either just compute values or they diverge by not halting. In such a world the order of evaluation mostly isn't important as long as data dependencies are met - in f(a(), b()), you can compute a() before b() or b() before a(), whatever, they can't interfere with each other. Doesn't matter. The only time order of evaluation is forced on us is when there are data dependencies, so "a() + b()" must compute a() and b() (or b() and a()) before their sum can be computed. Nice.

Well, almost. Order of evaluation can matter for expressions that diverge. Let me give an example.

int infiniteLoop() {return infiniteLoop();}
int f(int x) {return 42;}
int result = f(infiniteLoop());

Because f ignores its argument, if we call "f(infiniteLoop());" there are two possible outcomes. If "infiniteLoop()" is eagerly evaluated before f is called then the program will diverge. On the other hand, if the expression "infiniteLoop()" is lazily remembered as being potentially needed later then f can successfully return 42 and the diverging expression can be forgotten just like it never happened (because it didn't).

We've gone to the pain of eliminating side effects from our language so it's a little irritating to have to keep thinking about evaluation order just to deal with divergence, so we perform a little mental trick; a little white lie. Imagine that functions like infiniteLoop above don't get stuck, they compute a value called ⊥, which is the only value of type bottom.

Now, since the bottom type is a subtype of all types and ⊥ is the bottom value, then it follows that every type must be extended with ⊥. Boolean = {⊥, true, false}, int = {⊥, 0, 1, -1, 2, -2, ...}, and unit = {⊥, ()}. That means we need some rules for ⊥ and how it interacts with everything else. In the vast majority of languages including Java, C#, and C++ but also impure functional languages like F# and OCaml doing just about anything with ⊥ can only compute ⊥. In other words, for all functions f, f(⊥) = ⊥. If you write f(infiniteLoop()) in those languages then the result is ⊥. This kind of rule is called "strictness".

In contrast, Haskell is often called a "lazy language," meaning that expressions aren't evaluated until they're needed. That's not quite technically correct. The Haskell spec just says that it is "non-strict." The spec doesn't care when expressions are evaluated so long as programs allow ⊥ to slide as far as possible. An expression like f(infiniteLoop) must evaluate to 42. Haskell basically forces an expression involving ⊥ to evaluate to ⊥ only when the argument must be used[6]. The distinction between "lazy" and "non-strict" is subtle, but by being "non-strict" rather than "lazy" a Haskell compiler can use eager evaluation anytime it can prove that doing so doesn't change behavior in the face of ⊥. If a function always uses its first argument in a comparison, then Haskell is free to use eager evaluation on that argument. Since Haskell truly does forbid side effects(unlike our imagined neutered language above), the choice of evaluation strategy is up to the compiler and invisible except for performance consequences[7].

C++, Java, and C# have just a tiny bit of non-strictness. In these languages "true || ⊥" is true and "false && ⊥" is false. If these languages were totally strict then "true || ⊥" would be ⊥. Users of these languages call this behavior "short circuiting" and it's done for performance reasons rather than being a philosophical goal, but it's still a curious departure from their normal rules.

There you have it. The bottom value ⊥ is a clever mental hack to allow purely declarative functional code to be reasoned about without injecting sequencing into the logic. It allows people to talk about the difference between a purely declarative strict language and a purely declarative non-strict language without getting into details of evaluation order. But since we're talking about languages that aren't so purely declarative, we can take off our philosopher hats and return back to the world where side effects are unrestricted, bottom is a type with no values and divergence means that flow of control goes sideways.

Tuples and Aesthetics

Last time I talked about the unit type and how, if you interpret types as logical propositions, the unit type behaves as a "true" and tuple types act as logical and "∧." I also talked about an algebraic interpretation where unit type acts like 1 and tuple types act like multiplication "×". So the type (A,B) can also be written as A∧B or A×B. The type (A,unit) is isomorphic to A, A×1 = A, and A∧True <=> A.

Bottom has similar interpretations as 0 or False. The type (A,bottom) is isomorphic to the bottom type because you can't compute any values of type (A,bottom). A×0 = 0, and A∧False <=> False. Nice how it all hangs together, eh?

Bottom behaves like False in another way. In logic if you assume that False is true you can prove anything. Similarly in type systems the bottom type allows the programmer to promise to do even the impossible. For instance, here's a Java function signature that promises to convert anything to anything.

public static <A,B> B convert(A arg) {...}
If you ignore dynamic casting and null (they're both weird in different ways) there's no meaningful way to implement that function except by diverging. More on this in an upcoming episode.

Conclusion

I somehow don't think anybody will be running into the office saying "I just read an article on the bottom type, so now I know how to solve our biggest engineering challenge." But the bottom type is still interesting. It's a kind of hole in your static type system that follows inevitably from the Turing Halting Problem[8]. It says that a function can't promise to compute a string, it can only promise to not compute something that isn't a string. It might compute nothing at all. And that in turn leads to the conclusion that in a Turing complete language static types don't classify values (as much as we pretend they do) they classify expressions.

Footnotes

  1. gcc does do tail call optimization under some circumstances.
  2. Oddly, in C, C#, and Java X can't be "void" because its an error to return an expression of type void. C++ does allow void to make writing templates a bit easier.
  3. Yes, I can make it a void function and not have a return. But again that would illustrate how exit() is different from a throw: throw doesn't require the return type to be void.
  4. Note I'm saying "subtype" here and not "subclass." int, List<String>, and List<File> are 3 different types. In C++, Java, and C# int doesn't have a class. In Java and C# List<String> and List<File> both come from the same class. Yet bottom must be a subtype of all 3 so it can't be a subclass.
  5. A plea to my readers: I'm too lazy to see if "forall a.a" is bottom in F# or C#, or if, like Java, null would be allowed. My bet is that null won't be allowed because only Java has the peculiar notion that type parameters must be bound to reference types.
  6. Very loosely speaking. Please see the Haskell Report for all the gory details about when Haskell implementations are allowed to diverge.
  7. Haskell has another clever hack where throwing an exception isn't an effect, but catching one is. Since order of evaluation is unspecified in Haskell, the same program with the same inputs could conceivably cause different exceptions. Exceptions are theoretically non-deterministic in Haskell.
  8. A language that isn't Turing complete can prove termination and does not need a bottom type, but such a language isn't powerful enough to have a program that can interpret the language.

17 comments:

  1. For the record, null is actually very close to a value of type Bottom. In fact, in Scala, null is the only member of the special type Null, which is a bottom type just as Nothing is. The only difference is that Nothing is a subtype of *every* type, while Null is only a subtype of every type which is a subtype of AnyRef (except for Nothing). Thus, you cannot do the following:

    val i: Int = null // fails

    However, you *can* do this:

    val i: Int = throw new Exception

    ReplyDelete
  2. Oh, and regarding [8], I'm fairly certain that System-F is powerful enough to write an interpreter for itself. At least, I can't presently think of any way to *disprove* that claim.

    ReplyDelete
  3. Null doesn't quite act as the bottom type, not just because of the no-primitives rule, but also because the null value doesn't behave properly for a bottom value. null == null is true and null == "hello" is false. By definition you can't compare things with the bottom value (even in Haskell) without getting the bottom value.

    System F can't interpret itself, nor can any normalizing calculus. See http://en.wikipedia.org/wiki/Normalization_property_(lambda-calculus) "A lambda calculus system with the normalization property can be viewed as a programming language with the property that every program terminates. Although this is a very useful property, it has a drawback: a programming language with the normalization property cannot be Turing complete. That means that there are computable functions that cannot be defined in the simply typed lambda calculus (and similarly there are computable functions that cannot be computed in the calculus of constructions or system F). As an example, it is impossible to define the normalization algorithms of any of the calculi cited above within the same calculus."

    ReplyDelete
  4. I find it a little surprising that System F cannot interpret itself. The type system by itself is sufficient to prove that all System F programs are terminating. Thus, any program which was to interpret a System F program could also be terminating.

    I accept that System F is not Turing Complete (obviously, since it is normalizing), but I find it hard to believe that the task of interpreting System F is in the narrow class of *guaranteed* non-terminating Turing Computable functions which cannot be computed by System F. I would be interested to see what line the proof follows for normalizing calculi with respect to their own interpretation.

    ReplyDelete
  5. I throughly enjoyed reading this article. Even for a lot of stuff I knew already, you made connections that never occured to me, which, in turn, made everything make more sense.

    ReplyDelete
  6. Hey James,

    Nice blog entry, as usual.

    A related note, Neal Gafter's closures prototype for Java had an Unreachable type that was later renamed to Nothing to match the name used by Scala.

    Best,
    Ismael

    ReplyDelete
  7. Thanks for another great post James!

    ReplyDelete
  8. Regarding empty lists you wrote, "what's a subtype of whatever for all values of whatever? You guessed it: bottom (Nothing)."

    There is an alternative, devised by Ken Iverson, where arrays are defined so that the empty array applies independently of the type of its (nonexistent) contents. This also facilitates things like having a "head method" not throw an exception when applied to an empty array.

    Clearly the empty arrays of APL, J, and other Iverson-influenced languages are not the same as bottom, but this does seem to be a related topic with similar practical applications. If any such comparisons are written, I'm eager to read them! (My search efforts have revealed none.)

    ReplyDelete
  9. Daniel...

    Regarding strong normalizing calculi interpreting themselves, I think you'll enjoy this post. It's a strikingly elegant argument.

    Interestingly, this point of view remains controversial. Wade through this, if you dare... There's a whole lot of junk there, but you'll find some gems if you search for Andris Birkmanis and Charles Stewart.

    ReplyDelete
  10. Excellent article, James.

    I never really thought of the notion that languages like haskell are still dependent on evaluation order unless you explicitly define how one ought to reason about methods that don't halt.

    This article seems to be targeted at scala/java/C# users, so there's a big bit you sort of skipped over:

    null.

    null SEEMS a lot like the instance equivalent to bottom; null is an object that can be assigned to any type (let's keep primitives out of this for a moment). Just like bottom is a subtype of every type.

    IF you take this as true, java does in fact have a type that sort of feels like it could take up the role of bottom: "java.lang.Void". There is no actual instance of that class; its constructor throws an exception, and to stop you from even trying, is also private. However, you can still return a value from a function that declares it returns the Void type:

    public Void noop() {
    return null;
    }

    is of course perfectly valid. And herein lies a problem; in generics you need to reason about both upper AND lower bounds. Even if Void is defined to be a subtype of everything, you cannot add a value that 'is the subtype of everything' to a list with signature List<? super Integer>. After all, we're looking for something that is an Integer, or a supertype thereof.

    In order to allow for this and a couple of other tricks, Neal Gafter's BGGA closure proposal includes a new type, named "Nothing". The BGGA proposal includes adding a provision to the JLS that the JLS carefully guard against using 'null' to create an instance (of sorts) of the Nothing type; I believe it defines that anytime null is attempted to be used as a value for Nothing, a NullPointerException should be thrown.

    It's one of those things that is at the basis of the general notion that BGGA is far too complicated.

    Having said that, I wish 'void' as a return type didn't exist, and 'Void' did, with the condition that just "return" (or ending a method with a non-return statement), in a method that is defined to return Void, is syntax sugar for 'return null;'.

    Or, even better, if you were to start from scratch, ditch null entirely, do create an explicit bottom type (can be named Void if this language is to stick to C-style nomenclature), and solve the lack of null with Eithers and pattern matching.

    Or, to really make matters complicated, make null part of the type system: Create the notion that "String" (which is defined as not capable of holding null) is a subtype of "String?" (which is defined as either a String, or null). This makes sense; any non-null string is definitely an instance of the type 'either String or null'. Just like a double would clearly be a subtype of 'double or int' - all doubles are neccessarily also 'double or int's.

    I've not quite sorted out how such a type system would interact with the concepts Void, Bottom, null, and (that inverse T symbol).

    ReplyDelete
  11. Speaking as a person who had to reason about strictness in the CPO category -- bottom is either too much or not enough :) AFAIK GHC isn't entirely correct in its behaviour with bottom, and the reason being it's not considered a side-effect like everything else. And the non-pure languages are happy as can be with the operational semantics that everyone understands and doesn't cause complicated equations to figure out the optimization rules. But of course I'm kinda bitter from the whole thing, so take this with a big grain of salt.

    ReplyDelete
  12. I wonder how, in C#, the default construction on generic parameters relates to the bottom type.

    ReplyDelete
  13. to Daniel Spiewak: System F cannot write an interpreter for itself. System F is strongly normalizing, and there is a diagonalization argument that shows that any strongly normalizing language cannot include a function which interprets that language.

    See http://www.haskell.org/pipermail/haskell-cafe/2003-May/004343.html

    ReplyDelete
  14. Have a look at continuations and CPS based systems where there are no returns.

    ReplyDelete
  15. Your blog is fantastic, Iry. By all means, do continue writing. In particular, the post "A Brief, Incomplete, and Mostly Wrong History of Programming Languages" is a masterpiece.

    ReplyDelete
  16. One minor quibble:

    In C, short-circuiting is _not_ only used for optimization. For example, if you were writing a function that takes a char* and tells you whether it's a string starting with a letter, the idiomatic way of expressing that would be "return s && isalpha(*s)", which is only correct because of short-circuiting.

    This is much less common when dealing with general pointers than char* strings, which means it's less idiomatic in the C-derived languages than C itself. And your larger point still stands; C programmers aren't thinking in terms of a general bottom just because they use this idiom. It's just the narrow aside that's a bit wrong.

    ReplyDelete
  17. System F cannot express an interpreter for itself because there is no way to express a recursive type in System F, and therefore there is no way to express a type that could represent any potential System F term that you would want to interpret.

    ReplyDelete