Sunday, August 26, 2007

Martians vs Monads: Null Considered Harmful

I hate null. More specifically I hate NullPointerExceptions(NPEs). Scala has an alternative, but let me explain why I hate NPEs. They're like evil, sneaky muppets. Remember the Martians on Sesame Street? They'd ask each other if a phone was a cow and then say "nope, nope, nope, nope." To my eye, NPE looks a lot like "NoPE."

Me: Is my unit test working?
Computer: NoPE
Me: Um...okay, fixed that problem...how 'bout now?
Computer: NoPE, NoPE, NoPE, NoPE
Me: ARGGHRHGR! You're a Sesame Street Martian!
Computer: Yip, yip, yip, yip1

Does all this code/test/fix/Martians/yelling have to be part of the game? Can't a static typing system save me from this? Can't we stop the Muppets from chanting in my head?

The answer in Java and C# is "No! Well, sorta! Um, no." It is possible to implement a pattern whereby you should never get NoPEs. It's pretty painful to do and the "should" qualifier means it's not really worth it, but it does help illuminate a solution that's available in Scala.2

Some Craptacular Code

First, what's the big deal? Can't I just check for null and deal with it? Here's Java to demonstrate. Any resemblance to a C# code is purely coincidental.


public interface Spaceship {
  public Martian getMartian();
}

// the main method...
Spaceship spaceship = getSpaceshipSomehow();
Martian m = spaceship.getMartian()
if (m != null) {
  m.dieDieDie();
  System.out.println(m.toString() + 
    " is toast.");
} else
  System.out.println(
    "No Martians here!");
}  

The idea here is that some spaceships have Martians while others don't. If there is a Martian, that little bastard will never NoPE me again. Checking for null certainly works. But there's a problem - how do I know for certain that getMartian can return a null? If I didn't write Spaceship, what made me decide to do the null check? And how do I know that getSpaceshipSomehow() won't ever give me a null if I didn't wite it? The answer is I don't, that's how. It might be in the documentation or might not. The docs might be right or they might not. I might have access to the source code or I might not. I might remember what I wrote last week or I might not. The only answer would seem to be to wait for the NoPE to show up in testing or check for null defensively everywhere. Hmmm...stupid Martians.

Okay, let's try creating a holder that I'll call "Option" for reasons we'll see later. Once again, here's some Java (C#ers please consult your magic decoder rings).


public class Option<T> {
  private T value;

  public Option<T>(T value) {
    this.value = value;
  }
}

public interface Spaceship {
  public Option<Martian> getMartian();
}

// the main method...
Spaceship spaceship = getSpaceshipSomehow();
Option<Martian> o = spaceship.getMartian();
if (!o.value() == null) {
  Martian m = o.value();
  m.dieDieDie();
  System.out.println("I'm a 733T Martian h4x0r."
    + m.toString() + " has been pwned.");
} else
  System.out.println("W00T");
}

The idea is that API builders indicate nullibility by saying methods return an Option. getSpaceshipSomehow promises that it will return a Spaceship while getMartian may or may not return a Martian. It's okay as far as it goes, but it's entirely too easy to call o.value() without checking for null first. I'd like the type checker to help me here - maybe with subclasses. Here's the attempt in Java. The equivalent C# should be...um...equivalent.

public abstract class Option<T> {}
public class None<T> extends Option<T>{}
public class Some<T> extends Option<T>{
   private T value;

   public Some<T>(T value) {
     this.value = value;
   }
   public T value() {
     return value;
   }
}

public interface Spaceship {
  public Option<Martian> getMartian();
}

Spaceship spaceship = ...
Option<Martian> o = spaceship.getMartian();
if (o instanceof Some<Martian>) {
  Martian m = ((Some<Martian>)o).value();
  m.dieDieDie();
  System.out.println(m.toString() + 
    " has been laterally motivated.");
} else
  System.out.println("Thinking outside the " +
    "box leverages synergies.");
}

I've got an abstract base class, Option, that's really just a marker that a value might be "null" or not. None means "null", and Some means it's a real value. Code depending on this structure uses instanceof checks to determine which way to cast.

Unfortunately, this code has three holes and one major (huge) annoyance. The first hole is that the spaceship designer has to ensure getMartian always returns an Option and never a null. If that gets blown, we get NoPEd. The second hole is that there's no way in Java to prevent somebody else from creating a subclass of Option - and that would break the logic. The third hole is that I could just cast an Option to a Some without doing an instanceof check, turning a NoPE into an equally irritating ClassCastException . Finally, the major (huge) annoyance is that I have to write all that crappy code every time I deal with this construct. Frankly, I wouldn't blame you if you'd rather deal with NoPE.

Making it Optional in Scala

In Scala, equivalents to Option/Some/None already exist3 - but they're far, far easier to use and they're "sealed" so no logic breaking subclasses are allowed. Here's the last Java example translated into Scala

trait Spaceship {
  def getMartian: Option[Martian]
}

val spaceship = getSpaceshipSomehow
spaceship getMartian match {
  case Some(m) =gt; {
    m dieDieDie;
    println(m + " was terminated.")
  }
  case _ =gt; println("I'll be back.")
}

The Spaceship trait requires all spaceships to implement a getMartian method which may return Some[Martian] or Nothing. Once we have a spaceship we do a match to see if we got a Martian or not, and if we did then it's...retired.

The first thing you might notice is less annotation of types. Let me assure you that the code is just as statically typed as the Java version, it's just that the Scala compiler can figure out the types of "spaceship" and "m" just fine, thank you very much. You may also notice a lot fewer dots, parens, semicolons etc. Scala's syntax allows you to type all that line noise but generally only requires it when things get ambiguous. Finally, in Scala, square brackets work more or less like angle brackets in Java/C# for denoting "generics."

More importantly notice in the "Some" case how cleanly the Martian, m, is "pulled out". The code is easily as clear as the Java version that checked for null. But unlike the null checking Java code it's clear in this case that I have to do the check. If the check wasn't required the code wouldn't even compile. Unlike the Java versions of Option, the Scala version ensures that I've done my checking right.

Confessions of A Leaky Abstraction

Now a bit of a confession. You see, one of Scala's design goals is to interoperate with Java and .Net seamlessly. That's a great source of strength since the libraries for those platforms are huge, the underlying VMs are heavily optimized, and targeting those environments may be a way for Scala to sneak in through the back door of corporate America.

It's also a source of holes. In order to be truly seamless Scala has to have both null and casting. As a result, Scala's Option ends up with some of the same holes as we found in the Java subclassing code. If Scala had tried to plug the null and casting holes, the result would have been an impedance mismatch between Scala and the target VM environments.

Still, the Scala libraries use Option whenever a "null" could be used and Scala projects are following that pattern. As long as you never ever create a cast or a null, you'll only have to worry about NoPE when dealing with existing Java or .Net code. Which is to say, a lot <sigh>. So even though it's not the static guarantee that I want, Scala's Option is incredibly useful since it's just as easy to use as null and far more expressive 5

A Closing Note With Some Rabble Rousing

In Scala, Option is a class and that means it can have methods. One of those methods is a nifty shorthand for getting Some value is available or getting a default value if None. Imagine my Muppet bloodlust (stuffinglust) is so high that if the spaceship doesn't have a Martian, I'll create a new one just to kill it.


val m = spaceship.getMartian.getOrElse(new Martian)
m.dieDieDie

Due to another feature of Scala, optional laziness, a Martian is only constructed when needed. Think of it the way an expression might or might not be evaluated in Java/C# when you use logical or. For example, in Java/C# "m = a() || b()" the b() method won't be called if a() is true. Laziness just generalizes this concept.

Option can also be chained with other Option returning expressions using orElse. If I have 3 ships, s1, s2, s3 and I want to get the first available Martian (if any) then

val o=s1.getMartian.orElse(s2.getMartian.orElse(s3.getMartian))

Again, laziness ensures that we only keep looking for Martians until we find one.

Finally, in Scala, Option is a monad. I won't explain all about monads in this article, but I do want to show that Option brings far more power than what I've shown so far. Imagine that from a spaceship I want to get a Martian, get the Martian's ray gun, see who the ray gun's target is, and tell the target to duck. In other words, in Java/C# I want spacehip.getMartian().getRayGun().getTarget().duck(). A nice, easy, one liner. Cool, but now imagine the Martian, the ray gun, and the target are each optional and my rule is that if any of them are null then nobody needs to duck and for God's sake no NoPEs. Urk! In Java/C# you could chain together a bunch of logical ands


if (spacehips.getMartian() != null
  && spacehip.getMartian().getRayGun() != null
  && ...

Or you could pile up a series of ifs

RayGun r = null;
Target t = null;
Martian m = spacehip.getMartian();
if (m != null) {
  r = m.getRayGun();
}
if (r != null) {
  ...

Either way, it ain't pretty. Because Option is a Monad in Scala the code looks like this

for (
  m <- spaceship getMartian;
  r <- m getRayGun;
  t <- r getTarget
) t duck

It might not be clear just yet how the "for" keyword is doing this magic, but you have to admit the code is eminently readable. Far more readable than the Java/C# alternatives.

Hopefully this has convinced you that null really should be considered harmful. Just as it's become accepted that there are better alternatives to "goto" there really are better alternatives to "null."

So go spread the word! End the NoPE madness! Monads are the keys to defeating our Martian overlords!

Yip, yip, yip, yip.

Footnotes

1. My apologies to Sesame Street. Realy, sorry, Muppets.

2. There are several languages with the same solution including most notably Haskell. But don't be scared - I won't use the word "monad." Oops, I just did. Also I used it in the title and in the closing section. Oh, and in the next footnote.

3. In Haskell, the Option monad is called Maybe. 4

4. Monad. Just wanted to say "monad" again.

5. Haskell and OCaml don't have these holes since they have neither null nor casting.

26 comments:

Anonymous said...

Unnecessarily complex IMO. Nice nullable types are better.

Stephan said...

Ever since I've read about and used the nice language, I wish that Java prevents references to null by default and only allows them with ?Option types.

http://nice.sourceforge.net/manual.html

"Nice's advanced type system makes the distinction between normal (e.g. String) and option (e.g. ?String) types, which allows to prevent NullPointerExceptions"

Peace
-stephan

--
Stephan Schmidt :: stephan@reposita.org
Reposita Open Source - Monitor your software development
http://www.reposita.org
Blog at http://stephan.reposita.org - No signal. No noise.

Anonymous said...

I fully agree with previous comments: Nice option types are much better.

This is how you make a null pointer exception in Scala:

def getDate() : Date = null

def main(args: Array[String]) {
val date = getDate()
println(date.getTime())
}

If Scala is trying to be a better language than Java, it shouldn't have NPE's at all.

Anonymous said...

I agree that Nice's ?Type is safer for avoiding NPEs and that is a good thing.

But following the link reading about Null handling I find:

int len = (name == null) ? 0 : name.length()

Which is exactly the type of code this article was improving upon!

It looks like Nice does a good job isolating possibly null values, but for actually dealing with them its no better than Java. And, unfortunately, Java libraries (IMO one of the main reasons to uses Scala or Nice) are liberally litered with them.

James Iry said...

Nice is an interesting language, too. One of these days I may blog about multimethods. I think whether language based nullable types are actually "better" than option types is debatable though. On the plus side, nullable types are succinct. They cover the requirements to document the nullability contract and let the compiler ensure I'm not doing something dumb.

On the downside they don't compose any better than Java style nulls. If you implement the spaceship.getMartian().getTarget().duck() code in Nice you end up with exactly the same null checks that you'd get in Java. Scala's for and Haskell's do notation clean that up nicely.

Further, because Option is just a regular case class in Scala (or algebraic data type in Haskell) you can create your own forms that are 3 valued or whatever (e.g. Some, Nothing, Failure).

Finally, because they are monads, the code to use them is very similar to the code you might use for Lists - which makes sense if you think of an Option as a form of List with exactly 0 or 1 elements.

Still, as I admitted in the article and one anonymous poster mentioned, Scala does have null with all of its inherent problems. Nice can have similar runtime errors, but may have a better way of forcing the developer to be aware of them by using constructs like notNull(spaceship.getMartian()) or the "native" syntax. In the end, there's a tradeoff - easy integration with Java vs. safety. Neither Nice nor Scala push that slider all the way towards safety, although Nice has pushed it a bit further that direction with a slight loss in integration ease.

Stephan said...

"On the downside they don't compose any better than Java style nulls. If you implement the spaceship.getMartian().getTarget().duck()"

Yes, for that I'd like to have the safe Groovy operator, spaceship?.martian?.target?.duck

Peace
-stephan

David R. MacIver said...

Although this is sortof missing the point, some comments on the Java implementation. :-)

You say "The second hole is that there's no way in Java to prevent somebody else from creating a subclass of Option - and that would break the logic."

Conveniently, there is! See here for details.

Additionally, when you want pattern matching like things in Java, a better way is to use exceptions for control flow.

So Option becomes:

public final class Option<T>{
T value;

T get(){
if (value == null) throw new None();
else return value;
}

public static class Nothing extends ControlFlowException { }
}

You can now chain your nested options as:

try {
myOption.get().doStuff().get().doOtherStuff();
}
catch (Nothing n){
// do empty case.
}

David R. MacIver said...

Oh. I forgot to mention: Why this isn't slow.

Also, this isn't just recreating the NPE because this exception is checked, so its use can be constraint by the compiler.

hware said...

Good post and interesting discussion.

While using the for-comprehension to go through options seems scala-ish, the Goovy way is more concise.

Heres a stab a more concise Scala:

def option[T](f: => T)=
try { Some(f) }
catch {case npe:java.lang.NullPointerException => None; case e :Error => throw e}

Then we can say:
option(spaceship getMartian getTarget duck)

which returns a Some(duckReturn) or None if there was a NPE

James Iry said...

So many different threads, I'm going to reply with several different comments. Stephan, Groovy's null safe dot operator would be a good fit for Nice, which could even tell you whether it was necessary or not.

However, it's not a fully general solution. Imagine I turned everything inside out - all the getter methods get turned into functions (or methods on another object if you will). So now I want duck(getTarget(getRayGun(getMartian(spaceship))), but again each thing might return null.

Using for notation and option types, it looks pretty similar to the method based version
for {
m <- getMartian(spaceship);
r <- getRayGun(m);
t <- getTarget(r)
} duck(t)

The Groovy style .? operator doesn't handle this situation, unfortunately. Of course, some language could invent an operator that takes care of even that. The point I'm trying to make (although unsuccessfully) is that pattern matching and monads are a neat solution to a much wider range of problems than anything that has a special case for null. Special casing null may create a few neat syntatic tricks, but those tricks will probably not compose as well until they have roughly equivalent power as monads - at which point, why not give the programmer the ability to create their own monads?

James Iry said...

David, thanks for the sealed Java class trick. As for Exceptions to do pattern matching, it's kinda cool but, unfortunately doesn't cover all bases. The reason is that, in Java, anything that extends Throwable cannot be "generic". So, if you wanted Some to be another exception then "public final class Some<A> extends ControlFlowException {..." won't even compile. Obviously with your example, Some doesn't need to be an exception since we can use it as the "normal" case, but in more general pattern matching situations generic types are incredibly useful. It's still a useful technique that every Java coder should keep under their belt.

James Iry said...

hware, your code neatly shows how laziness really helps in creating your own DSL (domain specific language). Unfortunately, your solution, while being nearly as concise as the Groovy version, has all the downsides of the Groovy version. Specifically, we've lost the compiler checked contract that tells me I need to check for nulls in the first place.

Anyway, for those not yet familiar with Scala, hware's code defines a function (option) that takes an expression (f) but does not evaluate f immediately when option(...) is called - f is a lazy parameter which is what the => notation means. Instead, f is only evaluated as part of trying to create and return a Some object inside the try block. If that causes an NPE, then option returns a None instead. All other exceptions just get rethrown. "T" is a type parameter - in Java you would say option is a "generic method."

David R. MacIver said...

Yes. The generics limitation on exceptions is indeed an annoying limitation when trying to use exceptions for pattern matching. There's a workaround, but I freely admit that it's a pain in the ass. :-)

(Incidentally, I like Scala. I very much like pattern matching. I'm trying to make this as nice in possible in Java not because I think your suggested way is bad but because... well, mainly because it amuses me really. :-) )

Anyway, here's the trick for posterity: http://eugeneciurana.com/pastebin/pastebin.php?show=4189

Pretty grim isn't it?

(Yes, I worked around the type system with some unsafe casts. It's not unsafe, and I didn't have to - I could have used something visitor patterny - but by the time I'd thrashed out the details I was already close enough to losing the will to live. :-) )

hware said...

James,

I agree competely, its better not to use nulls and Scala options are a good typesafe replacement.

My snippet might be useful when interacting with code not known to be null free (ie Java). Oh, I fixed a bug:

def option[T](block: => T)=try { val x=block; if (x==null) None else Some(x) }catch {case npe:java.lang.NullPointerException => None; case e :Error => throw e}

Cheers,
hware

Ricky Clarkson said...

I now happily use null, in conjunction with IntelliJ IDEA's @NotNull and @Nullable.

It's not perfect, but it works well enough that I've largely dropped my emulation of Haskell's Maybe. Not that null is better than Maybe, but that CPS is particularly ugly in Java.

misterbeebee said...

Anonymous:

The getDate NPE you demonstrated arises from Scala's "bug-compatibility" with Java.

java.util.Date provides a nullable date class. Misusing the null Date value gives an NPE. A pure-Scala class would not exhibit the same weakness.


What other option is there, if Scala is to faithfully import nullable classes?

I could imagine Scala making Java classes half-nullable, and enabling a pragma/declaration that would prevent Scala-side code from assigning null to a Java class. That seems to be a valuable feature.

Potentially-null values that come over from the Java side can't be helped, though.

Raoul Duke said...

"stuffinglust" - just, like, wow.

andrewf said...

Was planning to blog about this very thing at some point, but...

It seems to me that the impedance mismatch between Java and Scala should be pretty easy to bridge...

Firstly, seems to me that C#/Nice notation for optional values would make things a lot more concise, so that 'int?' should be a synonym for 'Option[int]', in the same way that in C#, 'int?' <=> 'Nullable<int>'. Rationale: would make using Option types a lot more convenient.

Then simply declare all the Java APIs as taking or returning 'Thing?' rather than 'Thing'. Easy!

Oh, okay would need also:
* An automatic conversion on assignment from 'Thing' to 'Option[Thing]'. Surely not unreasonable, and
* Implementation of Java null as 'None[Thing]'.

Voila: complete removal of 'null' from Scala. (Might still get NullReferenceException back from an API, but that's Java's fault.)

Of course, some of the common Java APIs could be special-cased in Scala, because some of them should not be passed nulls in the first place, so if we could annotate the 'File' constructor to say that it doesn't want a null filename, Scala programs would see it as "File(String)" rather than "File(String?)".

andrewf said...

Okay, my last comment is possibly a little blasé. My approach would have some problems:

AnyRef a1 = None[String];
AnyRef a2 = None[Int];
a1 == a2 == Java null;

You'd lose the (runtime) type when casting up to object.

On the other hand the behaviour of None[String] and None[Int] is identical: 'get' does the same thing; 'elements' returns the same thing. So arguably you're not breaching the type system.

Any a1 = Some("bob");
Any a2 = "bob";
a1 == a2

Possibly more problematic. That would allow me to write:

Option o = a1.asInstanceOf[Option[String]]();
String s = a1.asInstanceOf[String]();

That is, at runtime one couldn't distinguish between a naked value or Some(value). I suspect that this would not be a problem in practice. Does compromising the type system like this compensate for removing the NullRefException type errors when calling Java/C#?

One further one:

Any a1 = Some(Some(42));
Any a2 = Some(42);
a1 == a2 (potentially)

Do people compose Option[] like this? (And if so, isn't that abusing the language a bit?) However, a Java API is never going to have a parameter declared as Option[Option[Thing]], so perhaps this could be represented as an Option[Thing] when cast up to AnyRef....? That is the runtime type of an Option[] would have one level of Option[] stripped off.

It only becomes a problem when you cast up to AnyRef (or java.lang.Object); All other Java-declared variables, as long as their types are a strict subclass of Object, could be treated unambiguously as Option[T] (because they allow the value 'null').

Anybody with deep Scala knowledge care to comment?

Andreas said...

The Option wrapper seems like an awkward workaround to me, and doesn't it introduce a performance penalty as well?

I have thought for quite some time that the right solution to the NPE problem would be a type qualifier, say "notnull". That way, "null" and with it Java compatibility could be maintained, and "null-safety" could be obtained at the compiler level.

The following would not compile:

notnull String ns = null; // incompatible type!

Also:

String s = null; // ok
notnull String ns = s // incompatible type
notnull String ns = (notnull)s // ok, but possible NPE
String s = ns // ok
s.length() // incompatible type
ns.length() // ok

Because of the latter, you are forced to explicitly check for null on each method call, unless the object is declared notnull. This is a wonderful incentive to use notnull a lot, thus avoiding explicit null checks and having the compiler keep track of nulls.

Similarly, notnull can be used in method arguments and return types to specify whether nulls are allowed or not, and in most cases they should not be.

Everything is in the compiler, the only runtime adaption is the typecast which may throw an NPE. Pattern matching can also be used:

ns match {
case null => 0
case s: String => s.length()
}

or alternatively:

if (s==null) 0 else ((notnull)s).length()

It seems so simple. Even Java could easily accomodate this, to the great benefit of all programmers who have wasted precious hours tracing nulls through their (or other's) code.

Am I missing something?

Andreas

Andreas said...

Actually, a better type modifier would be "some" instead of "notnull". It sounds much better.

James Iry said...

The Option wrapper seems like an awkward workaround to me, and doesn't it introduce a performance penalty as well?

It's not awkward. It's amazingly clean. The trade-offs in usability between option types and nullable types was already discussed in other comments here.

As for performance: in Haskell and ML, the performance hit should be effectively none since the compiler can use null checks underneath the hood to implement it. In Scala, unfortunately, that's just not going to work and there will be the performance hit of creating one extra object when you have Some(x). If that overhead is significant in your target domain then Scala may not be the right language or you may need to give up type safety in localized performance critical sections.

I'd be happy to also have a nullable type system in Scala if not-null was the default and you could still work with Java seamlessly.

Anonymous said...

Hate null values or not, they are the most important and unique values. By comparison, what is the add operation without the 0 (zero element) or the multiply without 1?? Be serious, if u don't like the null values, it only show that u tend to be lazy in performing the necessary checks in your code. I'd say it is safe to assume when coding that everything is possible and u are NOT allowed to assume anything. Period.

Tom said...

^ you really missed the point.

plankthom said...

When you can ignore the null case, my way around NPE's in java: wrap it in an iterable:


public static Iterable nn(T val) {
return (val != null)
? Collections.singleton(val)
: Collections.emptySet();
}

for ( Martian m : nn(spaceShip.getMartian) ) {
m.dieDieDie();
}

Laurence said...

I think you're confusing "readable" with "compact". You show highly compact code (the Scala "for" example) which to non-Scala people is complete gibberish. I don't class gibberish as readable. Secondly, @NotNull is exceedingly simple, and goes a long way towards addressing your root cause concern (which I agree is well-founded).