Tuesday, May 25, 2010

Anatomy of an Annoyance

The warning "Type safety : A generic array of T is created for a varargs parameter" is so irritating it makes me want to beat Java with a snoracle, an implement that allows defunct tech companies to breath while underwater. This article will explain what the warning is, where it comes from, and why it is pretty boneheaded. The warning also shows just how hard it is to extend an existing language without creating pain for your users.

My sample use case: anytime you build a Map in Java it takes approximately 6 metric tons of boilerplate.

Map<Integer, String> numbers = 
   new HashMap<Integer, String>(3);
numbers.put(1, "one");
numbers.put(2, "two");
numbers.put(3, "three");
// etc

It would sure be nice if that could be done in one line of code as in most sane languages. So let's fix it! First, we need the ubiquitous Pair class[1].

public static class Pair<A, B> {
   private A first;
   private B second;

   public Pair(A first, B second) {
      this.first = first;
      this.second = second;
   }

   public A getFirst() {
      return first;
   }

   public B getSecond() {
      return second;
   }
  
   /* a real Pair class would also have equals and 
     hashCode, but for this code I don't need them */
}

And then, to make it more pleasant to use, we write a static function that can take advantage of Java's half-assed type inference.

public static <A, B> Pair<A, B> pair(A first, B second) {
   return new Pair<A, B>(first, second);
}

Finally, the pièce de résistance

public static <A, B> Map<A, B> map(
      Pair<A, B>... pairs) {
   Map<A, B> map = 
      new HashMap<A, B>(pairs.length);
   for(Pair<A, B> pair : pairs) {
      map.put(pair.getFirst(), pair.getSecond());
   }
   return map;
}

And so, with faces smiling as the sunoracle shines down upon us, we use it

Map<Integer, String> numbers = 
   map(pair(1, "one"), pair(2, "two"), pair(3, "three"));

Ah, sweet sweet beauty[2]. Except one nasty, tiny, shoot-me-now-and-bury-my-carcass-in-downtown-Santa-Clara annoyance. That single lovely line of code that so innocently tried to use all our wonderful machinery is tagged with the warning "Type safety : A generic array of T is created for a varargs parameter." And the only way to "fix" it is to nuke the whole area from orbit with '@SuppressWarnings("unchecked"),' possibly hiding real problems in the same method. Gah.

History, Part The First

To understand the problem we have to go way back in history, at least as far as Java 1.0. Maybe even to Oak or to the egg and sperm that fused to make James Gosling. Whatever. In designing Java there was a question: since String is a subtype of Object shouldn't String[] be a subtype of Object[]. After all sometimes that's useful: if I have an array of Strings and you want to write a function that iterates through an array of Objects, aren't we good if I send you my Strings?

Well, no, we're not good. We're very bad. If String[] is a subtype of Object[] then Bad Things Happen™


void badThing(Object[] objects) {
   objects[0] = new Integer(42);
}

void test() {
   String[] strings = {"a", "b", "c"};
   badThing(strings);

   // length of an integer?
   System.out.println(strings[0].length());
}

If the above were allowed then we'd be trying to find the length of an Integer and that, um, won't work. Reading is okay but writing is bad. So to solve the problem the original Java designers made a pact with Satan. Instead of statically preventing code like that they added a dynamic check: anytime you store into a non-primitive array Java does an instanceOf-like check to make sure it's legit. In my sample code, the assignment is checked in "objects[0] = new Integer(42)". Failure will give an ArrayStoreException. That means every array store has to do expensive extra work and, more importantly to this article, at time of construction an array must be associated with a dynamic type tag. Like all pacts with Satan the negative consequences of this deal weren't obvious until much later.

History, Part the Second

With Java 1.5, Java programmers were finally able to move to the forefront of 1970's era technology with parameteric polymorphism, er, excuse me, generics. But see, there was this huge collection library and 3rd party libraries that extended it and...how the hell are we going to make the future safe for the past?

The decision was to compile with "type erasure" - dynamic type tags no longer carry all the information in the static type. Now, there's nothing evil about type erasure. Some of my best friends are languages that compile with even more type erasure than Java. It's just that Java has "instanceof" and other similar things that don't play well with it. Relevant to this article we have the aforementioned pact with Satan where arrays need a dynamic type tag or Bad Things Happen™.

Concerned with Bad Things, the Java designers started spackling over some holes. The upshot: this code fails to compile with the static error "Cannot create a generic array of T"

public <T> T[] arrayOfT(T a1, T a2, T a3) {
   return new T[]{a1, a2, a3}; // bzzt static error!
}

Same deal with this, "Cannot create a generic array of Pair<A, B>".

public <A, B> Pair<A, B>[] twoPairs(
      A a1, B b1, A a2, B b2) {
   return new Pair<A, B>[]{pair(a1, b2), pair(a2, b2)};
}

But this is a very shallow spackling indeed. Keep reading.

History, Part the Third

Also with Java 1.5, the designers added varargs: the ability to pass an arbitray number of arguments of the same static type to a method. So convenient! But, uh, apparently the varargs guys and the generics guys didn't talk to each other until late in the game. See, the varargs folks decided to implement varargs as sugar over arrays rather than something that's actually type safe.

public void whatClassIsIt(Object... things) {
   System.out.println(things.getClass());
}

whatClassIsIt(pair(1, "one"), 
   pair(2, "two"), pair(3, "three"));

There's our old buddy erasure. The above code prints "class [LPair;" - array of Pairs. But note it's not dynamically known to be an array of Pair<Integer, String>.

Making Bad Things Happen™

Now we've got enough knowledge to screw things up nicely. We've got statically unsafe arrays that depend on runtime type tags to provide a reasonably early error. And we have an ability to create arrays that don't carry all the information needed to do the runtime check.

public static <A, B> Pair<A, B>[] pairs(
      Pair<A, B>... pairs) {
  return pairs;
}

Pair<Integer, String>[] pairs = 
  pairs(pair(1, "one"), pair(2, "two"), pair(3, "three"));
Object[] objects = pairs;
// no error in this assignment, statically or dynamically!
objects[0] = pair('c', 2.0); 

// arrgg ClassCastException
System.out.println(pairs[0].getSecond().length()); 

The code doesn't have an error at the point of assigment. The error does finally occur later when I attempt to use a value. That means that in real code the error could be very far from the real cause. The only diagnostic that prevents this Bad Thing™ from happening is the warning on the first line; our lovely "Type safety : A generic array of T is created for a varargs parameter".[3] And that warning has to occur in a place that doesn't know whether the use of the array is actually bad or not. Hence, our code's beauty (or at least what passes for beauty in Java) is shot.

Conclusion

Ah, if only. If only Java didn't have covariant arrays. Or, since it does, if only the designers had used something other than arrays to carry varargs. Woulda coulda shoulda. This article shows just how hard it can be to extend an existing language without shooting yourself and your users in the ass. Runtime checked array assignment + varargs in arrays + type erasure == pain.

Now you know why Scala's arrays are invariant.

Footnotes

  1. Q: What makes Java so manly? A: It forces every programmer to grow a Pair.
  2. Please don't rag me about how I wrote 17 lines of code to avoid writing 3 - presumably I'd reuse this infrastructure a lot.
  3. For an even nastier problem with the erasure scheme and arrays see this example of trying to write higher order code using generics and varargs.

15 comments:

  1. Small typo in paragraph "History, Part the second" there is some HTML tag in the first example code :
    "public lt;Tgt; T[] arrayOfT..."

    Interesting post, quite funny also (may be bashing Java is just too easy ;)).

    ReplyDelete
  2. oh god no not a Pair class... don't you know you can't turn Java into Lisp. We are doomed. DOOMED!

    Now just imagine if something so useful as the Pair class were to actually make it into the JDK. I shudder to imagine the possibilities.

    ReplyDelete
  3. I think SupressWarnings should be the slogan of the "Everything that is wrong with backwards compatibility"-Party. I'd vote them

    ReplyDelete
  4. The same problem exists with arrays-of-lambdas, which means that arrays of lambdas have to be prevented in Java for now and ever after:

    http://mail.openjdk.java.net/pipermail/lambda-dev/2010-February/000567.html

    ReplyDelete
  5. As if type erasure were biting Java only... how about this:

    def asSet(iterator: =>Iterator[T], size:=>Int) {...}

    def asSet(iterable: =>Iterable[T], size:=>Int) {...}

    Cannot have both. Erasure make their signatures identical: (Function0, Function0)

    ReplyDelete
  6. Could you maybe switch to a one-column layout? The code is always cut off, because it neither wraps nor generates a horizontal scroll bar.

    ReplyDelete
  7. Every time I think about closures/lambdas/blocks in Java, I can't help but wonder how many "gotcha" articles similar to this one will have to be written about that new Java feature.

    Groovy, Scala, etc aren't perfect either but they get the benefit of not having to deal with all the history, baggage, and backward compatibility concerns of Java.

    ReplyDelete
  8. You needn't have gone through such hoops, though:

    Map numbers = new HashMap(3) {{ put(1, "one"); put(2, "two"); put(3, "three"); }};

    ReplyDelete
  9. Yup, Arrays just don't play nice with generics, and varargs can have their own special corner of hell even without them.

    It is a bit of extra work – but IMHO worth it – to use the Builder pattern instead, like the google-collections Immutable* builders for instance:

    Map map = ImmutableMap.builder().put(1, "one").put(2, "two").put(3, "three").build();

    You don't get type-inference (unless you declare your builder on a separate line), Java is too dumb for that, but there are a few very nice advantages to the pattern.

    ReplyDelete
  10. Java already contains a Pair class. Have you looked at: SimpleEntry

    (http://java.sun.com/javase/6/docs/api/java/util/AbstractMap.SimpleEntry.html)

    ReplyDelete
  11. @Vlad, indeed type erasure doesn't play well with overloading. I don't find that to be such a big deal, though.

    @Daniel, using that pattern is a cute trick but not a general solution because it only works for mutable collections.

    @Jed, the builder pattern is an eyesore when compared to the more straightforward approach but I admit it's about the best available under the circumstances.

    @Wrick, Map.Entry and its subtypes are too special purpose for my taste. If I write public [T] Pair[List[T],List[T]] split(List[T] list) everybody knows what I'm doing. If it's Map.Entry[List[T], List[T]]... then that's just confusing as hell.

    ReplyDelete
  12. > don't you know you can't turn Java into Lisp

    And yet, by Greenspun's Tenth Rule, every Java program is always slowly turning into lisp.

    ReplyDelete
  13. Didn't read it all (I'm -thoroughly- familiar with this annoyance), but it's going away in java 7:

    http://blogs.sun.com/darcy/entry/projectcoin_inducing_contributory_pollution

    ReplyDelete
  14. This compiles in Java 6:

    HashMap test = new HashMap() {{ put(1, "one"); put(2, "two"); put(3, "three"); }};

    ReplyDelete