Friday, April 24, 2009

Java Has Type Inference and Refinement Types (But With Strange Restrictions)

When enthusiasts talk about Scala they frequently start with "it's a lot like Java but with all this other cool stuff like type inference and structural types and..." I guess it's okay to introduce Scala that way. I've done it myself. But it does give the impression that Scala has somehow heaped complexity onto the Java. Oddly enough, it hasn't, at least not in the areas of type inference and structural types. Java has both. Sorta. They're just not very useful.

In this short article I'm going to try to show that while type inference and structural types appear to be additions to Scala over and above Java, they can also be explained as the removal of peculiar restrictions from Java. The goal is not just to illustrate these two points, but to make a broad statement: Scala isn't a just bunch of features heaped on top of Java. To a significant degree it's a generalization of things already found in Java. The result is that many aspects of Scala are both simpler and more regular than Java.

Type Inference

Java folk get excited about Scala's type inference. Usually positively, but sometimes negatively. But here's the thing: every practical statically typed language has some type inference. Every one. Even Java has type inference. It's just very limited. Let me show you by starting with a simple class

class Foo {
  public Foo bar() { return this; }
  public String toString() { return "Foo!"; }
}

With that I can write

System.out.println(new Foo().bar().bar().bar());

It's silly code but it illustrates my point. Java doesn't need me to explicitly annotate the return type of each call to bar(). Java infers the type resulting from new Foo().bar() as being a Foo, which in turn can be sent a bar() message resulting in a Foo, etc. If I change things just a little the type system complains

System.out.println(new Foo().bar().bar().baz()); // bzzt error!

In Java I "only" have to annotate types when I want to assign to a variable or return from a method. Which, come to think of it, is an odd restriction. The compiler obviously knows the type, so why do I have to tell it yet again?

Foo f = new Foo().bar().bar().bar();

Thus, rather than saying Scala adds type inference we can equally well say that it removes a silly restriction from Java's type inference[1].

Structural Refinement Types

The fact that you don't have to annotate types when building expressions in Java is well known even if people don't usually call it type inference. What's not as well known is that Java has refinement types. So I'll have to explain, this time starting with Scala.

In Scala you can say method doSomething takes any object with a bar method, like so

def doSomething(x : {def bar : Unit}) = ...

That's called structural typing - the doSomething method puts a constraint on the structure of objects that may be passed to it. But actually, in Scala, that's just shorthand for the following refinement type

def doSomething(x : AnyRef {def bar : Unit}) = ...

That says that x must be a subtype of AnyRef and must have a bar method. The type of x is a "refinement" of AnyRef with an additional structural constraint. Scala's refinement types aren't limited to just refining AnyRef, but that's really not important to my point here. My point is that Java has refinement types, too.

If you're a Java programmer, you're probably scratching your head thinking I've lost my sanity. But I promise I haven't. I ask you, in Java what exactly is the type of this expression?

new Object() { 
  public void bar() {
     System.out.println("bar!"); 
  } 
}

If you said "Object!" think again. In Java it's actually something more sophisticated. Go paste this into your favorite Java IDE and run it. I'll wait

public class Test {
  public static void main(String[] args) {
    (new Object() { 
      public void bar() { 
        System.out.println("bar!"); 
     } 
    }).bar();
  }
}

Surprised? Remember my point about limited type inference above? Here, Java has clearly inferred that the result of the expression is an Object refined with a bar method[2]. And so it lets you send a bar() message to the object. Rename either the method or the call to "baz" and you get a type error. But if you want to do anything interesting with the (new Object {public void bar() {...}};) object, like assign it to a variable, return it from a method, or apply it as an argument to a method then Java has a problem. It has no way to denote the refinement. So the refinement gets "lost" from the static type system. Poof. Gone. It's part of the object, you can find it via reflection, but the static type system won't let you get at it. The static type system only lets you get at a refinement "immediately" in a surrounding expression. Which is pretty close to useless. When you look at it that way, Scala hasn't so much added refinement types as removed another silly Java restriction.

Conclusion

When Java people first start looking into Scala it's common to think of Scala as being some complicated mishmash. In large part, evangelists (including me) are to blame. It's surprisingly hard to convey a nice feature of language A to users of language B without making it sound like something totally new, hence something that will complicate their lives. Hopefully this article has helped illustrate that in a very real sense some of the things that appear to be "new" in Scala are really "old" but generalized to be made more useful.

So this article is a call to all Scala pundits out there: when comparing Scala to another language, especially Java, go ahead and get excited about a "new" feature...but then see if you can't explain it as a generalization of an "old" one. You might be surprised how often you can and your audience might come away less intimidated.

Foot Notes

  1. Technically, Scala's type inference is quite a bit more sophisticated than Java's. But I won't let inconvenient facts like that get in the way of making a good point.
  2. Technically, I think the Java spec says that the type isn't structural, but some anonymous nominative type that the compiler can infer and use over a very limited scope. Then again, maybe not. I'm too lazy to go re-read the spec. Also, as already established, I won't let inconvenient facts like that get in the way of making a good point.

17 comments:

Daniel Spiewak said...

You didn't even mention generics! Yes, Java does infer (or more correctly, *compute*) expression types as you said, but this is (as you said) a core part of any static type checker. However, that's not all Java can do in the area of type inference. By using generics in a very hairy way, it's actually possible to get a very crude form of type inference in vanilla Java. This comes out of the fact that Java has to perform inference as a matter of course in order to type generic methods. Unfortunately, this technique isn't really useful in the general sense. It's also very difficult to illustrate since any examples of it are hundreds of lines long. :-)

Reinier Zwitserloot said...

Java always infers the type of expressions, but never infers the type of variables, return types, or anything else. Seems perfectly consistent to me.


A few more instances of inconsistency between the typing of expressions and the typing of everything else:

Composite Types. Expressions in java can be composite, for example:

someBoolean ? new ArrayList() : new LinkedList()

is a composite: lub(ArrayList, LinkedList) computes to: COMPOSITE[Serializable, Cloneable, AbstractList].

That's the actual type of that expression. You can call any method that's in any of those classes/interfaces on the expression, and you can assign the expression to any variable of any of those types. You can sort of, kind of, write them as well, in generics bounds:

List<Formatable & InputStream> foo;

is perfectly legal, though somewhat obscure, java code.

That's a lot like your example of the magic anonymous inner class instance that has its structural typing intact, so long as you operate directly on the expression and don't assign it to anything.


The question, however, is: Is it 'smart', so to speak, to complicate the language with ways to write these types (structural types OR composite types)? It involves adding a bunch of extra symbols and more complexity to the parser.


Of course, given that you CAN make composite types in generics bounds, java certainly can't claim simplicity and purity of syntax anymore.

James Iry said...

Reinier,

Good point on composite types.

Your point on syntax for types might be valid in some language, but isn't in Scala. A refinement type looks like

Type1 {def refinement : Type2}

Which exactly mirrors how you construct instances of anonymous inner classes in Scala "new Type1 {def refinement : Type2}". In turn, that's not too different from anonymous inner classes in Java (modulo the usual syntactic differences).

And composites in Scala are expressed as

x : Type1 with Type2 with Type3

Which exactly mirrors how you create a new class that extends several super classes

class Foo extends Type1 with Type2 with Type3

Java's rules for composite types are a strange mishmash. Scala's rules are, again, pretty general and regular.

Jesper Nordenberg said...

Saying that Java has refinement types is a stretch (unless you call normal inheritance a refinement). In your example a new class is created which extends Object and adds a bar method. The type of the expression is simply this class, so the compiler doesn't have to have a concept of "Object + bar method".

In Scala this is different, the expression really has the type "Object + bar method". The problem with this is that it leads to use of reflection when calling bar.

Michael Campbell said...

In:

Foo f = new Foo();


I'm pretty sure this restriction was *NOT* that the compiler couldn't otherwise figure out the type, but to be "explicitly explicit"; to MAKE the writer specify the type in order to minimize "mistakes". 'Say what you mean, mean what you say', and all that.

Remember Java was written as a language for average programmers. A lot of restrictions are there on purpose, not because they couldn't figure a way around it.

That said, I find it ridiculous.

Bill Venners said...

Hi James,

I wouldn't call keeping track of the type of an expression type "inference." But mainly, I see a difference between keeping track of an expression type and inferring the type of a variable, because a variable can usually have any of several types. In Java I would usually do:

List<String> myList = new ArrayList<String>();

I did that in part because List is shorter than ArrayList, but primarily because I really wanted the type of myList to be List<String>. I didn't want to rely on anything that may be added in ArrayList that's not defined in List, and by making that explicit in the code I communicated that intent to readers of the code.

When I started doing this in Scala:

val myList = new ArrayList[String]()

I felt a bit uncomfortable, because I knew the Scala compiler would infer the type of myList to be ArrayList<String>, not List<String>. Scala does let me do what I did before in a more concise way, in that I don't have to repeat the type parameter if I say:

val myList: List[String] = new ArrayList()

But I quickly let go of that discomfort and just always do the former. In practice I see it as more advantageous to reduce the verbosity of my code than to make my variables as abstract as possible, at least for local variables in methods.

But therein lies the difference. Even though myList is initialized with a java.util.ArrayList[String], its type could be exactly that, or List[String], or Serializable, or Cloneable, or Iterable[String], or Collection[String], or RandomAccess, or AbstractList[String], or AbstractCollection[String], or Object or AnyRef, or Any. Whew. So to me "inference" means the Scala compiler is actually picking the most specific of those, which in practice tends to be a perfectly fine choice about 98% of the time. The rest of the time I can specify some other type explicitly. By contrast the type of an expression should be exactly that specific type.

James Iry said...

@Bill,

I know most people think it's fundamentally different. But it's not. Unfortunately, Java has no syntactic distinction between up casts and down casts, so I'll go with Scala to illustrate. You argue that the difference is that expressions can only have one type while variables can have many. But that's not true. Expressions can be explicitly annotated with any compatible type in the same way.

There's no fundamental difference between

val x : X = something

and

val x = (something : X)

In other words, you can force an expression to have a compatible type just as you can with variables. In Scala it would be completely legitimate for me to write

(((new Foo : Foo).bar : Foo).bar : Foo)

But I don't need to - Scala infers the type of each subexpression for me. And so does Java.

The classic simply typed lambda calculus does not.

An even better example is with "generics." As Daniel mentioned in his comment, Java can sometimes infer type parameters.

public [T] List[T] list(T... x) {
//...
}

println(list("hello". "goodbye"));

Here, Java infers that we want T to be String, so we don't need to annotate it when calling single.

In any statically typed language, at some point, the type checker will have assigned a type to every little sub expression, every type parameter, etc. Different languages certainly differ in the sophistication of the engine that infers these "missing" types, and some type systems are constructed in such a way that all types can be inferred. But logical inference is logical inference, even if a particular logic's rules are trivially simple.

All that doesn't matter. The point I was trying to make, and apparently failing, is that Java people have no need to be afraid of Scala's type inference because it's a logical extension to stuff they already know.

fredrik said...

A nice little trick in Java is to create factory methods for maps, lists etc. that do type inferencing on the right hand side of the expression, e.g.:

Map<String, Map<Int, Long>> m = MapUtil.makeMap();

...

public static <K,V> Map<K,V> makeMap() {
return new HashMap<K,V>();
}

Can save quite a bit of typing if you have nested data structures :)

Andreas Rossberg said...

Sorry for the late comment, but I just discovered your post and thought I'd clarify a bit.

The difference between type inference and mere type checking as a degenerate case is the following: as long as types can be derived in a (roughly) bottom-up manner, without the necessity to guess, it is simply checking. If that is not sufficient then you enter the realm of inference. Typically, the latter situation is due to the presence of declarative typing rules that require non-deterministically guessing a type.

And indeed, you are right, there is no fundamental difference between

val x : X = something

and

val x = (something : X)

A type system guy wouldn't think of either as requiring type "inference". (Despite what the C++ and C# crowd likes to call it.)

Type inference occurs when you have to look at the use sites of a variable to infer its type. The standard example are function arguments. If your language allows to say

def f(x) = exp

then you cannot determine the type of x from its binder, you have to look at its uses in exp - and possibly even at the uses of f in the rest of the program! Inferring this information can be arbitrarily hard, and can easily make type inference non-compositional (and thus intractable) if you are not careful.

Anonymous said...

Type inference is unacceptable except if it generics inference, and even then only on the right side of the expression.

Structural types are ace, but only at api borders.

Brian Pontarelli said...

The issue with all of this isn't Java or Scala, but the JVM. The JVM doesn't infer types or methods. I haven't looked at Scala, but I'm guessing it suffers from a dynamic method lookup and incurs a really large amount of overhead try to figure out how to pass objects around.

This could be pre-compiled with anonymous interfaces or classes, but that's still overhead on the perm space. Plus, you still need to determine at some point if a call "satisfies" the method's contract and then allow the method signature to change while still allowing the caller to work without re-compilation IFF the caller still "satisfies" the method's new contract. This is why Java's type system is very strict in most of these and other cases.

In the end, we should stop talking about languages and start talking about VMs and bytecodes. I should also mention that the CLR is moving way ahead of the JVM in these areas and has been for quite some time. I'm just waiting for Sun/Oracle to figure this out and fix it.

James Iry said...

Brian, I'm not clear on what you're talking about. Type inference is entirely a static concept that can be taken care of by the compiler and needs no support from the runtime. By the time Scala code is executed by the JVM, the bytecode looks pretty much like what you'd expect from Java.

Now, if you're talking about refinement types then you'd be right. Scala currently uses reflection to invoke methods via a refinement type. That is primarily a limitation of the JVM. But it's not essential. It should be possible to do refinement types on the JVM by boxing and unboxing compiler created adapters. That would make calls to refinement types about as efficient as regular calls.

Also, your question about the recipient still satisfying the call doesn't make sense in Scala. Scala doesn't support dynamic meta-programming any more than Java does.

ben said...

compiler 'knows' the type but might need to do a bit of work to find it out in some situations.

ie:

var x;

if (test()) {
x = foo();
} else {
x = bar();
}

x.z();

what is the type of x? removing type inference can save you a bit work in the compiler :)

wolfy said...

Note: The <pre>-tag should really be supported in comments, it's quite cumbersome to do indentation with &nbsp;....


"But if you want to do anything interesting with the (new Object {public void bar() {...}};) object, like assign it to a variable, return it from a method, or apply it as an argument to a method then Java has a problem. It has no way to denote the refinement."

Well, no. Sorta. ;o)

This is no "type refinement", it's actually an implicit inheritance, created with a mechanism called an "anonymous local class".

new Object() { public void bar() { ... } }

basically means

new [AnonymousClass] extends Object { public void bar() { ... } }

If you'd use an interface instead of Object, the "extends" becomes an "implements" and you'd have to fulfill the contract by actually implementing all the interface's methods as usual with interfaces.

You do not have access to the anonymous name, so you can use it only once and if you want to assign it to a reference variable, it will be cast to the base type ('Object' in this case).

But you are not telling the whole story, because you can create local classes which are not anonymous:

First we define an interface for use in our example:

public interface SomeInterface
{
    public void foo();
}

Now, local classes can be declared and defined inside any method body:

SomeInterface someMethod()
{
    class LocalObj
    {
        public void bar() { ... }
    }

// as always, the class 'LocalObj' is implicitely derived from Object, but you can inherit from any class you like and/or implement an interface:

    class Local extends SomeBaseClass
      implements SomeInterface
    {
        public void bar() { ... }
        public void foo() { ... }
    }

// Such a local class can be instantiated as often as you like and its type can be used as usual:

    Local l = new Local();
    l.bar();
    l.foo();
    SomeInterface i = new Local();
    i.foo();

    return l;
}

So inside the scope where local classes are defined, they work basically similar to a refined type.
The real drawback here is that the names "LocalObj" and "Local" are not usable *outside* the method body.
But provided the interface is visible outside the method (i.e. it is public), you can at least use its type to pass out the instance of the local class, see the return statement above: the reference in variable l is cast to the type of the interface (which of course means that you can NOT call bar() outside of someMethod()!)

And this even works for anonymous local classes:

SomeInterface someMethodAnon()
{
    return new Interface()
    {
        public void foo() { ... }
        public void bar() { ... } // redunant!
    }; // ";" terminates return-statement!
}

Thus, somewhere else you can write:

SomeInterface i = someMethodAnon();
i.foo();
// but not i.bar()!

We see, these Java constructs are not the same as a refined type, but can do the same or nearly as much, depending on where and how you use them.

wolfy said...

Sorry, I forgot the example to prove the point that one is *not* loosing the (refined) type when passing on the reference in a method call, if a local class with an interface (see my first comment) is used:

public void method(Interface i)
{
        i.foo();
}

...

method(new Interface() { public void foo() { out.println("foo"); } });


While this compound statement is not exactly very pleasing to the eye (if you really *must* use an anonymous class in situ...;o) ) it is no problem to break it down into multiple lines for readability (which I'm not doing here, indenting with nbsp really *is* is cumbersome ;o)

Anonymous said...

public class Test {
public static void main(String[] args) {
int i = times10((new Object() {
public int bar() {
System.out.println("bar!");
return 10;
}
}).bar());


System.out.println("i = " + i);
}

private static final int times10(int n) {
return n * 10;
}
}


This compiles and runs so am I missing your point?

Sarah A180 said...

As others have pointed out, this is not at all type refinement. It's literally defining a class whose name is never given. This is easy to see if you know the Java bytecodes at all.

In the example given, it's equivalent to 

  class Test$1 { … }; ... new Test$1().

Any closure variables are sugared into constructor variables.If you look at the internals of Scala, it's doing a lot of the same stuff. It just takes it a step farther, for example wrapping non-final variables that will be closed over in a value holder so they can be mutable even outside their current scope.

It's pretty enlightening to take jad or JD and look at exactly what Scala does to translate your pretty functional code into Java. Most of it looks like your code with lots and lots of wrapper objects that it adds for various reasons. The methods on those objects are usually pretty simple and readable except where you are using pattern matching, which uses the JVM's goto command in a way that can't be decompiled.