One Div Zero: The Best-Laid Schemes O' Mice An' Constructors Gang Aft Agley

Last time I talked about how the Scala compiler currently translates try/catch expressions used as arguments to method or constructor call into separate method definitions which in turn may require boxing of mutable variables. That's because an exception in the try would blow away the local stack of stuff needed to make the original method/constructor call. So I proposed a plan for the Scala compiler to translate

expr0.foo(expr1, try/catch)

into

val temp0 = expr0
val temp1 = expr1
val temp2 = try/catch
t0.foo(temp1, temp2)

And to do that recursively as needed.

The Flaw

It was a great plan. A beautiful plan. It had only one flaw. Under some circumstances it would change the behavior of some constructors calls. To understand why we need to look at some JVM bytecode.

At the bytecode level, the constructor call

val foo = new Foo(expr1, expr2,...)

looks like(1)

new Foo
dup
// compute the result of expr1 and leave on stack
...
// compute the result of expr2 and leave on stack
...
invokespecial Foo.init
store foo

Which may need some explanation. The first instruction, "new Foo", creates a new, uninitialized Foo object but does not invoke its constructor. A reference to the uninitialized object is left on the top of the local stack. Then dup duplicates the top of the stack so now there are two references to the uninitialized object. Then a series of instructions computes the results of the arguments to the constructor, leaving each result at the top of the stack in turn (and pushing down the previous stuff). Finally, Foo's constructor is called which consumes all the constructor args and one reference to the Foo object. After the constructor is done, what's left behind on the stack is one reference to the now initialized Foo object. The store instruction consumes that and stuffs it into the foo variable.

If expr2 were a try/catch and my proposed transformation were to kick in, then that would look like

// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
new Foo
dup
load temp1
load temp2
invokespecial Foo.init
store foo

At first glance that would appear to solve the problem nicely: the order of effects between expr1, expr2, and Foo's constructor is all preserved. Right? Sigh.

You see, the bytecode instruction "new Foo" potentially has a visible side effect. That's because it can cause Foo's static initializers to fire if they haven't already. In the original, inline version, the static initializers would have their effect before expr1 and expr2. In the second version the static initializer might happen after. Here's a bit of Java code that demonstrates the difference.

public class Sample {
  static class Foo1 {
    static { 
      System.out.println("Foo1 static init"); 
    }
public Foo1(int arg1, int arg2) {
      System.out.println("Foo1 constructor"); 
    }
  }
  static class Foo2 {
    static { 
      System.out.println("Foo2 static init"); 
    }
public Foo2(int arg1, int arg2) {
      System.out.println("Foo2 constructor"); 
    }
  }
static int expr1() {
   System.out.println("expr1");
   return 1;
  }
static int expr2() {
   System.out.println("expr2");
   return 2;
  }
public static void main(String[] args) {
    System.out.println("static init comes first.");
    Foo1 foo1 = new Foo1(expr1(), expr2());
System.out.println();
    System.out.println("static init comes later.");
    int temp1 = expr1();
    int temp2 = expr2();
    Foo2 foo2 = new Foo2(temp1, temp2);
  }
}

I wrote that in Java because Scala doesn't have static initializers (the equivalent is done via constuctors on declared objects). Which might make you think "then who cares." But, Scala has to inter-operate with Java. So we need a plan.

Plan 1 (which won't work)

It's tempting to try to stuff the uninitialized object into a variable before dealing with the args.

new Foo
store temp0
// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
load temp0
dup
load temp1
load temp2
invokespecial Foo.init
store foo

That looks beautiful but, sadly, it is expressly forbidden by the JVM spec to have a ref to an uninitialized object in a local variable when a catch block is executed. The bytecode verifier will reject that approach outright.(2)

Plan 2 (which is annoying, quite probably slow, and may not work everywhere)

So we need to force class initialization early. One way way to do that is Class.forName. So now we have this

ldc classOf[Foo]
invokevirtual Class.getName
iconst_1 // push `true` onto the stack
invokevirtual Class.getClassLoader
invokestatic Class.forName
pop // throw away the class object that results from Class.forName
// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
new Foo
dup
load temp1
load temp2
invokespecial Foo.init
store foo

That's a lot of code coming out of one measly constructor call! To some extent that can be mitigated by creating a static helper method in the runtime library and calling that instead. But that doesn't entirely mitigate a performance concern. How fast is Class.forName? The answer will take some benchmarking but I'll bet right now it's a whole lot slower than code without it.

Even ignoring pefromance concerns Class.forName has a potentially fatal flaw: security managers might very well prevent it. So library A happens to trigger this transformation and company B can't use the library because of a policy decision about restricting Class.forName.

Plan 2b (which might help a bit with Plan 2's slowness but definitely won't work everywhere)

A solution related to Plan 3 is to use sun.misc.Unsafe.ensureClassInitialized. It's a much simpler interface than Class.forName and is likely closer to the underlying mechanism to do class initialization win the "new" instruction. The problem is that it's not portable to all JVMs and it's even more likely to be hidden away by a security manager.

Plan 3 (which is so crazy it just might work)

A realistic possibility, which is a bit crazy, is to create an uninitialized object then throw it away which would force static initialization but wouldn't call the constructor.

new Foo // ensure initialization
pop // throw that object away, we don't need it
// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
new Foo
dup
load temp1
load temp2
invokespecial Foo.init
store foo

Which looks distinctly WEIRD but it's valid bytecode and gets the job done. The annoying factor with this one is that it can only be expressed in bytecode so either the whole transformation has to be done at a very low level or we would have to add a new node type to the Scala AST which is used solely to express "force static initialization of thus and such class" and then let the code gen phase translate that to a new/pop pair.

There's also a bit of concern that it's TOO weird. It's valid by the spec, but it's nothing javac would ever spit out so it's entirely possible that some JVM implementation somewhere would be very confused by it.

Plan 4 (which is scary, but has some merit)

The final possibility is to ignore the problem. Scalac could just reorder expr1 and expr2 to come before the new and leave it at that. It's not something that most Scala programmers would ever even notice. The very few it does affect can manually do whatever translation they want to get their static initializers to fire at the right time. Perhaps the -Xlint flag could issue a warning about static initializer order when the transformation takes place.

But then again, the reason for the re-ordering will be pretty non-obvious so the few it does affect will likely struggle with it for awhile, then decide Scala is just broken without ever seeing what's behind -Xlint.

What Now?

Hopefully this has been an illuminating romp into some of the innards of compiling for the JVM. The immediate plan is that Scala will continue to turn problematic try/catch expressions into method definitions which will occasionally mean a bit of extra boxing. In the grand scheme of things that perf hit is probably small enough and rare enough to keep this whole experiment at a low-ish priority. Still, when I have some time I'd like to go see what other JVM language with try/catch expressions are doing here. They might have figured something out that I haven't.

Footnotes

I'm using javap as a guide for how to disassemble bytecode. But I'm keeping variable names and class names and ignoring details about variable slots and the constant pool that aren't important for this discussion.
Why is it not valid? The simple answer is "because the spec says so." Specifically it says "There must never be an uninitialized class instance in a local variable in code protected by an exception handler." But the underlying reason for that restriction isn't entirely clear to me. JVM bytecode is statically typed. The JVM's type system is just very peculiar by the standards of mainstream statically typed languages. The type checking is (a part of) the bytecode verification process. In the verifier, every variable and stack slot gets one static type over a static region of the bytecode. In different regions those types are allowed to be different, but when flow through the two regions converges all the types must be consistent. Now, the verifier thinks of an uninitialized Foo as a distinct type from an initialized Foo. Calls to a constructor (init) are the magic that turns an uninitialized and thus unsafe to use Foo in one region into one that is initialized and thus safe to use in another region. Which is all good. Except for some reason the uninitialized to initialized transition has been special cased for exception handlers rather than treated as just another type-state change to a variable that must converge consistently. Perhaps there was concern over the runtime cost of a more complicated verification rule. Or perhaps there was some soundness hole they didn't think they could plug. I dunno. Whatever the cause I have to go to war with the army I have not the army I wish I had.

9 comments:

James Iry said...: The problem with seeing Plan 4 as a bug fix is that I wouldn't ALWAYS make this translation. It would only be done when try/catch is used as an argument to the constructor. So from a user point of view order of evaluation would change depending on something seemingly irrelevant.

For Plan 4 to be a bug fix I'd have to always use locals for constructor args and that would lead to bigger classfile sizes.; February 15, 2013 at 2:41 PM
James Iry said...: That's a viable alternative but I don't think I want to go down that route. As it is the compiler already has 2 points where it deals with problematic try/catch. What you're suggesting would add a third point. If I can come up with a clean, efficient solution to this class initialization issue then I could consolidate all the problematic try/catch transformation logic to one place.; February 17, 2013 at 8:30 PM
Patrick Doyle said...: You just need any old reference to the class in order to run its initializers. Either a getstatic, or a call to an empty static method, should get optimized away by the jit.; February 18, 2013 at 9:41 AM
James Iry said...: Of course, but what if the "Foo" has no static fields? There aren't any that guaranteed to be there by the Java spec.

Even in cases where there is one, the JVMs separate compilation model means you should assume that the layout of a 3rd party class can change between compilation and run time so you should bind to as little as possible. If the compiler finds a static field in a third party class and gens bytecode that references it without that being in user code then the user will be VERY confused when the presence or absence of an apparently unreferenced static field breaks binary compatibility as the version of the 3rd party class is changed without recompiling.; February 18, 2013 at 9:50 AM
Patrick Doyle said...: Yeah, I realized after I submitted that you have to cope with arbitrary classes. Anyway, you can always do something like "aconst_null; checkcast ThatClass" I think. Again, this would get optimized away by the jit.; February 19, 2013 at 12:31 PM
James Iry said...: checkcast doesn't cause static init to fire. http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.5, which I verified by plugging in "Foo2 x = (Foo2)null;" into my sample java code above, inspecting the bytecode to make sure the checkcast was emitted, then running.; February 19, 2013 at 12:50 PM
Patrick Doyle said...: Well, how about that. You learn something every day.

Maybe "ThatClass.class.getModifiers()"?; February 19, 2013 at 1:52 PM
James Iry said...: Class#getModifiers also does not cause class init. Even if it did, it would share the same potentially fatal flaw as Class.forName: reflective features can be and often are disabled by security managers.; February 19, 2013 at 2:04 PM
Patrick Doyle said...: Sing and a miss...

Seems like all we're left with is "new ThatClass; pop". The jit ought to optimize that away too. It would suck somewhat in the interpreter, though, since it would actually allocate an object.; February 19, 2013 at 2:31 PM

One Div Zero

Friday, February 15, 2013

The Best-Laid Schemes O' Mice An' Constructors Gang Aft Agley