Wednesday, February 15, 2012

Checked Exceptions Might Have Their Place, But It Isn't In Java

In Java's more than 15 years, no language has repeated its experiment with checked exceptions, other than a few languages designed as Java extensions (e.g. MultiJava and GJ). Certainly no mainstream, or even nearly mainstream, language has. Notable languages that don't bother with checked exceptions include C#, which started as very nearly a clone of Java, and Scala, which borrows many Java concepts and then beefs up their static type checking.

Even within the Java community checked exceptions have been at least somewhat deprecated. Spring and Hibernate, for instance, have moved strongly away from checked exceptions. Bruce Eckel, author of Thinking in Java, considers them a mistake, and Joshua Bloch, author of Effective Java, cautions against their overuse.

Now, I've seen lots of arguments that you should "use checked exceptions when the caller must somehow recover." But that argument assumes that Java is basically a procedural language. It's not. It's an object-oriented language where reusable abstractions are common. Just a couple of examples from the standard Java library reveal exactly what's wrong with Java's checked exceptions.

The java.util.Iterator methods, for instance, aren't declared as throwing any checked exceptions. If you create your own Iterator that calls a method that throws a checked exception, then the implementation must do a try {...} catch (CheckedException e) {throw new UncheckedException(e)}. Or worse, you swallow the exception. Either way it's boilerplate.
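
For instance, here's a minimal sketch of an iterator over the lines of a BufferedReader (the class name and structure are mine; the wrap-and-rethrow pattern is the point):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    // Hypothetical iterator over the lines of a BufferedReader. Iterator.next()
    // declares no checked exceptions, so the IOException thrown by readLine()
    // has to be wrapped and rethrown unchecked (pure boilerplate).
    class LineIterator implements Iterator<String> {
        private final BufferedReader reader;
        private String nextLine;

        LineIterator(BufferedReader reader) {
            this.reader = reader;
            advance();
        }

        private void advance() {
            try {
                nextLine = reader.readLine();
            } catch (IOException e) {
                throw new RuntimeException(e); // the catch-and-rethrow dance
            }
        }

        public boolean hasNext() {
            return nextLine != null;
        }

        public String next() {
            if (nextLine == null) {
                throw new NoSuchElementException();
            }
            String current = nextLine;
            advance();
            return current;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }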

On the flip side, to avoid that problem the call method on the java.util.concurrent.Callable interface declares that it might throw "java.lang.Exception". But Exception tells you nothing about what might fail, making it exactly equivalent to the unchecked RuntimeException except that you must "handle" it by either redeclaring it in the throws clause (which just pushes the problem upstream) or doing the try/catch/rethrow-unchecked dance. Again, more boilerplate. Or, again, more opportunity to swallow.
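
A minimal sketch of what that looks like from the caller's side (the anonymous class and the rethrow are mine; the throws Exception clause comes from Callable itself):

    import java.util.concurrent.Callable;

    public class CallableDemo {
        public static void main(String[] args) {
            Callable<String> task = new Callable<String>() {
                public String call() throws Exception { // declared by the interface
                    return "hello";
                }
            };

            // "throws Exception" tells the caller nothing useful, yet it still
            // has to be redeclared or caught and rethrown unchecked.
            try {
                System.out.println(task.call());
            } catch (Exception e) { // recover from... what, exactly?
                throw new RuntimeException(e);
            }
        }
    }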

You might think generics offer a way out of the quagmire, but Java has a fatal flaw here. The throws clause is the only point in the entire Java language that allows union types. You can tack "throws A,B,C" onto a method signature, meaning it might throw A or B or C, but outside of the throws clause you cannot say "type A or B or C" in Java. So if you have "interface MyInterface<T extends Exception> {void mightThrow() throws T...}" then T must be bound to a single exception type for any given instantiation of MyInterface. And, as a special bonus, with that structure you can't say that some particular implementation doesn't throw at all. Which means that in practice it's little better than the Callable "throws Exception" solution.
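
To make that concrete, here's a hypothetical interface of exactly that shape (MightThrow and the implementing classes are made-up names):

    import java.io.IOException;

    // Hypothetical attempt to make the exception type a generic parameter.
    interface MightThrow<T extends Exception> {
        void mightThrow() throws T;
    }

    // Fine when exactly one checked exception is involved...
    class FileTask implements MightThrow<IOException> {
        public void mightThrow() throws IOException { /* ... */ }
    }

    // ...but T cannot be bound to "IOException or SQLException", and an
    // implementation that never throws still has to pick some type for T,
    // for example by falling back to an unchecked one.
    class SafeTask implements MightThrow<RuntimeException> {
        public void mightThrow() { /* never throws */ }
    }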

The team working on lambdas for Java has found checked exceptions a major stumbling block, essentially for the reasons outlined here. A significant amount of work is going into easing that pain.

In short, as it stands the design of the Java language requires you to either avoid reusable abstractions or wrap useless checked exceptions in boilerplate. And if I want to avoid reusable abstractions then I know where to find Pascal. Could checked exceptions be made workable? Perhaps with some careful language design. But Java isn't that design.

Too Dense?

With this post I want to ask a question: what does it mean for code to be "too dense?" This question has implications on everything from languages to APIs to coding style.

I've seen debaters defend Java's verbosity precisely because it isn't "too dense." They say the sparsity of the code makes it easy to understand what's going on. Similarly, it's common to bash programmers for playing "golf" when their code is dense. But if we're allergic to density, then why do programmers seem to prefer tools that produce dense code when there are fairly straightforward ways to write less dense code?

For an example I'm going to use regular expressions(1) since just about every programmer knows what they are, they're very dense, they exist in direct or library form for every general purpose programming language, and they are easy to replace with "normal" code.

Regexes are tight little strings that have very little in the way of redundancy. They're frequently accused of being "write only" - impossible to read and maintain once written. They are the poster children for "too dense" if anything is.

Given the modern-ish focus on refactoring and the understanding that code is read far more often than it is written, if regexes really are too dense you'd think programmers would be eager to replace those dense strings with more standard code just to improve readability. After all, a regex encodes a simple state machine, or perhaps something a bit stronger if the common Perl-ish extensions are used, so replacing one is easy.
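
For instance, here's a rough sketch of what that expansion might look like for a simple identifier pattern (the class and method names are mine):

    import java.util.regex.Pattern;

    public class Density {
        // One dense line...
        private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

        static boolean isIdentifierRegex(String s) {
            return IDENTIFIER.matcher(s).matches();
        }

        // ...versus the same check written out as explicit character tests.
        static boolean isIdentifierExpanded(String s) {
            if (s.isEmpty()) {
                return false;
            }
            char first = s.charAt(0);
            if (!isAsciiLetter(first) && first != '_') {
                return false;
            }
            for (int i = 1; i < s.length(); i++) {
                char c = s.charAt(i);
                if (!isAsciiLetter(c) && !isAsciiDigit(c) && c != '_') {
                    return false;
                }
            }
            return true;
        }

        private static boolean isAsciiLetter(char c) {
            return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        }

        private static boolean isAsciiDigit(char c) {
            return c >= '0' && c <= '9';
        }
    }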

Yet it doesn't happen, at least not much. Regexes remain a mainstay. New regexes are continually written and old ones aren't ripped out and rewritten as loops and if statements just to gain some more readability. They're expanded for performance reasons or when the logic needed exceeds the power of regexes, but they almost never get replaced with an explicit state machine just to improve maintainability.

Why is that? We can't blame a few bad programmers. Regexes are far too widely used for that simple cop-out.

What regexes and our use of them suggest is that we're not allergic to density in information per character but to something else. One culprit is simply unfamiliarity: regexes are okay because we're familiar with them; other forms of density are bad because we're not.

But maybe it's even stronger than that. Perhaps familiarity with regexes makes us aware of a different kind of density/sparsity trade-off. A regex's information density may make it slower to read in terms of characters per minute, but we know that the expanded code would be slower to read in terms of concepts per minute.

In this post, I picked on regular expressions because they're so widely known and used but the bigger question is in the design of languages, APIs, and coding conventions. This article started with a question and will end with more. Are regular expressions outliers, unusual in creating value out of density? Is there some optimum relationship between frequency of use and density where something becomes too dense if we don't use it often enough? If we create dense languages, APIs, or coding conventions are we creating impenetrable barriers to entry for newbies? If we don't create dense notations are we providing a disservice to those who will use the notation often? Is there any hope that a designer of a language, API, or coding convention can find a near optimum density for his or her target audience that remains near optimal for a long time over patterns of changing usage?

Footnotes

  1. Insert "now you have two problems" joke here.