Tuesday, July 15, 2008

Is Scala for Academics and Egomaniacs?

This one isn't going to be a meaty technical article or a deep philosophical exploration. Instead, it's about perception versus reality. What moves me to post is the following three tweets from Tim Dysinger.

Edit: A few more

Edit 2: One more, a blog post this time
I happened to be involved in the discussion that led to his conclusions. Here's the complete, unedited log from the #scala IRC channel. I leave it to posterity to decide whether there was too much academia or egomania, or whether the guy was kicked for arguing.

Warning: Foul Language

(09:34:37 AM) doub: how can I print the address of an object rather than the output of its toString ?
(09:35:27 AM) dysinger: ah not c++
(09:35:32 AM) dysinger: it's java
(09:36:02 AM) dysinger: the default toString implies there is an address available
(09:36:04 AM) paulp: what is this "address" you speak of
(09:36:21 AM) paulp: you're harshing my abstraction mellow
(09:36:32 AM) doub: anything that could let me distinguish two distinct objects that have the same contenty
(09:36:34 AM) doub: *content
(09:36:43 AM) JamesIry: .equals
(09:36:45 AM) paulp: hashCode?
(09:36:49 AM) JamesIry: oops, .eq I mean
(09:36:49 AM) dysinger: Not even via JNI. JNI doesn't hand out objects' addresses, only "handles" that
(09:36:49 AM) dysinger: can be used to refer to them.
(09:37:05 AM) dysinger: hashCode is what you want.
(09:37:30 AM) doub: i have one object now, and one object later, and i want to know if that's the same or not, and i'd rather not store a reference to the first until i get the second
(09:37:37 AM) JamesIry: It's not what he wants. It's overridable and not guaranteed unique
(09:37:51 AM) dysinger: you should over-ride equals and hashCode if you want to compare content
(09:37:55 AM) DRMacIver: System.identityHashCode will give you the system version though
(09:37:58 AM) ijuma: hashCode is not necessarily good enough because it's an int and you may be running a 64-bit machine
(09:38:23 AM) ijuma: I mean System.identityHashCode above
(09:38:25 AM) dysinger: ^ the default impl you mean
(09:38:54 AM) DRMacIver: Sure. It's not foolproof.
(09:39:13 AM) JamesIry: System.identityHashCode isn't guaranteed unique either.
(09:39:26 AM) JamesIry: Only guaranteed to always be the same for the same object
(09:39:44 AM) doub: the System in java.lang ?
(09:39:47 AM) dysinger: doub if you over-ride hashCode and equals like it says in the core javadocs, you'll have all the control you need to compare two objects of the same class.
(09:39:48 AM) DRMacIver: Why don't you want to store a reference to the first?
(09:40:36 AM) JamesIry: doub, yes, System in Java.lang
(09:42:49 AM) doub: DRMacIver: i can't clearly state my reasons, so i guess they're bad ones
(09:43:05 AM) JamesIry: doub, the only thing guaranteed to work is .eq
(09:43:47 AM) dysinger: doub if you don't want to keep a reference to a previously seen object, take a hash of the object and keep that.
(09:44:21 AM) doub: with identityHashCode ?
(09:44:38 AM) dysinger: I would just use a library because I am lazy like that
(09:44:39 AM) dysinger: http://commons.apache.org/lang/api/org/apache/commons/lang/builder/HashCodeBuilder.html
(09:44:41 AM) lambdabot: Title: HashCodeBuilder (Lang 2.3 API), http://tinyurl.com/5639k7
(09:45:08 AM) dysinger: http://commons.apache.org/lang/api/org/apache/commons/lang/builder/EqualsBuilder.html
(09:45:10 AM) lambdabot: Title: EqualsBuilder (Lang 2.3 API), http://tinyurl.com/5n3wyz
(09:45:18 AM) doub: isn't there something low level in the jvm itself that can give me a unique string for an object ?
(09:45:36 AM) dysinger: hashCode is ok
(09:45:43 AM) JamesIry: Everybody please stop suggesting hash codes, identityHashCode or not. There is no guarantee that objects with unique identity will have unique hash codes.
(09:46:15 AM) dysinger: ah lol
(09:46:25 AM) dysinger: there is if I have the keyboard.
(09:46:50 AM) JamesIry: Read the spec for identityHashCode
(09:47:18 AM) JamesIry: It says not one word about uniqueness guarantees. A perfectly legal JVM could return 0 everytime the method is called.
(09:47:48 AM) dysinger: jameslry read the docs on equals and hashCode -> one of the fundementals of java.
(09:48:08 AM) dysinger: you don't just "use" them - you over-ride them.
(09:48:13 AM) DRMacIver: It's appropriate that "one of the fundamentals of java" is fundamentally broken. :)
(09:48:15 AM) JamesIry: equals -> same hash code, not the other way around
(09:48:47 AM) JamesIry: You can write your own hashCode to always return 0 and it will obey that contract
(09:48:55 AM) JamesIry: It would be stupid, but it would be "legal"
(09:49:17 AM) dysinger: lol what are you smoking ?
(09:49:42 AM) dysinger: omg - same shit different day - it's a formula - #1 hang out, #2 bash java/ruby/perl/etc, #3 don't actually know any java.
(09:49:51 AM) DRMacIver: It's a really powerful drug called "Knowing what you're talking about". :)
(09:50:01 AM) ijuma: dysinger: you are not paying attention to the discussion
(09:50:02 AM) JamesIry: Here's a proof. There are 2^32 unique ints. How many unique strings are there? Infinitely many. Ergo, there must be a possibility of collision.
(09:51:24 AM) ijuma: JamesIry: btw, there's a blurb in Object.hashCode about returning distinct integers for distinct objects where reasonably practical. of course, there's no guarantee, but in some cases it may be good enough.
(09:51:35 AM) ijuma: (the native Object.hashCode implementation that is)
(09:51:46 AM) JamesIry: ijuma: I agree, if you control the JVM it may very well be enough to use identityHashCode
(09:52:00 AM) JamesIry: But just using .eq is far more robust and guaranteed to work in the future.
(09:52:00 AM) dysinger: There is theoretically a possibility of a collision in the RFC for UUID too. It doesn't paralyze anybody except academics that don't actually code but stand around and debate edge cases.
(09:52:16 AM) JamesIry: dysinger, please read the original requirement
(09:52:24 AM) dysinger: please pontificate
(09:52:25 AM) ijuma: JamesIry: yeap, agreed
(09:52:40 AM) dysinger: I did read the original
(09:52:51 AM) JamesIry: doub wanted to know when he got the same (as in identity) object twice
(09:53:00 AM) dysinger: yep
(09:53:12 AM) dysinger: which can be handled perfectly by over-riding equals and hashCode
(09:53:18 AM) JamesIry: No!
(09:53:21 AM) ijuma: lol
(09:53:28 AM) JamesIry: Same identity, not content
(09:53:42 AM) ijuma: communication is hard sometimes
(09:53:54 AM) DRMacIver: It's not like this is a weird theoretical possibility which never crops up.
(09:53:54 AM) dysinger: over-riding those gives you any amount of comparison you want.
(09:54:01 AM) DRMacIver: scala> "\0".hashCode == "".hashCode
(09:54:01 AM) DRMacIver: res1: Boolean = true
(09:54:31 AM) dysinger: he didn't say I have random objects of different classes coming through
(09:54:52 AM) DRMacIver: Strings are not a particularly random choice. :)
(09:54:57 AM) ppohja: DRMacIver: you evil academic, standing around and finding just the corner cases :)
(09:55:05 AM) JamesIry: dysinger, why would you want to override equals to do a job when there's already a perfectly good .eq (or Java ==) method/operator that does the job?
(09:55:15 AM) JamesIry: Seriously, why?
(09:55:49 AM) JamesIry: Override equals when you want a non-identity based comparison. Override hashCode to be consistent. But when you want identity, there are tools built in to Java
(09:55:50 AM) DRMacIver: ppohja: It's amazing how often I genuinely get accused of being an academic in programming related discussions. I find it really funny. :)
(09:55:52 AM) JamesIry: or Scala
(09:55:54 AM) dysinger: because the default equals does not compare conntent.
(09:55:55 AM) dysinger: content
(09:55:59 AM) dysinger: it compares hashCodes
(09:56:12 AM) JamesIry: dysinger, no it compares identity, which is what doub was talking about
(09:56:22 AM) doub: given my system it would take some time to implement a storing mechanism for reference to past objects and comparisons with the new ones
(09:57:03 AM) doub: just printing the hash gave me a very loose but still usefull information about my object, and took less time than the conversation about my question :-)
(09:57:08 AM) dysinger: jameslry the default java equals compares the class and the hash code giving you identity
(09:57:08 AM) JamesIry: doub, stick 'em in an IdentityHashMap
(09:57:24 AM) doub: but I agree with all you said JamesIry, and I would use .eq for a production system
(09:57:36 AM) JamesIry: dysinger, no the contract just says identity. hashCode is not part of the contract.
(09:58:08 AM) DRMacIver: That would be a pretty stupid implementation when you can just compare the addresses of the objects...
(09:58:10 AM) JamesIry: The fact that a particular jvm uses an object's unique handle as a hashCode is an implementation detail
(09:58:37 AM) JamesIry: DRMacIver: except that internally a JVM can use 64 bit addresses
(09:58:51 AM) DRMacIver: Hm. Why is that a problem?
(09:59:09 AM) JamesIry: DRMacIver: sorry, I misunderstood you
(09:59:42 AM) JamesIry: DRMacIver: you're right, a JVM should compare addresses (or handles or something). It MAY base identityHashCode on that address or handle or whatever, and most probably do.
(10:00:00 AM) DRMacIver: I think hotspot uses the address in the lock table for hashCode
(10:00:19 AM) DRMacIver: Wouldn't swear to it though
(10:00:36 AM) dysinger: jameslry - if you have been on java very long you would know that the defaul equals uses class and hash code.
(10:00:39 AM) DRMacIver: (It definitely doesn't use the object's address, as that will change on a fairly regular basis)
(10:00:55 AM) dysinger: and that anytime you over-ride one of those methods - you over-ride both.
(10:00:59 AM) doub: would that be too restricting to require the jvm to expose a unique id per object (independent from the hash mechanism) ? like the address of some underlying memory structure
(10:01:11 AM) doub: is the JVM GC allowed to move objects in memory ?
(10:01:16 AM) dysinger: and that unless the universe explodes tomorrow - you will probably do just fine.
(10:01:33 AM) dysinger: doub - the vm moves shit all the time as needed.
(10:01:42 AM) DRMacIver: doub: Very much so.
(10:03:49 AM) DRMacIver: To the second part. The first part, probably not. But you'd have to account for the fact that collisions can still occur due to address reuse between GC cycles.
(10:04:19 AM) JamesIry: dysinger please point me to the line in the JLS that says that "==" uses class and hashCode. There's no need for us to argue. Just point it out to me.
(10:04:33 AM) dysinger: I said equals
(10:04:36 AM) dysinger: I never typed ==
(10:04:41 AM) JamesIry: The default equals is based on ==
(10:04:44 AM) dysinger: lol
(10:04:54 AM) DRMacIver: So the only guarantee you could get would be that if x and y are both live then address(x) == address(y) implies x == y. If you saw x and y at a different point in time you wouldn't be able to guarantee that same address meant x == y
(10:05:05 AM) dysinger: this is what I am talking about
(10:05:14 AM) dysinger: academic arguing
(10:05:24 AM) ijuma: interesting, it seems like once you call identityHashCode, it's computed once and then stored in the mark word
(10:05:37 AM) JamesIry: From the doc: The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any non-null reference values x and y, this method returns true if and only if x and y refer to the same object (x == y has the value
(10:05:42 AM) ppohja: dysinger, no, it's that you're offering the wrong solution to the original problem.
(10:05:43 AM) dysinger: java has been around a dozen years with no one running away screaming from having been burned from hashCode and equals.
(10:05:45 AM) ijuma: for HotSpot, that is
(10:06:07 AM) ijuma: derived from the following statement, "Finally, there's not currently space in the mark word to support both an identity hashCode() value as well as the thread ID needed for the biased locking encoding. Given that, you can avoid biased locking on a per-object basis by calling System.identityHashCode(o)."
(10:06:31 AM) dysinger: ppohja - what's the "right" sollution in scala ? seems like we have been discussing this for 30 minutes.
(10:06:45 AM) dysinger: I know java like the back my hand.
(10:06:48 AM) JamesIry: ppohja: .eq is semantically the same as Java's "=="
(10:06:55 AM) JamesIry: I mean dysinger
(10:07:04 AM) dysinger: in scala - ok
(10:07:16 AM) JamesIry: Scala's "a eq b" translates to Java's "a == b"
(10:08:44 AM) JamesIry: Also from the doc: As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
(10:09:10 AM) JamesIry: Note the caveat: this implementation technique is not required by the JavaTM programming language.
(10:09:30 AM) DRMacIver: So, just for the record, I looked up the source code for Object.equals
(10:09:49 AM) DRMacIver: public boolean equals(Object obj) {
(10:09:49 AM) DRMacIver: return (this == obj);
(10:10:11 AM) JamesIry: dysinger, have you looked at the back of your hand recently?
(10:10:12 AM) DRMacIver: dysinger: Or, in other words, you're full of shit. Kindly shut up about this now.
(10:11:49 AM) ijuma: JamesIry: Looks like he pays more attention to the front of his hand ;)
(10:11:54 AM) dysinger: fuck you guys
(10:12:12 AM) doub: do I get bad karma points for being the initiator of a conflict ?
(10:12:13 AM) dysinger: read the javadoc
(10:12:20 AM) JamesIry: I quoted the javadoc
(10:12:28 AM) dysinger: so did I
(10:12:33 AM) DRMacIver: dysinger: Sorry, but no thanks. I find idiocy a huge turnoff.
(10:12:45 AM) dysinger: nobody is trying to turn you on
(10:12:54 AM) ijuma: doub: no :)
(10:13:11 AM) dysinger: fuck you drmaciver - you and I have debated endlessly on other topics.
(10:13:24 AM) JamesIry: dysinger says: the default object.equals is based on class and hashCode. DRMacIver shows that it's based on ==. Dysinger gets mad
(10:13:34 AM) DRMacIver: Really? I dont' remember any instances.
(10:13:43 AM) dysinger: show me the code
(10:13:46 AM) DRMacIver: You probably blended into the endless sea of unwashed idiocy on the internet
(10:13:49 AM) JamesIry: DRMacIver: showed you the code
(10:14:04 AM) ppohja: DRMacIver: To me, this seems like you're talking to a eliza.
(10:14:12 AM) DRMacIver: ppohja: It does rather, doesn't it?
(10:14:17 AM) JamesIry: A troll
(10:14:26 AM) dysinger: "If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. "
(10:14:27 AM) DRMacIver: I'm about ready to break out the banstick to be honest
(10:14:37 AM) dysinger: eat a dick egomanics
(10:14:38 AM) JamesIry: dysinger, I already says equals implies same hashcode.
(10:14:46 AM) JamesIry: Look back through there. I said that.
(10:14:47 AM) mode (+o DRMacIver ) by ChanServ
(10:14:54 AM) dysinger: there is no implies
(10:14:55 AM) ppohja: dysinger, after all these years with java, you're forgotten the difference between -> and <-> ?
(10:14:58 AM) mode (+b *!*n=tim@*.hsd1.or.comcast.net ) by DRMacIver
(10:14:58 AM) dysinger left the room (Kicked by DRMacIver (I've already told you you're not my type.)).
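The technical question at the heart of the log, identity versus equality on the JVM, is easy to check for yourself. What follows is a minimal Scala sketch, not code from the channel. The names `Point`, `Plain`, `IdentityDemo` and `SeenByIdentity` are hypothetical, and the sketch assumes only the Scala standard library plus `java.util.IdentityHashMap` and `System.identityHashCode` from the JDK. It shows that `eq` tests identity, that `==`/`equals` compares content only when a class overrides it, that equal hash codes do not imply equal objects, and how an `IdentityHashMap` answers "have I seen this exact object before?"

```scala
import java.util.IdentityHashMap

// Hypothetical example types (not from the log).
// Point compares by content: case classes generate equals/hashCode over their fields.
final case class Point(x: Int, y: Int)

// Plain does not override equals, so it inherits Object.equals,
// which is simply `this == obj` in Java, i.e. reference identity.
class Plain

object IdentityDemo {

  // Identity: Scala's `a eq b` is a reference comparison, like Java's `a == b`.
  def sameObject(a: AnyRef, b: AnyRef): Boolean = a eq b

  // Equality: Scala's `==` on references is a null-safe call to `equals`.
  def sameContent(a: AnyRef, b: AnyRef): Boolean = a == b

  // Equal hash codes do not imply equal objects, let alone identical ones:
  // a one-character string holding the NUL char and the empty string both hash to 0.
  def hashCollision: Boolean = {
    val nul = 0.toChar.toString
    nul.hashCode == "".hashCode && nul != ""
  }

  // Remembers objects by identity rather than by equals/hashCode.
  // IdentityHashMap compares keys with reference equality and hashes them with
  // System.identityHashCode. Note: it holds strong references to every key.
  final class SeenByIdentity {
    private val seen = new IdentityHashMap[AnyRef, java.lang.Boolean]()

    // Returns true if this exact object was recorded before; records it either way.
    def checkAndRecord(o: AnyRef): Boolean =
      seen.put(o, java.lang.Boolean.TRUE) != null
  }

  def main(args: Array[String]): Unit = {
    val a = Point(1, 2)
    val b = Point(1, 2) // same content, but a separate object

    println(sameContent(a, b)) // true: case-class equals compares fields
    println(sameObject(a, b)) // false: two distinct allocations
    println(sameObject(a, a)) // true
    println(sameContent(new Plain, new Plain)) // false: Object.equals is identity
    println(hashCollision) // true

    // identityHashCode is stable for one object, but uniqueness is not promised.
    println(System.identityHashCode(a) == System.identityHashCode(a)) // true

    val tracker = new SeenByIdentity
    println(tracker.checkAndRecord(a)) // false: first time we see a
    println(tracker.checkAndRecord(b)) // false: b equals a but is not a
    println(tracker.checkAndRecord(a)) // true: the very same object again
  }
}
```

One design caveat: an `IdentityHashMap` keeps a strong reference to every object it has seen, which is exactly what doub hoped to avoid. Printing `System.identityHashCode` is cheaper and was good enough for his debugging, but as the log argues, it is a hint rather than a guarantee of uniqueness.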