Tuesday, August 7, 2007

Java equals() and hashcode

The hashCode() method exists purely for efficiency. The Java platform architects anticipated the importance of hash-based collection classes -- such as Hashtable, HashMap, and HashSet -- in typical Java applications, and comparing against many objects with equals() can be computationally expensive. Having every Java object support hashCode() allows for efficient storage and retrieval using hash-based collections.


Room for improvement?

Building hashing into the root object class of the Java class library was a very sensible design compromise -- it makes using hash-based containers so much easier and more efficient. However, several criticisms have been made of the approach to and implementation of hashing and equality in the Java class library. The hash-based containers in java.util are very convenient and easy to use, but may not be suitable for applications that require very high performance. While most of these will never be changed, it is worthwhile to keep in mind when you're designing applications that rely heavily on the efficiency of hash-based containers. These criticisms include:
Too small a hash range. Using int, instead of long, for the return type of hashCode() increases the possibility of hash collisions.

Bad distribution of hash values. The hash values for short strings and small integers are themselves small integers, and are close to the hash values of other "nearby" objects. A more well-behaved hash function would distribute the hash values more evenly across the hash range.

No defined hashing operations. While some classes, such as String and List, define a hash algorithm to be used in combining the hash values of its constituent elements into a single hash value, the language specification does not define any approved means of combining the hash values of multiple objects into a new hash value. The trick used by List, String, or the example class A discussed earlier in Writing your own equals() and hashCode() methods are simple, but far from mathematically ideal. Nor does the class library offer convenience implementations of any hashing algorithm that would simplify the creation of more sophisticated hashCode() implementations.

Difficulty writing equals() when extending an instantiable class that already overrides equals(). The "obvious" ways to define equals() when extending an instantiable class that already overrides equals() all fail to meet the symmetry or transitivity requirements of the equals() method. This means that you must understand the structure and implementation details of classes you are extending when overriding equals(), and may even need to expose private fields in the base class as protected to do so, which violates principles of good object-oriented design.

No comments: