Overriding Object’s hashcode() method

 


Plan

The hashCode() method : principles and simple example


Brief definition of the hashCode() method 

The public native int hashCode() method comes from the Object class.
According to its javadoc,it  returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by java.util.HashMap.

The hashCode() method intention is clear : being used in collections relying on hash tables : HashMap, HashSet, LinkedHashMap, etc…

Default implementation of the hashCode() method and when override it

According to the javadoc, the hashCode() method of the Object class is defined in this way : as much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.).
It means that  two objects have the same hashcode value if these refer to the same reference. 

For example, with the following Address class that doesn’t override hashcode() :

public class Address {
 
    private String city;
    private String country;
 
    public Address(String city, String country) {
	this.city = city;
	this.country = country;
    }
}

This code :

Address address = new Address("my city", "my country");
Address addressReferencingTheSameObject = address;
Address otherAddress = new Address("my city", "my country");
 
System.out.println("is address == addressReferencingTheSameObject ? " + (address == addressReferencingTheSameObject));
System.out.println("isaddress.hashCode() == addressReferencingTheSameObject.hashCode() ? " + (address.hashCode() == addressReferencingTheSameObject.hashCode()));
 
System.out.println("is address == otherAddress ? " + (address == otherAddress));
System.out.println("is address.hashCode(otherAddress) == otherAddress.hashCode() ? " + (address.hashCode() == otherAddress.hashCode()));

produces the following output :

is address == addressReferencingTheSameObject ? true
isaddress.hashCode() == addressReferencingTheSameObject.hashCode() ? true
is address == otherAddress ? false
is address.hashCode(otherAddress) == otherAddress.hashCode() ? false

address and addressReferencingTheSameObject refer to the same object (first output).
So, these have the same hashcode value (second output).
address and otherAddress refer to two distinct objects (third output).
So, these have not the same hashcode value (four output).

When we override equals() in a class, we have to also override hashCode().
It makes part of the contract that we will present in the next point

The general contract to respect when we override the hashCode() method

Here is the part of the javadoc that exposes the general contract:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the java.lang.Object.equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The first point is the consistency requirement.
Violating this principle means generally that the implementation relies not only on the state of the object to evaluate its hashcode. It very probably relies on external derived information which the value may change through the time even if information used in hashCode()  has not changed.
In addition to be more expensive in processing time, it’s also very tricky.

The second and the third points state the relationship between equals() and hashCode()

The rule to respect is simple : if two objects are equals in terms of equals() method, the values returned by  hashCode()  have to be the same.
The default implementation of these two methods in the Object class respects this rule as equals() will return true if the two tested objects refer to the same reference and hashCode() will return the same value for two objects that refer to the same reference.

The third point says that the reverse is no part of the contract :  two objects that have the same value returned by hashCode() may not be equals in terms of equals() method. Now, to get better performance with collections based on hash code, it is strongly advised that the hashCode() method produces as much as possible distinct values for unequal objects.

A simple example of hashCode() overriding that respects the contract

We will reuse the Address class that contains two fields : city and country.
It defines an  equals() method where two Address  objects are considered as equal if these have the same city and the same country. To get more information about how override equals(), you may refer to this post.

Here is the class :

import java.util.Objects;
 
public class Address {
 
    private String city;
    private String country;
 
    public Address(String city, String country) {
	this.city = city;
	this.country = country;
    }
 
    public String getCity() {
	return city;
    }
 
    public String getCountry() {
	return country;
    }
 
    @Override
    public boolean equals(Object obj) {
	// performant comparison
	if (this == obj)
	    return true;
 
	// null check + cast respecting the substituable classes principle (lipskov principle)
	if (!(obj instanceof Address))
	    return false;
 
	// fail cases
	Address other = (Address) obj;
	if (!Objects.equals(city, other.city))
	    return false;
	if (!Objects.equals(country, other.country))
	    return false;
 
	// remaining case : successful
	return true;
    }
 
}

As said early, if a class overrides equals(), it must also override hashCode().
So we will do that.
Here is the unit test that asserts we respect the rules of the hashCode() contract.
We will only test this rule of the contract : if two objects are equals for the equals() method, the values returned by  hashCode()  have to be the same.

Checking the consistency rule is very complicated and not always feasible.
To unitary test it, we should know which condition produces distinct results of the hashCode() method invocation for a same object. Besides, if we knew this condition, it would be removed from the hashCode() implementation. 
So, we don’t check this rule. This rule may may be interesting to unitary test if we suspect a potential concern about its respect in the actual hashCode() implementation or if we noticed an issue at runtime about it.

The last point of the contract  (two objects that have the same value returned by hashCode() may not be equals in terms of equals() method) doesn’t make really sense to  be tested either as finally it means that two objects with same value returned by hashCode() may be or not be equal. As equals() returns either true or false, in both cases, it respects this point.
A more interesting test would be checking the quality of distribution of the hashCode() method. We could for example assert that for a some number of Address objects created, we have not more than a some percent of collision (Address objects that have the same hash code value for Address objects that are not equals according to equals())

import org.junit.Test;
 
import davidhxxx.basics.java.model.equals.Address;
import junit.framework.Assert;
 
public class AddressHashCodeTest {
 
    @Test
    public void hashcode_have_same_value_when_two_objects_are_equals() throws Exception {
	Address addressOne = new Address("my city", "my country");
	Address addressOther = new Address("my city", "my country");
	Assert.assertTrue(addressOne.equals(addressOther));
	Assert.assertEquals(addressOne.hashCode(), addressOther.hashCode());
    }
}

Here is the hashCode() implementation :

    @Override
    public int hashCode() {
	final int prime = 31;
	int result = 1;
	result = prime * result + ((city == null) ? 0 : city.hashCode());
	result = prime * result + ((country == null) ? 0 : country.hashCode());
	return result;
    }

To get an efficient hashCode() method, we have to reduce the collisions of hash code for objects that are not equals as more the collisions are frequent,  more retrieving a element from a hash code based collection is slow.
The idea is to compute a hash value for each field used in the hashcode() method (generally the same fields than which of them used in the equals() method) and to sum the individual hash values in a way where the final hash value produced has very few potential collisions for not equals objects.

Some interesting things to note :

1) A nonzero initial value is used.

int result = 1;

In Java Effective (Joshua Bloch), we can read that with this way, the hash value will be affected by initial fields whose hash value is zero. If zero were used as the initial value, the overall hash value would be unaffected by any such initial fields, which could increase collisions.

2)  For each field we use to compute the hash code, we multiply the previous overall hash value by the prime number (31) and we add to this computation the current hash value of the current field.

result = prime * result + ((city == null) ? 0 : city.hashCode());
result = prime * result + ((country == null) ? 0 : country.hashCode());

This way of doing reduces the collisions.
Suppose we only summed the value of each field in this way :

result = result + ((city == null) ? 0 : city.hashCode());
result = result + ((country == null) ? 0 : country.hashCode());


The hash code would have the same value for two Address objects  (o1 and o2) where o1.city.equals(o2.country) and  o1.country.equals(o2.city) because this way of computing doesn’t weight the hash code computing for a unitary field according to the order of it in the overall computation. 

3) For a field with a null value, we associated to it a 0 hash value.
This way of doing is classic. Many JDK classes relies on it.
For example, the public static int Objects.hashCode(Object o) uses this pattern :

    public static int hashCode(Object o) {
        return o != null ? o.hashCode() : 0;
    }

This way suits for the most of cases.
However to compute hash value for numeric wrapper fields such as Short or Integer, using the 0 value could create many collisions if we have fields that  may often have 0 and null values. In this case, to have a more efficient hashCode() method, we should use another constant value than 0 for the null numeric object field. 

Ce contenu a été publié dans Non classé. Vous pouvez le mettre en favoris avec ce permalien.

Laisser un commentaire

Votre adresse de messagerie ne sera pas publiée. Les champs obligatoires sont indiqués avec *