Hashtable's and the tale of runtime

aph at redhat.com (Andrew Haley) · Sat Feb 11 14:15:45 2006

Christian Thalinger writes:
 > Hi!
 > 
 > We had a curious problem with SPECjvm98 _213_javac and cacao.  A newly
 > recompiled cacao ended with 10% preformance degradation.  After further
 > investigation we ended with VMSystem.identityHashCode() and it's return
 > value.
 > 
 > _213_javac make heavy use of Hashtable's and the objects stored in the
 > hashtable use VMSystem.identityHashCode() as their hash function.  Since
 > alignment is normally 8 or 16-bytes, the distribution of the objects
 > seems to be not optimal.  We ended with 40000 Object.equals() calls more
 > and a doubled runtime for Hashtable.get() (measured clock cycles), as
 > Object.equals() is used in Hashtable.get().

This seems a bit strange.  Hashtable.rehash() seems always to generate
a new capacity based on the old capacity * 2 + 1, so you wouldn't
expect to see a great many collisions as the definition of hash() is

   key.hashCode() % buckets.length

Okay, buckets.length isn't necessarily prime, but you wouldn't expect
this to have a very non-uniform distribution.

 > A shifting of the VMSystem.identityHashCode() pointer value helps
 > (sometimes), but shifting by 3 makes it even worse (what was my guess to
 > make it better).
 > 
 > Any thoughts?  Should we change Hashtable.hash()?  Or return a better
 > value for VMSystem.identityHashCode()?  But most runtimes only return
 > the memory pointer...

There are things that we can try, such a using a better way to turn an
address into a hash.  But it would be nice first to know the real
cause of the very non-uniform distribution, if indeed that's what is
happening.

Andrew.