I just don’t get the rules concerning
System.Object’s
GetHashCode and
Equals methods. Here they are, taken from the current
.NET Framework Class Library Reference documentation (I’m actually using 2.0,
but this stuff doesn’t change):
Notes to Implementers: A hash function is used to quickly generate a number
(hash code) that corresponds to the value of an object. Hash functions are usually
specific to each Type
and must use at least one of the instance fields as input. A hash function
must have the following properties:
-
If two objects of the same type represent the same value, the hash function must
return the same constant value for either object.
-
For the best performance, a hash function must generate a random distribution for
all input.
-
The hash function must return exactly the same value regardless of any changes that
are made to the object.
My problem with the above is this: The emphasised text in the above statement,
together with Rule 1 and Rule 3 mean that all objects must be immutable, like
href="http://msdn.microsoft.com/library/en-us/cpref/html/frlrfsystemstringclasstopic.asp">
System.String (which is - rather conveniently - the example given in the documentation).
If my GetHashCode method must satisfy the first rule whilst using at least one of
the instance fields, then the return value must change if the value of the instance
field(s) changes. Rule 3 says it can’t.
Here’s one of the example implementations from the documentation:
public struct Int32 {
public int value;
//other methods...
public override int GetHashCode() {
return value;
}
}
This obviously passes the “must use a field” rule, but fails Rule 3. The other examples
have the same problem.
I plucked a few random classes out of the BCL and looked at their Equals and GetHashCode
implementations using Reflector.
They all implemented Equals just the way you’d expect, but I thought the GetHashCode
implementations were interesting:
System.Net.Cookie: Generates a new value from the current state of the fields, violating
Rule 3.
System.Drawing.Drawing2D.Matrix: Cheat, cheat, cheat!! You know the compiler warning
you get when you override Equals but not GetHashCode? This class overrides GetHashCode,
but all it does is call base.GetHashCode(), cunningly avoiding the compiler warning
but violating Rule 1!
System.Uri: Returns a the value of a field called m_Hash,
which is calculated in the constructor and then isn’t changed.
The first two are clearly in violation of the rules, but I thought System.Uri’s
implementation was interesting. So it uses the fields to calculate the hash, but
then you can freely alter the fields via the object’s properties and the hash doesn’t
change. OK. But what if you try to look up a Uri in a hashtable? Wouldn’t this lead
to a failure to find a Uri that had been inserted in the hashtable and then had
its fields altered? I don’t know, and my brain is beginning to hurt.
Does anybody have a set of unit tests (NUnit preferred) to exercise the Equals and
GetHashCode methods? Has anyone created a class that would pass these tests? I understand
what GetHashCode is used for, but that doesn’t help solve the problem. How are we
supposed to know if our class is to be used as a key in a hashtable?
Update: System.Uri is immutable (thanks, Brad), so that means the hashtable problem disappears. It doesn’t help me implement my mutable classes though!
Update: I just found a similar rant by Ian Griffiths. He concludes:
The right thing to do would be to document this hash constancy requirement as a feature of the
Hashtable rather than trying to push it into the GetHashCode requirements. In fact
that’s precisely what the documentation for Hashtable does! It says “Key objects must be
immutable as long as they are used as keys in the Hashtable.” So it looks like the persistent impossible advice in GetHashCode is simply historical accident.
I agree - let’s see the documentation fixed!