Posted By: erik | Aug 30th, 2007 @ 7:00 PM
page 1 of 1
Comments: 17 | Views: 3302
Hi,

I need to hash a string value an array member.  The array size is 1009, so to map a string to an array member I do something like this:

int arrayMember = (ABS(myStr.GetHashCode()) % 1009

Everything seems to work great, but at some point later in the day,  I'll run the routine again but get a different hash value for the same string value. 

My question is whether there is something behind GetHashCode() that might cause the results to change for different runs of the program.  Is GetHashCode seeded in some way when the program initializes, perhaps? 

Thanks!

-Erik
Sven Groot
Sven Groot
My name has 9 letters. Coincidence? I think not...
According to reflector it's a fairly standard hash code algorithm. It shouldn't vary given the same input.
erik wrote:
to map a string to an array member I do something like this:

int arrayMember = (ABS(myStr.GetHashCode()) % 1009



Trying to get into the "Daily WTF" (or whatever it's called now) or what are you trying to do... Actually never mind, I don't want to hear it. Perplexed
Matthew van Eerde
Matthew van Eerde
AKA Maurits
1009 is prime, which is a tantalizing clue.
TommyCarlier
TommyCarlier
I want my scalps!
A hashing function ALWAYS has to return the same hash-code for the same value. I seriously doubt the built-in string class would return different hash-codes for the same string.
odujosh
odujosh
Need Microsoft SUX now!
Not enough information. From the information you did provide we have the question, "Why would you ever want to do that?"

If you are curious about how string.ToHash works download reflector.

From a quick look. From the look of your problem you should be in deep enough to work your way through. Looks like the answer would always be the same because they are just doing some math on the int equivalent.

[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override unsafe int GetHashCode()
{
    fixed (char* text = ((char*) this))
    {
        char* chPtr = text;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}
amotif
amotif
No Silver Bullet

You shouldn't rely on the implementation details of String.GetHashCode() remaining the same.

Also, GetHashCode isn't guaranteed to return the same value for a given object value from one process to the next, per the docs:


"The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again."

littleguru
littleguru
<3 Seattle
You should never rely on what GetHashCode() returns. Especially since the default implementation returns (at least under 32 bit systems and right now when I write this post) the reference value (I don't know if it is exactly the pointer in memory) of the object.
Minh
Minh
WOOH! WOOH!
erik wrote:
The number of buckets is prime because it's a simple way to get an even distribution of words across the buckets.

Sorry, I'm bad in math, but how does having (prime #) of buckets ensure an even distribution. I thought that's more of a function of the hash algorithm.

Say, if you have a poor hash function:

return _myString.GetHashCode() % 2;

then, having 1009 buckets wouldn't help. Would it?
littleguru
littleguru
<3 Seattle
Minh wrote:

erik wrote: The number of buckets is prime because it's a simple way to get an even distribution of words across the buckets.

Sorry, I'm bad in math, but how does having (prime #) of buckets ensure an even distribution. I thought that's more of a function of the hash algorithm.

Say, if you have a poor hash function:

return _myString.GetHashCode() % 2;

then, having 1009 buckets wouldn't help. Would it?


Some older hashing algorithms required the bucket number to be a prime... That allowed to hasing function to add the items in a distributed manner.
Minh
Minh
WOOH! WOOH!
littleguru wrote:

Some older hashing algorithms required the bucket number to be a prime... That allowed to hasing function to add the items in a distributed manner.
Aahh... You mean the output of the hashing function's maximum is a prime number, not Int32.Max like .Net?

But that doesn't prevent you from mapping that to a smaller set of buckets, though, right? If the distribution is even, shouldn't matter.
Matthew van Eerde
Matthew van Eerde
AKA Maurits
Yes, if the distribution is even.  Using a prime number of buckets helps protect you from unevennesses in the distribution.

This is why the reproductive cycle of cicadas is 17 years, for example, rather than 16; it minimizes the chance of resonance with a predator cycle.
Frank Hileman
Frank Hileman
VG.net
The "mod prime" technique has been considered obsolete for many years. You can read why in these links. So any bucket size is fine with a good hash function, and powers of two are typically used.

http://www.azillionmonkeys.com/qed/hash.html

http://burtleburtle.net/bob/hash/

odujosh
odujosh
Need Microsoft SUX now!
Frank Hileman wrote:
The "mod prime" technique has been considered obsolete for many years. You can read why in these links. So any bucket size is fine with a good hash function, and powers of two are typically used.

http://www.azillionmonkeys.com/qed/hash.html

http://burtleburtle.net/bob/hash/



I tend to distrust sites with silly names. Maybe I am alone here:)
Matthew van Eerde
Matthew van Eerde
AKA Maurits
Well, there are obvious benefits from doing a mod-prime.  There are also obvious benefits from doing a mod-(power-of-2).

I would caution against doing a mod-n, though, where n is simultaneously prime and a power of 2... it's true you get the benefits of both techniques, but there are some unexpected side effects due to your restricted range of choice in n. Wink
Minh
Minh
WOOH! WOOH!
Matthew van Eerde wrote:

I would caution against doing a mod-n, though, where n is simultaneously prime and a power of 2... it's true you get the benefits of both techniques, but there are some unexpected side effects due to your restricted range of choice in n. Wink

Why? It worked for Neo Smiley


page 1 of 1
Comments: 17 | Views: 3302
Microsoft Communities