On Fri, Sep 12, 2014 at 12:11 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> Josef Bacik <jbacik@xxxxxx> writes:
>>
>> So the question is what do we do here?  I tested other random strings
>> and every one of them ended up worse as far as collisions go with the
>> new function vs the old one.  I assume we want to keep the word at a
>> time functionality, so should we switch to a different hashing scheme,
>> like murmur3/fnv/xxhash/crc32c/whatever?  Or should we just go back to
>
> Would be interesting to try murmur3.

I seriously doubt it's the word-at-a-time part, since Josef reports that
it's "suboptimal for < sizeof(unsigned long) string names", and for
those there is no data loss at all.

The main difference is that the new hash doesn't try to finish the hash
particularly well.  Nobody complained up until now.

The old hash kept mixing up the bits for each byte it encountered, while
the new hash really only does that mixing at the end.  And its mixing is
particularly stupid and weak: see fold_hash() (and then d_hash() does
something very similar).

So the _first_ thing to test would be to try making "fold_hash()"
smarter.  Perhaps using "hash_long(hash, 32)" instead?

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html