Re: Name hashing function causing a perf regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09/12/2014 03:21 PM, Linus Torvalds wrote:
On Fri, Sep 12, 2014 at 12:11 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
Josef Bacik <jbacik@xxxxxx> writes:

So the question is what do we do here?  I tested other random strings
and every one of them ended up worse as far as collisions go with the
new function vs the old one.  I assume we want to keep the word at a
time functionality, so should we switch to a different hashing scheme,
like murmur3/fnv/xxhash/crc32c/whatever?  Or should we just go back to

Would be interesting to try murmur3.

I seriously doubt it's the word-at-a-time part, since Josef reports
that it's "suboptimal for < sizeof(unsigned long) string names", and
for those, there is no data loss at all.

The main difference is that the new hash doesn't try to finish the
hash particularly well. Nobody complained up until now.

The old hash kept mixing up the bits for each byte it encounters,
while the new hash really only does that mixing at the end. And its
mixing is particularly stupid and weak: see fold_hash() (and then
d_hash() does something very similar).

So the _first_ thing to test would be to try making "fold_hash()"
smarter. Perhaps using "hash_long(hash, 32)" instead?

                 Linus


Ok with the hash_long(hash, 32) change I get this

[jbacik@devbig005 ~/local] ./hash
Old hash table had 1000000 entries, 0 dupes, 0 max dupes
New hash table had 331504 entries, 668496 dupes, 5 max dupes
We had 292735 buckets with a p50 of 3 dupes, p90 of 4 dupes, p99 of 5 dupes for the new hash

So that looks much better, not perfect but hlist_for_each through 5 entries isn't going to kill us, I'll build a kernel with this and get back shortly with real numbers. Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux