On Fri, Sep 12, 2014 at 12:11 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> Josef Bacik <jbacik@xxxxxx> writes:
>>
>> So the question is what do we do here?  I tested other random strings
>> and every one of them ended up worse as far as collisions go with the
>> new function vs the old one.  I assume we want to keep the word at a
>> time functionality, so should we switch to a different hashing scheme,
>> like murmur3/fnv/xxhash/crc32c/whatever?  Or should we just go back to
>
> Would be interesting to try murmur3.

I seriously doubt it's the word-at-a-time part, since Josef reports that
it's "suboptimal for < sizeof(unsigned long) string names", and for
those there is no data loss at all.

The main difference is that the new hash doesn't try to finish the hash
particularly well.  Nobody complained up until now.

The old hash kept mixing up the bits for each byte it encountered, while
the new hash really only does that mixing at the end.  And its mixing is
particularly stupid and weak: see fold_hash() (and then d_hash() does
something very similar).

So the _first_ thing to test would be to try making "fold_hash()"
smarter.  Perhaps using "hash_long(hash, 32)" instead?

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html