Patrick Reynolds <piki@xxxxxxxxxx> writes: > The original xdl_hash_record is essentially DJB hash, which does a > multiplication, load, and xor for each byte of the input. Commit > 6942efc introduces an "XDL_FAST_HASH" version of the same function > that is clearly inspired by the DJB hash, but it does only one > multiplication, load, and xor for each 8 bytes of input -- i.e., fewer > loads, but also a lot less bit mixing. Less mixing means far more > collisions, leading to the performance problems with evil-icons. It's > not clear to me if the XDL_FAST_HASH version intended to match the > output of the DJB hash function, but it doesn't at all. Note that XDL_FAST_HASH is just a ripoff of the hashing scheme that Linus socially engineered on G+ around that time. I didn't do any of the hash genealogy that you did here, and it now shows. The orginal patches are linked from 6942efc (xdiff: load full words in the inner loop of xdl_hash_record, 2012-04-06): https://lkml.org/lkml/2012/3/2/452 https://lkml.org/lkml/2012/3/5/6 The code still exists: https://github.com/torvalds/linux/blob/master/fs/namei.c#L1678 > I have implemented two simpler possibilities, both of which fix the > problems diffing the evil-icons repository: I think it would be best to separate three goals here: 1. hash function throughput 2. quality of the hash values 3. avoiding collision attacks XDL_FAST_HASH was strictly an attempt to improve throughput, and fairly successful at that (6942efc (xdiff: load full words in the inner loop of xdl_hash_record, 2012-04-06) quotes an 8% improvement on 'git log -p'). You are now addressing quality. I have no idea how you ran into this, but if you are reworking things already, I think it would be good to also randomize whatever hash you put in so as to give some measure of protection against collision attacks. > 1. An XDL_FAST_HASH implementation that matches the output of the DJB > hash exactly. Its performance is basically the same as DJB, because > the only thing is does differently is load 8 bytes at a time instead > of 1. It does all the same ALU operations as DJB. I don't think there's a point in having such a function, since it would mean a lot of code for no throughput gain. Let's just remove XDL_FAST_HASH and the original hashing scheme in favor of a better hash function. -- Thomas Rast tr@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html