(sorry for the repost, I use gmail and it send html mails by default). On 22 December 2014 at 11:48, Thomas Rast <tr@xxxxxxxxxxxxx> wrote: > > 1. hash function throughput > 2. quality of the hash values > 3. avoiding collision attacks > > XDL_FAST_HASH was strictly an attempt to improve throughput, and fairly > successful at that (6942efc (xdiff: load full words in the inner loop of > xdl_hash_record, 2012-04-06) quotes an 8% improvement on 'git log -p'). > > You are now addressing quality. > > I have no idea how you ran into this, but if you are reworking things > already, I think it would be good to also randomize whatever hash you > put in so as to give some measure of protection against collision > attacks. I assume you mean DJB2 when you say DJB, and if so I will just note that it is a pretty terrible hash function for arbitrary data. (I understand it does better with raw text.) It does not pass either strict-avalanche-criteria[1], nor does it pass the bit-independence-criteria[2]. I have images which show how badly DJB2 fails these tests if anyone is interested. Murmur3 is better, in that it does pass SAC and BIC, but before you decide to use Murmur3 you should review https://131002.net/siphash/and related resources which demonstrate multi-collision attacks on Murmur3 which are independent of the seed chosen. The paper also introduces a new hash function with good performance properties, and claims that it has cyptographic strength. (I say claims because I am not qualified to judge if it is or not.) Eg: https://131002.net/siphash/murmur3collisions-20120827.tar.gz I think if you want performance and robustness against collision attacks Siphash is a good candidate, as is perhaps the AES derived hash used by the Go folks, but the performance of that algorithm is strongly dependent on the CPU supporting AES primitives. Anyway, the point is that simply adding a random seed to a hash function like DJB2 or Murmur3 is not sufficient to prevent collision attacks. Yves [1] A change in a single bit of the seed or the key should result in 50% of the output bits of the hash changing. [2] output bits j and k should change independently when any single input bit i is inverted, for all i, j and k. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html