On Mon, Oct 30, 2017 at 10:59:41AM -0700, Jeff King wrote: > > There's also https://www.strchr.com/hash_functions, which lists DJB2 > > as Bernstein. Both functions rank somewhere in the middle of that list. > > FWIW, I did some experiments with Murmur3 and SipHash a while back, but > I don't think I came up with anything faster than the existing code. > OTOH, moving to SipHash gives us the ability to randomize the hashes, > which can resist some DoS attacks (although as I've said before, > computing arbitrary diffs for untrusted strangers is pretty much a > DoS-in-a-box). By the way, one of the things that complicates plugging new functions into xdiff's hashing is that xdl_hash_record() simultaneously computes the hash _and_ finds the end-of-line marker. So the "siphash is only 10% slower" number I showed came with quite a few contortions to do both. Here it is compared to a more naive application of the siphash code (i.e., memchr to find end-of-line, and then feed the resulting bytes to siphash): Test origin HEAD^ jk/xdl-siphash-wip ------------------------------------------------------------------------------------------------------- 4000.1: log -3000 (baseline) 0.05(0.05+0.00) 0.05(0.05+0.00) +0.0% 0.05(0.05+0.00) +0.0% 4000.2: log --raw -3000 (tree-only) 0.31(0.27+0.03) 0.31(0.27+0.03) +0.0% 0.31(0.27+0.03) +0.0% 4000.3: log -p -3000 (Myers) 2.06(2.01+0.05) 2.30(2.21+0.09) +11.7% 2.96(2.91+0.04) +43.7% 4000.4: log -p -3000 --histogram 2.44(2.38+0.06) 2.67(2.60+0.07) +9.4% 3.32(3.26+0.06) +36.1% 4000.5: log -p -3000 --patience 2.57(2.47+0.09) 2.90(2.82+0.08) +12.8% 3.48(3.43+0.05) +35.4% There "origin" is the existing djb hash, "HEAD^" is the complicated "fast" siphash (which I very well may have screwed up), and the final is the more naive version, which is quite a bit slower. -Peff