On Mon, 16 Oct 2006, Jim Meyering wrote:
> Linus Torvalds <torvalds@xxxxxxxx> wrote:
> > On Mon, 16 Oct 2006, Linus Torvalds wrote:
> ...
> > So I think xdiff has chosen too small a hash. Can you try what happens if
> > you change xdl_hashbits() (in xdiff/xutil.c) instead? Try making it return
> > a bigger value (for example, by initializing "bits" to 2 instead of 0),
> > and see if that makes a difference.
>
> It makes no difference.
>
> Bear in mind that there are a *lot* of duplicate lines in the files
> being compared: filtering each through "sort -u" removes 40-50k lines.

It can't be due to duplicate lines. If the lines are truly duplicate, then
they'd get the same 32-bit hash value, and then the first conditional in
the expression would always be true, and then it wouldn't _matter_ if it's
a "&&" or a "||". See?

So as far as I can tell it has to be some kind of collision on the hash
queue, with _different_ hash values being queued on the same hash queue.

Now, it could be that there's a bad hash algorithm somewhere (e.g. if
XDL_HASHLONG() just does horribly badly in distributing the hash values
onto the hash queues, you'd see this _regardless_ of how many bits you
have, just because it clumps).

Or there could be something else that I'm just missing..

It would probably be nice to just get a sampling of what the hash queue
looks like for the bad case? Maybe it would be obvious that certain
different hash values then get the same XDL_HASHLONG() thing..

		Linus
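
One way to get the hash-queue sampling asked for above is a small standalone
helper along these lines. This is only a sketch under stated assumptions:
fold_hash() is a hypothetical stand-in for XDL_HASHLONG() (the real macro
lives in xdiff and may differ), and sample_queues() and its parameters are
made-up names, not xdiff API. It counts *distinct* full hash values per
queue, so true duplicate lines are ignored and only different values that
share a queue show up.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * ASSUMPTION: a fold-and-mask reduction standing in for XDL_HASHLONG().
 * It is illustrative only and is not copied from xdiff.
 */
static unsigned long fold_hash(unsigned long v, unsigned int bits)
{
	return (v + (v >> bits)) & ((1UL << bits) - 1);
}

/* qsort() comparator for unsigned long values, ascending. */
static int cmp_ulong(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: fill counts[0 .. (1 << bits) - 1] with the number
 * of DISTINCT full hash values that land on each queue, and return the
 * largest such count. hashes[] is sorted in place.
 */
static size_t sample_queues(unsigned long *hashes, size_t n,
			    unsigned int bits, size_t *counts)
{
	size_t nqueues = (size_t)1 << bits;
	size_t longest = 0;
	size_t i;

	memset(counts, 0, nqueues * sizeof(*counts));
	qsort(hashes, n, sizeof(*hashes), cmp_ulong);

	for (i = 0; i < n; i++) {
		size_t q;

		/* Same full hash as the previous entry: a true duplicate,
		 * not a collision between different values, so skip it. */
		if (i > 0 && hashes[i] == hashes[i - 1])
			continue;

		q = (size_t)fold_hash(hashes[i], bits);
		if (++counts[q] > longest)
			longest = counts[q];
	}
	return longest;
}
```

Feeding it the per-line hash values from the bad case and looking at the
largest entries of counts[] for a few values of bits would show whether
different hash values clump onto a handful of queues no matter how many
bits are used.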