On Mon, 16 Oct 2006, Linus Torvalds wrote:

> On Mon, 16 Oct 2006, Linus Torvalds wrote:
> >
> > But it could certainly also be that you just broke the diffs entirely, so
> > I would like to wait for Davide to comment on your diff before Junio
> > should apply it.
>
> I think you broke it.
>
> If the "&& vs ||" makes a difference (and it clearly does), that implies
> that you have lots of different hash values on the same hash chain, and
> you end up considering those _different_ hash values to be all equivalent
> for the counting, even though they obviously aren't.
>
> I think the real problem is that with big input, the hash tables are too
> small, making the hash chains too long - even though the values on the
> chains are different (ie we're not hashing different records with the same
> hash value over and over again - if that was true, the "&& vs ||" change
> wouldn't make any difference).
>
> So I think xdiff has chosen too small a hash. Can you try what happens if
> you change xdl_hashbits() (in xdiff/xutil.c) instead? Try making it return
> a bigger value (for example, by initializing "bits" to 2 instead of 0),
> and see if that makes a difference.

I think the xdl_hashbits() picks up the hash table size "almost" correctly.
I think we're looking at some bad hash *collisions* (not records with same
hash value, that'd be stopped by the mlim check).
Send me the files and I'll take a look ...

> But again, I'm not actually all _that_ familiar with the libxdiff
> algorithms, _especially_ the line-based ones (I can follow the regular
> binary delta code, but the line-based one just makes my head hurt). So
> take anything I say with a pinch of salt.

That's my revenge on myself having to follow your code in the kernel :D

- Davide
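
To illustrate the distinction being argued above (records sharing a chain
because the table is small, versus records sharing the same full hash value),
here is a minimal sketch. It is not libxdiff code: bucket_of(),
chain_length() and same_hash_count() are hypothetical helpers, and masking
the low bits is only an assumed way of mapping a hash to a bucket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: map a full hash value to a bucket in a 2^bits table. */
static unsigned long bucket_of(unsigned long hash, unsigned int bits)
{
	assert(bits > 0 && bits < 8 * sizeof(unsigned long));
	return hash & ((1UL << bits) - 1);
}

/* How many entries land on the same chain (bucket) as 'target'. */
static size_t chain_length(const unsigned long *hashes, size_t n,
			   unsigned long target, unsigned int bits)
{
	size_t i, len = 0;

	for (i = 0; i < n; i++)
		if (bucket_of(hashes[i], bits) == bucket_of(target, bits))
			len++;
	return len;
}

/* How many entries carry exactly the same full hash value as 'target'. */
static size_t same_hash_count(const unsigned long *hashes, size_t n,
			      unsigned long target)
{
	size_t i, cnt = 0;

	for (i = 0; i < n; i++)
		if (hashes[i] == target)
			cnt++;
	return cnt;
}
```

With too few bits, chain_length() can be much larger than same_hash_count()
for the same record, which is the situation where comparing full hash values
while walking a chain (or not) changes the count. Raising the number of bits
shortens the chain without changing how many entries have the same hash.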