Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes

On Mon, 16 Oct 2006, Linus Torvalds wrote:

> On Mon, 16 Oct 2006, Linus Torvalds wrote:
> > 
> > But it could certainly also be that you just broke the diffs entirely, so 
> > I would like to wait for Davide to comment on your diff before Junio 
> > should apply it. 
> 
> I think you broke it. 
> 
> If the "&& vs ||" makes a difference (and it clearly does), that implies 
> that you have lots of different hash values on the same hash chain, and 
> you end up considering those _different_ hash values to be all equivalent 
> for the counting, even though they obviously aren't.
> 
> I think the real problem is that with big input, the hash tables are too 
> small, making the hash chains too long - even though the values on the 
> chains are different (ie we're not hashing different records with the same 
> hash value over and over again - if that was true, the "&& vs ||" change 
> wouldn't make any difference).
> 
> So I think xdiff has chosen too small a hash. Can you try what happens if 
> you change xdl_hashbits() (in xdiff/xutil.c) instead? Try making it return 
> a bigger value (for example, by initializing "bits" to 2 instead of 0), 
> and see if that makes a difference.
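To make the table-size argument concrete, here is a minimal C sketch. It assumes a chained hash table whose size is a power of two picked from the record count. The function pick_hash_bits() and its extra_bits parameter are hypothetical. They are only loosely modeled on the idea behind xdl_hashbits() and are not libxdiff's code. Passing extra_bits = 2 (roughly what initializing "bits" to 2 would do) makes the table four times larger, so the average chain is four times shorter.

```c
#include <assert.h>

/* Hypothetical sketch, loosely modeled on the idea behind xdl_hashbits():
   return enough bits for a table of at least 'size' buckets, plus
   'extra_bits' more. Capped at 30 bits to keep the shifts safe. */
static unsigned int pick_hash_bits(unsigned int size, unsigned int extra_bits)
{
	unsigned int val = 1, bits = extra_bits;

	/* Double 'val' until it covers 'size', counting one bit per doubling. */
	for (; val < size && bits < 30; val <<= 1, bits++)
		;
	return bits ? bits : 1;
}

/* Average chain length if 'nrec' distinct hash values spread evenly
   over 2^bits buckets. */
static double avg_chain_len(unsigned int nrec, unsigned int bits)
{
	assert(bits < 31);
	return (double)nrec / (double)(1UL << bits);
}
```

For 1000 records, pick_hash_bits(1000, 0) gives 10 bits (1024 buckets), while pick_hash_bits(1000, 2) gives 12 bits (4096 buckets).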

I think xdl_hashbits() picks the hash table size "almost" correctly.
I think we're looking at some bad hash *collisions* (not records with
the same hash value; those would be stopped by the mlim check).
Send me the files and I'll take a look...
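The distinction between bucket collisions and repeated records can be sketched in C as follows. The struct chain_node type and the bucket_of(), scan_chain() and over_limit() functions are hypothetical names, not libxdiff's. The limit only stands in for the mlim check mentioned above. The sketch also assumes that buckets are chosen from the low bits of a full hash. It counts entries that merely share a bucket separately from true duplicates, so a duplicate cap does nothing to shorten a chain full of distinct lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chain node; not libxdiff's real record type. */
struct chain_node {
	unsigned long hash;         /* full hash of the line */
	const char *line;           /* line text, NUL-terminated for simplicity */
	struct chain_node *next;    /* next node in the same bucket */
};

/* Bucket index keeps only the low 'bits' bits of the full hash, so lines
   with different hash values can end up on the same chain. */
static size_t bucket_of(unsigned long hash, unsigned int bits)
{
	return (size_t)(hash & ((1UL << bits) - 1));
}

/* Walk one bucket chain. *same counts true duplicates of (hash, line);
   *colliding counts entries that merely share the bucket. */
static void scan_chain(const struct chain_node *chain, unsigned long hash,
		       const char *line, size_t *same, size_t *colliding)
{
	*same = 0;
	*colliding = 0;
	for (; chain; chain = chain->next) {
		/* Cheap full-hash compare first, then confirm with the text. */
		if (chain->hash == hash && strcmp(chain->line, line) == 0)
			(*same)++;
		else
			(*colliding)++;
	}
}

/* Stand-in for an mlim-style cap: only true duplicates count toward it,
   so collisions between different lines are never limited by it. */
static int over_limit(size_t same, size_t limit)
{
	return same > limit;
}
```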

> But again, I'm not actually all _that_ familiar with the libxdiff 
> algorithms, _especially_ the line-based ones (I can follow the regular 
> binary delta code, but the line-based one just makes my head hurt). So 
> take anything I say with a pinch of salt.

That's my revenge for having had to follow your code in the kernel :D

- Davide


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html