Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes

On Mon, 16 Oct 2006, Linus Torvalds wrote:
> 
> But it could certainly also be that you just broke the diffs entirely, so 
> I would like to wait for Davide to comment on your diff before Junio 
> should apply it. 

I think you broke it. 

If the "&& vs ||" makes a difference (and it clearly does), that implies 
that you have lots of different hash values on the same hash chain, and 
you end up considering those _different_ hash values to be all equivalent 
for the counting, even though they obviously aren't.
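To make the distinction concrete, here is a minimal sketch of the two ways of counting on one chain. The struct and function names are hypothetical and are not the real xdiff record layout; the only assumption taken from the discussion is that records with different hash values can share a chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type for illustration; not the real xdiff structure. */
struct rec {
	struct rec *next;   /* next record on the same hash chain */
	unsigned long ha;   /* full hash value of the line */
};

/* Put r at the head of a chain and return the new head. */
static struct rec *chain_push(struct rec *head, struct rec *r, unsigned long ha)
{
	assert(r != NULL);
	r->ha = ha;
	r->next = head;
	return r;
}

/* Count only the records whose full hash equals ha. */
static size_t count_same_hash(const struct rec *chain, unsigned long ha)
{
	size_t n = 0;

	for (; chain; chain = chain->next)
		if (chain->ha == ha)
			n++;
	return n;
}

/* Count every record on the chain, whatever its hash value. */
static size_t count_chain(const struct rec *chain)
{
	size_t n = 0;

	for (; chain; chain = chain->next)
		n++;
	return n;
}
```

If the two counts differ for a chain, the chain holds records with different hash values, and treating them all as equivalent overcounts.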

I think the real problem is that with big input, the hash tables are too 
small, making the hash chains too long - even though the values on the 
chains are different (i.e. we're not hashing different records with the same 
hash value over and over again - if that were true, the "&& vs ||" change 
wouldn't make any difference).
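A rough way to see the effect of table size on chain length is sketched below. It assumes the bucket is simply the low bits of the hash value, which is a simplification for illustration and may not match how xdiff maps hashes to buckets; longest_chain() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Length of the longest chain when the n hash values in ha[] are put
 * into a table of (1 << bits) buckets. Assumption: bucket = low bits
 * of the hash. Returns 0 if the scratch allocation fails.
 */
static size_t longest_chain(const unsigned long *ha, size_t n, unsigned int bits)
{
	size_t nbuckets, i, longest = 0;
	size_t *len;

	assert(bits > 0 && bits < 20);
	nbuckets = (size_t)1 << bits;
	len = calloc(nbuckets, sizeof(*len));
	if (!len)
		return 0;
	for (i = 0; i < n; i++) {
		size_t b = (size_t)(ha[i] & (nbuckets - 1));

		if (++len[b] > longest)
			longest = len[b];
	}
	free(len);
	return longest;
}
```

With 1024 distinct hash values and only 4 bits of table, every lookup walks a long chain of non-matching entries; with 10 bits the same values spread out to one per bucket.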

So I think xdiff has chosen too small a hash. Can you try what happens if 
you change xdl_hashbits() (in xdiff/xutil.c) instead? Try making it return 
a bigger value (for example, by initializing "bits" to 2 instead of 0), 
and see if that makes a difference.
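As a sketch of the experiment, the function below picks a bit count in the usual "smallest power of two not below size" way, with the starting value of "bits" exposed as a parameter. It is written from the description above and is not copied from xdiff/xutil.c, so check the real xdl_hashbits() before relying on it; hashbits_sketch() and start_bits are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/*
 * Number of hash bits for a table holding about "size" entries.
 * start_bits = 0 gives the smallest bits with (1 << bits) >= size;
 * start_bits = 2 gives two extra bits, i.e. a table four times larger.
 */
static unsigned int hashbits_sketch(unsigned int size, unsigned int start_bits)
{
	const unsigned int limit = CHAR_BIT * sizeof(unsigned int);
	unsigned int val = 1, bits = start_bits;

	assert(start_bits < limit);
	for (; val < size && bits < limit; val <<= 1, bits++)
		; /* double val until it covers size */
	return bits ? bits : 1;
}
```

For size 1000 this gives 10 bits with start_bits 0 and 12 bits with start_bits 2, so the average chain should be about a quarter as long in the second case.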

But again, I'm not actually all _that_ familiar with the libxdiff 
algorithms, _especially_ the line-based ones (I can follow the regular 
binary delta code, but the line-based one just makes my head hurt). So 
take anything I say with a pinch of salt.

		Linus