Re: git-diff-tree inordinately (O(M*N)) slow on files with many changes

On Mon, 16 Oct 2006, Jim Meyering wrote:

> Linus Torvalds <torvalds@xxxxxxxx> wrote:
> > On Mon, 16 Oct 2006, Linus Torvalds wrote:
> ...
> > So I think xdiff has chosen too small a hash. Can you try what happens if
> > you change xdl_hashbits() (in xdiff/xutil.c) instead? Try making it return
> > a bigger value (for example, by initializing "bits" to 2 instead of 0),
> > and see if that makes a difference.
> 
> It makes no difference.
> 
> Bear in mind that there are a *lot* of duplicate lines in the files
> being compared: filtering each through "sort -u" removes 40-50k lines.

It can't be due to duplicate lines. If the lines are truly duplicate, then 
they'd get the same 32-bit hash value, and then the first conditional in 
the expression would always be true, and then it wouldn't _matter_ if it's 
a "&&" or a "||".

See?
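
To spell that out with a toy example (a sketch only: the record struct, the
hash function and the exact shape of the test are my stand-ins here, not the
real xdiff code), assume the expression looks like "hashes equal OP lines
equal". For true duplicates both halves are true, so the operator can't
change the outcome:

```c
#include <string.h>

/* Hypothetical record: a line plus its full hash (not xdiff's real struct). */
struct rec {
	unsigned long ha;
	const char *line;
};

/* Stand-in line hash (djb2 style); xdiff's own hash function differs. */
static unsigned long toy_hash(const char *s)
{
	unsigned long h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

static struct rec make_rec(const char *line)
{
	struct rec r;

	r.ha = toy_hash(line);
	r.line = line;
	return r;
}

/* Assumed shape of the test, written once with "&&" ... */
static int match_and(const struct rec *a, const struct rec *b)
{
	return a->ha == b->ha && strcmp(a->line, b->line) == 0;
}

/* ... and once with "||". */
static int match_or(const struct rec *a, const struct rec *b)
{
	return a->ha == b->ha || strcmp(a->line, b->line) == 0;
}
```

The two variants can only disagree when two _different_ lines end up with
the same full hash value, which is not the duplicate-line case at all.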

So as far as I can tell it has to be some kind of collision on the hash 
queue, with _different_ hash values being queued on the same hash queue.

Now, it could be that there's a bad hash algorithm somewhere (e.g. if 
XDL_HASHLONG() just does horribly badly at distributing the hash values 
onto the hash queues, you'd see this _regardless_ of how many bits you 
have, just because it clumps).
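
Here's the kind of clumping I mean, as a toy sketch. Both fold functions and
the input pattern are made up for illustration (I'm not claiming either one
is what XDL_HASHLONG() does): the full hash values are all distinct, but
they share their low 12 bits, so a fold that only masks low bits puts every
one of them on the same queue no matter how many bits you give it.

```c
#include <stddef.h>
#include <stdlib.h>

typedef unsigned long (*fold_fn)(unsigned long v, unsigned int bits);

/* Hypothetical weak fold: keeps only the low bits of the full hash. */
static unsigned long fold_mask_only(unsigned long v, unsigned int bits)
{
	return v & ((1UL << bits) - 1);
}

/* Hypothetical stronger fold: mixes the high bits in before masking. */
static unsigned long fold_mix(unsigned long v, unsigned int bits)
{
	v ^= v >> 16;
	v *= 0x45d9f3bUL;
	v ^= v >> 16;
	return v & ((1UL << bits) - 1);
}

/*
 * Feed n distinct full hash values (i << 12, so the low 12 bits are
 * always zero) through the fold and count how many queues get used.
 * Returns 0 if the allocation fails.
 */
static size_t used_queues(fold_fn fold, unsigned int bits, size_t n)
{
	size_t nqueues = (size_t)1 << bits, used = 0, i;
	unsigned char *seen = calloc(nqueues, 1);

	if (!seen)
		return 0;
	for (i = 0; i < n; i++) {
		unsigned long ha = (unsigned long)i << 12;
		unsigned long hi = fold(ha, bits);

		if (!seen[hi]) {
			seen[hi] = 1;
			used++;
		}
	}
	free(seen);
	return used;
}
```

With this input, used_queues(fold_mask_only, bits, 1000) stays at one queue
for any "bits" up to 12, so adding a couple of bits buys nothing, while the
mixing fold spreads the same values over more than one queue.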

Or there could be something else that I'm just missing..

It would probably be nice to just get a sampling of what the hash queue 
looks like for the bad case. Maybe it would be obvious that certain 
different hash values then get the same XDL_HASHLONG() value..
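
Something along these lines would do for the sampling, as a rough sketch.
The names (queue_stats, sample_fold, sample_queues) are hypothetical, and
sample_fold() is only a stand-in for whatever XDL_HASHLONG() really
computes; you'd feed it the full hash values collected from the bad case.

```c
#include <stddef.h>
#include <stdlib.h>

struct queue_stats {
	size_t longest;     /* number of entries on the longest queue */
	size_t longest_idx; /* index of that queue */
	size_t distinct;    /* distinct full hash values on that queue */
};

/* Stand-in fold from full hash to queue index (an assumption). */
static unsigned long sample_fold(unsigned long v, unsigned int bits)
{
	return (v + (v >> bits)) & ((1UL << bits) - 1);
}

/* Returns 0 on success, -1 if the allocation fails. */
static int sample_queues(const unsigned long *ha, size_t n,
			 unsigned int bits, struct queue_stats *out)
{
	size_t nqueues = (size_t)1 << bits, i, j;
	size_t *len = calloc(nqueues, sizeof(*len));

	if (!len)
		return -1;
	out->longest = out->longest_idx = out->distinct = 0;

	/* First pass: queue lengths, remembering the longest queue. */
	for (i = 0; i < n; i++) {
		size_t q = sample_fold(ha[i], bits);

		if (++len[q] > out->longest) {
			out->longest = len[q];
			out->longest_idx = q;
		}
	}

	/*
	 * Second pass: count distinct full hashes on the longest queue.
	 * Quadratic, which is fine for a one-off sample.
	 */
	for (i = 0; i < n; i++) {
		if (sample_fold(ha[i], bits) != out->longest_idx)
			continue;
		for (j = 0; j < i; j++)
			if (ha[j] == ha[i])
				break;
		if (j == i)
			out->distinct++;
	}
	free(len);
	return 0;
}
```

If "longest" is huge but "distinct" is small, the queue is mostly true
duplicates; if "distinct" is large too, then different hash values really
are piling up on one queue and the fold is the thing to look at.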

		Linus