Re: absurdly slow git-diff

Davide Libenzi <davidel@xxxxxxxxxxxxxxx> · Fri, 7 Nov 2008 15:04:22 -0800 (PST)

On Fri, 7 Nov 2008, Linus Torvalds wrote:

> On Sat, 8 Nov 2008, Abhijit Menon-Sen wrote:
> >
> > If anyone's interested, the files are http://toroid.org/misc/1 and
> > http://toroid.org/misc/2
> 
> Btw, you can see this by just doing
> 
> 	git diff 1 2
> 
> without even doing "git init" or doing any actual git repository.
> 
> > Does anyone understand why this slowdown might happen or have
> > suggestions about where I should look for it?
> 
> Sure. It's actually fairly simple. You're hitting a O(n^2) thing (possibly 
> higher), and it's triggered by the fact that almost all your lines are 
> identical, ie you have a file that basically has 40,000 lines of each of
> 
> 	xxxx: xxx, xx xxx xxxx xx:xx:xx +xxxx
> 	xx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx>
> 	xxxx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx>
> 
> and 30,000 of 
> 
> 	* xxxxx xxxxx (xxxxxx.xxxx xxx xxxxx (\Xxxxxx) xxxxxx.xxxxxx {xxx}
> 
> with a smattering of others. And this is a case where the internal git 
> implementation does really badly. And nobody has really cared before, 
> because nobody has ever had a case that mattered.
> 
> There's a number of different 'diff' algorithms, and it looks like GNU 
> diff has one that avoids the O(n^2) case for this case.
> 
> I'm adding Davide as the original author of the diff library to the cc. 
> I'm also adding Pierre, since he was talking about trying to implement
> another diff algorithm (although I'm not at all sure that the patience 
> diff really would help this case at all).

That should be an easy fix. Just need to limit the window by which 
xdl_clean_mmatch() scans the current position.

- Davide

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html