Re: absurdly slow git-diff

Davide Libenzi <davidel@xxxxxxxxxxxxxxx> · Fri, 7 Nov 2008 15:18:51 -0800 (PST)

On Fri, 7 Nov 2008, Davide Libenzi wrote:

> On Fri, 7 Nov 2008, Linus Torvalds wrote:
> 
> > On Sat, 8 Nov 2008, Abhijit Menon-Sen wrote:
> > >
> > > If anyone's interested, the files are http://toroid.org/misc/1 and
> > > http://toroid.org/misc/2
> > 
> > Btw, you can see this by just doing
> > 
> > 	git diff 1 2
> > 
> > without even doing "git init" or doing any actual git repository.
> > 
> > > Does anyone understand why this slowdown might happen or have
> > > suggestions about where I should look for it?
> > 
> > Sure. It's actually fairly simple. You're hitting a O(n^2) thing (possibly 
> > higher), and it's triggered by the fact that almost all your lines are 
> > identical, ie you have a file that basically has 40,000 lines of each of
> > 
> > 	xxxx: xxx, xx xxx xxxx xx:xx:xx +xxxx
> > 	xx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx>
> > 	xxxx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx>
> > 
> > and 30,000 of 
> > 
> > 	* xxxxx xxxxx (xxxxxx.xxxx xxx xxxxx (\Xxxxxx) xxxxxx.xxxxxx {xxx}
> > 
> > with a smattering of others. And this is a case where the internal git 
> > implementation does really badly. And nobody has really cared before, 
> > because nobody has ever had a case that mattered.
> > 
> > There's a number of different 'diff' algorithms, and it looks like GNU 
> > diff has one that avoids the O(n^2) case for this case.
> > 
> > I'm adding Davide as the original author of the diff library to the cc. 
> > I'm also adding Pierre, since he was talking about trying to implement
> > another diff algorithm (although I'm not at all sure that the patience 
> > diff really would help this case at all).
> 
> That should be an easy fix. Just need to limit the window by which 
> xdl_clean_mmatch() scans the current position.

With +/- 100 lines (200 lines window):

davide@alien:~$ time ./xdiff_test --diff 1 2 > /dev/null 

real    0m1.534s
user    0m1.466s
sys     0m0.040s

- Davide

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html