On Fri, 7 Nov 2008, Davide Libenzi wrote: > On Fri, 7 Nov 2008, Linus Torvalds wrote: > > > On Sat, 8 Nov 2008, Abhijit Menon-Sen wrote: > > > > > > If anyone's interested, the files are http://toroid.org/misc/1 and > > > http://toroid.org/misc/2 > > > > Btw, you can see this by just doing > > > > git diff 1 2 > > > > without even doing "git init" or doing any actual git repository. > > > > > Does anyone understand why this slowdown might happen or have > > > suggestions about where I should look for it? > > > > Sure. It's actually fairly simple. You're hitting a O(n^2) thing (possibly > > higher), and it's triggered by the fact that almost all your lines are > > identical, ie you have a file that basically has 40,000 lines of each of > > > > xxxx: xxx, xx xxx xxxx xx:xx:xx +xxxx > > xx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx> > > xxxx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx> > > > > and 30,000 of > > > > * xxxxx xxxxx (xxxxxx.xxxx xxx xxxxx (\Xxxxxx) xxxxxx.xxxxxx {xxx} > > > > with a smattering of others. And this is a case where the internal git > > implementation does really badly. And nobody has really cared before, > > because nobody has ever had a case that mattered. > > > > There's a number of different 'diff' algorithms, and it looks like GNU > > diff has one that avoids the O(n^2) case for this case. > > > > I'm adding Davide as the original author of the diff library to the cc. > > I'm also adding Pierre, since he was talking about trying to implement > > another diff algorithm (although I'm not at all sure that the patience > > diff really would help this case at all). > > That should be an easy fix. Just need to limit the window by which > xdl_clean_mmatch() scans the current position. With +/- 100 lines (200 lines window): davide@alien:~$ time ./xdiff_test --diff 1 2 > /dev/null real 0m1.534s user 0m1.466s sys 0m0.040s - Davide -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html