On Fri, 7 Nov 2008, Linus Torvalds wrote: > On Sat, 8 Nov 2008, Abhijit Menon-Sen wrote: > > > > If anyone's interested, the files are http://toroid.org/misc/1 and > > http://toroid.org/misc/2 > > Btw, you can see this by just doing > > git diff 1 2 > > without even doing "git init" or doing any actual git repository. > > > Does anyone understand why this slowdown might happen or have > > suggestions about where I should look for it? > > Sure. It's actually fairly simple. You're hitting a O(n^2) thing (possibly > higher), and it's triggered by the fact that almost all your lines are > identical, ie you have a file that basically has 40,000 lines of each of > > xxxx: xxx, xx xxx xxxx xx:xx:xx +xxxx > xx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx> > xxxx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx> > > and 30,000 of > > * xxxxx xxxxx (xxxxxx.xxxx xxx xxxxx (\Xxxxxx) xxxxxx.xxxxxx {xxx} > > with a smattering of others. And this is a case where the internal git > implementation does really badly. And nobody has really cared before, > because nobody has ever had a case that mattered. > > There's a number of different 'diff' algorithms, and it looks like GNU > diff has one that avoids the O(n^2) case for this case. > > I'm adding Davide as the original author of the diff library to the cc. > I'm also adding Pierre, since he was talking about trying to implement > another diff algorithm (although I'm not at all sure that the patience > diff really would help this case at all). That should be an easy fix. Just need to limit the window by which xdl_clean_mmatch() scans the current position. - Davide -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html