On Sat, 8 Nov 2008, Abhijit Menon-Sen wrote: > > If anyone's interested, the files are http://toroid.org/misc/1 and > http://toroid.org/misc/2 Btw, you can see this by just doing git diff 1 2 without even doing "git init" or doing any actual git repository. > Does anyone understand why this slowdown might happen or have > suggestions about where I should look for it? Sure. It's actually fairly simple. You're hitting a O(n^2) thing (possibly higher), and it's triggered by the fact that almost all your lines are identical, ie you have a file that basically has 40,000 lines of each of xxxx: xxx, xx xxx xxxx xx:xx:xx +xxxx xx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx> xxxx: xxxx xxxxxxxxxxx <xxxx@xxxxxxxx> and 30,000 of * xxxxx xxxxx (xxxxxx.xxxx xxx xxxxx (\Xxxxxx) xxxxxx.xxxxxx {xxx} with a smattering of others. And this is a case where the internal git implementation does really badly. And nobody has really cared before, because nobody has ever had a case that mattered. There's a number of different 'diff' algorithms, and it looks like GNU diff has one that avoids the O(n^2) case for this case. I'm adding Davide as the original author of the diff library to the cc. I'm also adding Pierre, since he was talking about trying to implement another diff algorithm (although I'm not at all sure that the patience diff really would help this case at all). Linus -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html