On Fri, 14 Sep 2007, Dmitry V. Levin wrote:
>
> The situation when runstatus is too expensive is committing a large tree
> with massive changes.  More or less real life example is committing
> several openoffice.org unpacked tarballs.

Ok, I'm downloading those tar-balls to reproduce and hopefully see what's
going on, but I'm not going to be able to get at it today.

I think that the "--no-status" is fine for scripting, and probably a good
work-around for whatever is going on, but I suspect that this thing just
triggers some unnecessary O(n^2) behaviour that we should just fix.

Maybe it's too hard to fix, and the workaround ends up being the practical
thing, but we've been pretty good at just fixing even the extreme odd
cases like this, and I think it's worth trying to fix performance issues
rather than just being able to say "that sucks, so let's avoid doing it".

> Second commit is slow:
> $ mkdir 2
> $ cd 2
> $ git init
> $ tar xf ../OOo_1.1.5_src.tar.gz
> $ mv OO* tree
> $ git add tree (about 50 seconds)
> $ git commit -m 1.1.5 >/dev/null (about 20 seconds)
> $ git repack -a -d -q (about 2 minutes)
> $ du -h .git/objects/pack
> 202M .git/objects/pack
> $ rm -rf tree
> $ rm -f .git/index
> $ tar xf ../OOo_2.0.4_src.tar.gz
> $ mv OO* tree
> $ git add tree (about 1.75 minutes)
> $ git commit -m 2.0.4 >/dev/null
> git-runstatus runs much longer (tens of minutes, I do not remember), and
> finally fails, either with OOM error (if no sufficient virtual memory
> available; on x86-64 64G was not enough) or segfaults.

Ok, so it's not the number of files per se (since your test of just
committing that tree *initially* worked fine), but it's somehow about the
interaction with the previous commit.

I think Jeff is right that it might be some rename-detection issue. Rename
detection in the face of total rewrites is indeed an O(n^2) or worse issue.
I thought we already said "if everything changes, don't even bother", but
it may be that we made that a gitweb special case or something..

Ahh, found it. "-l<num>", and yes, only gitweb seems to use it, and in fact
we don't even expose it to git-runstatus. Oh, well.

I'll look at it more closely tomorrow, but in the meantime, maybe you
could try just this trivial test-patch (not meant as any kind of serious
patch! Just making the default limit for rename detection go from
"unlimited" to "don't bother if there's more than a hundred potential
rename pairs").

		Linus

---
 diff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index 1aca5df..0ee9ea1 100644
--- a/diff.c
+++ b/diff.c
@@ -17,7 +17,7 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_rename_limit_default = -1;
+static int diff_rename_limit_default = 100;
 static int diff_use_color_default;
 
 int diff_auto_refresh_index = 1;
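
For illustration, the kind of check a rename limit implies ("don't bother
if there's more than a hundred potential rename pairs") might be sketched
like this. It is only a sketch under assumptions: the function name
too_many_rename_pairs and its parameters are hypothetical, and it is not
the code in diff.c or diffcore-rename.c. It assumes the cost being bounded
is the number of (deleted path, added path) pairs, which grows as the
product of the two counts, and that a non-positive limit means
"unlimited", as with the old default of -1.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not git's actual implementation.
 *
 * Inexact rename detection compares every deleted path against every
 * added path, so the work grows with num_src * num_dst. Return 1 when
 * that number of candidate pairs exceeds the limit, 0 otherwise.
 * A limit <= 0 is treated as "unlimited" (assumption, mirroring -1).
 */
int too_many_rename_pairs(size_t num_src, size_t num_dst, int limit)
{
	if (limit <= 0)
		return 0;	/* unlimited: never skip */
	if (num_src == 0 || num_dst == 0)
		return 0;	/* no candidate pairs at all */

	/*
	 * Same test as num_src * num_dst > limit, written as a division
	 * so that huge trees cannot overflow the multiplication.
	 */
	return num_src > (size_t)limit / num_dst;
}
```

With a limit of 100, ten deletions against ten additions (100 pairs) would
still be examined, while eleven against ten (110 pairs) would be skipped.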