Re: [PATCH 1/2] git-commit: Disallow unchanged tree in non-merge mode

On Fri, 14 Sep 2007, Dmitry V. Levin wrote:
> 
> The situation when runstatus is too expensive is committing a large tree
> with massive changes.  More or less real life example is committing
> several openoffice.org unpacked tarballs.

Ok, I'm downloading those tar-balls to reproduce and hopefully see what's 
going on, but I'm not going to be able to get at it today.

I think that the "--no-status" is fine for scripting, and probably a good 
work-around for whatever is going on, but I suspect that this thing just 
triggers some unnecessary O(n^2) behaviour that we should just fix. Maybe 
it's too hard to fix, and the workaround ends up being the practical 
thing, but we've been pretty good at just fixing even the extreme odd 
cases like this, and I think it's worth trying to fix performance issues 
rather than just being able to say "that sucks, so let's avoid doing it".

> Second commit is slow:
> $ mkdir 2
> $ cd 2
> $ git init
> $ tar xf ../OOo_1.1.5_src.tar.gz
> $ mv OO* tree
> $ git add tree                          (about 50 seconds)
> $ git commit -m 1.1.5 >/dev/null        (about 20 seconds)
> $ git repack -a -d -q                   (about 2 minutes)
> $ du -h .git/objects/pack
> 202M    .git/objects/pack
> $ rm -rf tree
> $ rm -f .git/index
> $ tar xf ../OOo_2.0.4_src.tar.gz
> $ mv OO* tree
> $ git add tree                          (about 1.75 minutes)
> $ git commit -m 2.0.4 >/dev/null
> git-runstatus runs much longer (tens of minutes; I do not remember exactly)
> and finally fails, either with an OOM error (if there is not enough virtual
> memory available; on x86-64, 64G was not enough) or with a segfault.

Ok, so it's not the number of files per se (since your test of just 
committing that tree *initially* worked fine), but it's somehow about the 
interaction with the previous commit.

I think Jeff is right that it might be some rename-detection issue. Rename 
detection in the face of total rewrites is indeed an O(n^2) or worse 
issue. I thought we already said "if everything changes, don't even 
bother", but it may be that we made that a gitweb special case or 
something..

Ahh, found it. "-l<num>", and yes, only gitweb seems to use it, and in fact
we don't even expose it to git-runstatus.

Oh, well. I'll look at it more closely tomorrow, but in the meantime, 
maybe you could try just this trivial test-patch (not meant as any kind of 
serious patch! Just making the default limit for rename detection go from 
"unlimited" to "don't bother if there's more than a hundred potential 
rename pairs").
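
To make the idea of a limit concrete, here is a minimal sketch of the kind of
guard such a limit implies. It is not the code in git: the function name, the
macro, and the reading of "potential rename pairs" as (deleted files) x
(added files) are all assumptions made for illustration. It only shows why
the check is cheap compared with the quadratic comparison it avoids.

```c
#include <stddef.h>

/* Hypothetical constant mirroring the value used in the test patch. */
#define RENAME_LIMIT_DEFAULT 100

/*
 * Illustrative only; not a function from git.
 *
 * Returns 1 if inexact rename detection should be attempted, 0 if it
 * should be skipped.  Every deleted path is a potential rename source
 * for every added path, so the work grows as num_deleted * num_added.
 * A limit <= 0 means "unlimited", matching the old default of -1.
 */
static int should_detect_renames(size_t num_deleted, size_t num_added,
				 long limit)
{
	if (limit <= 0)
		return 1;	/* unlimited: always try */
	if (num_deleted == 0 || num_added == 0)
		return 1;	/* no candidate pairs, nothing expensive to do */

	/*
	 * Test num_deleted * num_added <= limit using a division, so the
	 * product cannot overflow on very large trees.
	 */
	return num_deleted <= (size_t)limit / num_added;
}
```

With this sketch, should_detect_renames(10, 10, RENAME_LIMIT_DEFAULT) allows
detection (exactly 100 pairs), while a total rewrite with tens of thousands
of deleted and added paths is rejected before any file contents are compared.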

		Linus
---
 diff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index 1aca5df..0ee9ea1 100644
--- a/diff.c
+++ b/diff.c
@@ -17,7 +17,7 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_rename_limit_default = -1;
+static int diff_rename_limit_default = 100;
 static int diff_use_color_default;
 int diff_auto_refresh_index = 1;
 

