On 05/22/2012 07:33 PM, Jeff King wrote:
On Tue, May 22, 2012 at 03:28:32PM +0200, Michael Haggerty wrote:
When I try it with both 'next' and v1.7.10, I see that the latter is
much faster. I did my tests with a warm cache, but the interesting
number is the CPU time, which is quite different.
I cannot reproduce anything as big as the performance regression that
you see. I did find a regression 9.5 s -> 10.1 s caused by
5fa044184 find_containing_dir(): use strbuf in implementation of this
function
It is fixed by the patch that I just sent to the mailing list [1],
which sizes the strbuf in that function to strlen(refname) instead of
PATH_MAX. Since your experiments suggest that the performance
regression is related to the size of the repository contents, it
could be that the same test produces more memory pressure on your
system and therefore a larger effect. Please try the patch and tell
me if it fixes the problem for you.
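Concretely, the idea (a sketch based on the description above, not the
verbatim patch, so the exact strbuf call may differ) is to size the
buffer by the refname instead of preallocating PATH_MAX:

	struct strbuf dirname = STRBUF_INIT;
	/* was: a PATH_MAX-sized allocation on every call */
	strbuf_grow(&dirname, strlen(refname));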
That patch drops about a second off of the slow case, but the main
problem still remains. Just to be sure we are both doing the exact same
thing, here is a complete reproduction recipe:
GIT=/path/to/your/git/checkout
RAILS=/path/to/unpacked/rails.git
cd $GIT &&
git checkout 432ad41e60cedb87ceec446ab034d46a53f5f9d8^ &&
make &&
cd $RAILS &&
time $GIT/bin-wrappers/git fetch . refs/*:refs/* &&
cd $GIT &&
git checkout 432ad41e60cedb87ceec446ab034d46a53f5f9d8 &&
make &&
cd $RAILS &&
time $GIT/bin-wrappers/git fetch . refs/*:refs/*
produces:
[before]
real 0m9.128s
user 0m9.369s
sys 0m0.976s
[after]
real 0m15.926s
user 0m16.181s
sys 0m0.984s
I don't think memory pressure is involved. The git process maxes out at
~300M, and this machine has 7.5G available.
I wonder why we are getting different results. Could it be compilation
options? As I mentioned, I compile with -O0, but I got similar results
with -O3.
Thanks for the idiot-proof reproduction steps. I don't know what I was
doing differently last time, but now I can reproduce your slowdown.
The thing that triggers the problem is a reference directory
("refs/remotes/8514/pull/") with 3810 sub-namespaces
("refs/remotes/8514/pull/1091", "refs/remotes/8514/pull/1092", etc),
each sub-namespace containing only one or two references. Having that
many *references* in a single namespace wouldn't be a problem, because
references are just appended to the end of the directory unsorted and
are typically sorted only once, after all of them have been added. But
every time a sub-namespace is accessed, the namespace has to be
searched to see whether that sub-namespace already exists, and that
search requires the namespace to be sorted. So, for example, consider
adding the following sequence of references:
1. refs/remotes/8514/pull/1091/head
2. refs/remotes/8514/pull/1091/merge
3. refs/remotes/8514/pull/1092/head
4. refs/remotes/8514/pull/1092/merge
5. refs/remotes/8514/pull/1093/head
6. refs/remotes/8514/pull/1093/merge
At step 1, sub-namespace "refs/remotes/8514/pull/1091/" is added to
"refs/remotes/8514/pull/". This makes the code think that
"refs/remotes/8514/pull/" is unsorted.
Step 2 is not problematic; the new reference is added to
"refs/remotes/8514/pull/1091/", but adding a reference doesn't require
the ref_dir to be sorted.
At step 3, namespace "refs/remotes/8514/pull/" is first checked to see
if sub-namespace "refs/remotes/8514/pull/1092/" already exists. This
search requires namespace "refs/remotes/8514/pull/" to be sorted because
step 1 caused it to be considered unsorted.
At step 5 the namespace "refs/remotes/8514/pull/" again needs to be
sorted, because step 3 added "refs/remotes/8514/pull/1092/" and marked
it unsorted; and so on, with a re-sort every time a new sub-namespace
is added.
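To make the pattern concrete, here is a small self-contained sketch
(illustrative only, not git's actual refs.c code) of the behavior
described above: adding an entry is cheap and just clears a "sorted"
flag, but checking whether a sub-namespace exists has to sort the whole
directory first, so the interleaved adds and lookups in steps 1-6
re-sort "refs/remotes/8514/pull/" once per pull number.

#include <stdlib.h>
#include <string.h>

struct entry {
	const char *name;
};

struct ref_dir_sketch {
	struct entry *entries;
	size_t nr, alloc;
	int sorted;	/* nonzero if entries[] is currently sorted */
};

static int cmp_entry(const void *a, const void *b)
{
	return strcmp(((const struct entry *)a)->name,
		      ((const struct entry *)b)->name);
}

/* Adding a ref or sub-namespace never sorts; it only clears the flag. */
static void add_entry(struct ref_dir_sketch *dir, const char *name)
{
	if (dir->nr == dir->alloc) {
		dir->alloc = dir->alloc ? 2 * dir->alloc : 16;
		dir->entries = realloc(dir->entries,
				       dir->alloc * sizeof(*dir->entries));
	}
	dir->entries[dir->nr++].name = name;	/* caller keeps the string alive */
	dir->sorted = 0;
}

/* A lookup needs binary search, which forces a full sort if anything
 * was added since the last lookup. */
static struct entry *find_entry(struct ref_dir_sketch *dir, const char *name)
{
	struct entry key = { name };
	if (!dir->sorted) {
		qsort(dir->entries, dir->nr, sizeof(*dir->entries), cmp_entry);
		dir->sorted = 1;
	}
	return bsearch(&key, dir->entries, dir->nr,
		       sizeof(*dir->entries), cmp_entry);
}

With ~3810 sub-namespaces under "refs/remotes/8514/pull/", that works
out to roughly O(n^2 log n) comparisons on that single directory, which
is consistent with the extra user time in the numbers above.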
I will submit a patch shortly.
Michael
--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/