Re: Slow fetches of tags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ralf Baechle <ralf@xxxxxxxxxxxxxx> writes:

> On Wed, May 24, 2006 at 04:43:02PM -0700, Linus Torvalds wrote:
>
>> Actually, maybe the problem is that Ralf's tree has two roots, because of 
>> the old CVS history. It might be following the other root down for the 
>> "have" part, since that one doesn't exist at all in the target and the 
>> other side will never acknowledge any of it. 
>> 
>> I'll play with it.
>
> Interesting idea, so I went to play with it, too.  I took a copy of the
> tree and deleted all branches except the v2.6.16-stable tracking branch
> which I pruned back to v2.6.16.17, then added a new branch starting at
> the oldest commit, your initial import of the kernel tree:

I've been looking at this issue again...

> $ git branch junk 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
> $ git checkout junk
> $ seq -f "%05.0f" 1 100 | while read i; do echo $i; echo $i > Makefile;\
>   git commit -s -m "Blah $i" Makefile; done
>
> So with this I get:
>
> $ git branch
> * junk
>   v2.6.16-stable
> $
>
> If I now run
>
> $ strace git-fetch-pack --thin git://www.kernel.org/pub/scm/linux/kernel/\
> 	git/stable/linux-2.6.16.y.git \
> 	refs/heads/master refs/tags/v2.6.16.18 2>&1 | grep have /tmp/xxx
>
> I get:

... 100 newest commits from the junk branch and then all the
    tags the downloader has are sent as "have"s.

Now, sending the newest commits before sending the tags is
unavoidable, since the other end does not know where you forked
at (the purpose of the handshake is to find out where to begin
with).  But as soon as you send v2.6.16.17 (the latest tag that
you have in common with the other side, _and_ is a proper
ancestor of what you want -- v2.6.16.18 but that fact you do not
know yet), the server end should be able to say "ok, we know
enough".  That is not happening.

A few hints for debugging this:

 * local test is easier -- fetch-pack spawns upload-pack using
   PATH and GIT_EXEC_PATH so set them to point at the updated
   upload-pack being tested.

 * Passing the standard error from "fetch-pack -v" to "name-rev
   --stdin" makes it a bit more pleasant to see what is going on.

With the attached patch, the server side tells the client to
stop immediately after it says it has the commit tagged as
v2.6.16.17 while asking for v2.6.16.18.  With your "100 commits
on junk" repository, it does not make much of a difference,
though.  The reasons are (1) the 100 commits on "junk" are much
younger than any of the tags, so they are sent anyway, (2) we
have a 32-commit window, and keep one window in flight to make
the protocol stream, which means there will be max 64 "have"
that are in flight unacked, and a clone of linux-2.6.16.y
repository that has up to v2.6.16.17 tag has only 52 tags.

So we end up sending all the tags anyway in this particular
case.

I've thought about sending tags and only _tips_ of branches
first, but I think that would have a grave performance impact on
more normal cases.  If you are dealing with a remote repository
with a bunch of tags, your "master" is ahead of the remote
repository, and you do not use tracking branch to track the
remote (pretend you are Linus and pulling from a subsystem
maintainer), then you obviously do not want to send v2.6.12-rc2
tag before you send commits from your "master" branch to get to
where your subsystem maintainer forked from you (otherwise the
remote side would say "I do not know your 'master' commit, but
now we know we have this ancient v2.6.12-rc2 in common, so let's
have a pack between that and the tip of the subsystem tree"), so
I do think sending "100 commits on junk branch" is unavoidable.

I think the attached patch is safe in general, but somebody may
want to give an extra set of eyeballs to double check the logic
is sane.

-- >8 --
upload-pack: squelch downloader more aggressively under multi-ack

When the server side sees "have" line that makes all the "want"
commits somehow reachable from one of the "have" lines so far,
stop responding "continue" to prevent the other end going down
to send too many refs.

---
diff --git a/upload-pack.c b/upload-pack.c
index 617ee46..ac42d0d 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -452,8 +452,13 @@ static int get_common_commits(void)
 			default:
 				memcpy(hex, sha1_to_hex(sha1), 41);
 				if (multi_ack) {
-					const char *msg = "ACK %s continue\n";
-					packet_write(1, msg, hex);
+					const char *msg = "ACK %s%s\n";
+					const char *cont = " continue";
+					if (ok_to_give_up()) {
+						cont = "";
+						multi_ack = 0;
+					}
+					packet_write(1, msg, hex, cont);
 					memcpy(last_hex, hex, 41);
 				}
 				else if (have_obj.nr == 1)

-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]