Re: git-fetch per-repository speed issues

On Mon, 3 Jul 2006, Keith Packard wrote:
> 
> 5 Start:                             21:59:01.584648000
> 66 After args:                       21:59:01.605987000
> 248 fetch_main() start:              21:59:02.408559000
> 339 fetch_main() before fetch-pack:  21:59:03.293228000
> 387 fetch_main() done:               21:59:04.784388000
> 422 After tag following:             21:59:05.311439000
> 438 All done:                        21:59:05.315338000
> 
> fetch-pack itself took 0.421 seconds (measured with time(1)).
> 
> Looks like the bulk of the time here is caused by simple shell
> processing overhead, some of which scales with the number of heads and
> tags to track.

Ahh.. Do you have tons of tags at the other end?

Looking closer, I suspect a big part of it is that

	git-ls-remote $upload_pack --tags "$remote" |
	sed -ne 's|^\([0-9a-f]*\)[      ]\(refs/tags/.*\)^{}$|\1 \2|p' |
	while read sha1 name
	do
		..
	done

loop.

With a lot of tags, the shell overhead there can indeed be pretty 
disgusting. And I was wrong - I thought we would run that git-ls-remote 
only if the first pass noticed that we needed to, but we actually run it 
every time we fetch any new branches. 
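
To make the filter in that pipeline concrete, here is a minimal sketch of what the sed stage keeps, run against made-up input instead of a real remote. The function names and the sample hashes are hypothetical, and the whitespace class is written as [[:space:]] on the assumption that the original pattern is meant to match the tab that git-ls-remote puts between the sha1 and the ref name.

```shell
#!/bin/sh
# Hypothetical stand-in for "git-ls-remote --tags" output: one
# "<sha1><TAB><refname>" line per tag, plus a "^{}" line per annotated
# tag naming the object the tag points at. The hashes are made up.
sample_ls_remote() {
	printf '%s\t%s\n' \
		1111111111111111111111111111111111111111 'refs/tags/v1.0' \
		2222222222222222222222222222222222222222 'refs/tags/v1.0^{}' \
		3333333333333333333333333333333333333333 'refs/tags/v1.1' \
		4444444444444444444444444444444444444444 'refs/tags/v1.1^{}'
}

# Keep only the "^{}" lines and print "<sha1> <refname>" with the
# "^{}" suffix stripped. [[:space:]] is an assumption about the
# intended separator class.
peeled_tags() {
	sed -ne 's|^\([0-9a-f]*\)[[:space:]]\(refs/tags/.*\)^{}$|\1 \2|p'
}

sample_ls_remote | peeled_tags
```

Only the two "^{}" lines survive, so the "while read sha1 name" loop runs once per annotated tag on the remote, and whatever it does in its body is paid that many times.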

The sad part is that we already got the list once, we just never saved 
it away (i.e. "git-fetch-pack" actually _knows_ what the tags at the 
other end are, and also knows which tags we already have, so if we made 
git-fetch-pack just create that list and save it off, all of that 
overhead would just go away).

And yes, the shell script loops are really simple, but some of them 
are actually quadratic in the number of refs (O(local*remote)). If this 
were a C program, we'd never even care, but shell is slow enough that 
having even a modest number of tags and refs is going to make it waste 
a lot of time in shell scripting.
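
As a rough illustration of the O(local*remote) pattern (not the actual git-fetch code), the sketch below finds the remote refs that are missing locally in two ways: a nested shell loop that rescans the local list for every remote ref, and a single awk pass that loads the local names into a hash first. The function names and the two ref-list files are hypothetical.

```shell
#!/bin/sh
# Both functions take two files with one ref name per line:
#   $1 = local refs, $2 = remote refs
# and print the remote refs that do not appear in the local list.

# Quadratic: for every remote ref, rescan the whole local list in shell.
missing_quadratic() {
	while read -r remote_ref
	do
		found=
		while read -r local_ref
		do
			if test "$remote_ref" = "$local_ref"
			then
				found=t
				break
			fi
		done <"$1"
		test -n "$found" || printf '%s\n' "$remote_ref"
	done <"$2"
}

# Linear: awk remembers every local name in a hash while reading the
# first file, then streams the remote list once and prints the misses.
missing_linear() {
	awk 'FILENAME == ARGV[1] { have[$0] = 1; next } !($0 in have)' "$1" "$2"
}

# Made-up ref lists for a quick comparison.
local_refs=$(mktemp) && remote_refs=$(mktemp) || exit 1
printf '%s\n' refs/tags/v1.0 refs/heads/master >"$local_refs"
printf '%s\n' refs/tags/v1.0 refs/tags/v1.1 \
	refs/heads/master refs/heads/next >"$remote_refs"

missing_quadratic "$local_refs" "$remote_refs"
missing_linear "$local_refs" "$remote_refs"

rm -f "$local_refs" "$remote_refs"
```

Both calls print the same two names (refs/tags/v1.1 and refs/heads/next). The difference is only in how much work the shell itself does per ref, which is the part that gets painful as the lists grow.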

We already have a lot of the infrastructure for "git fetch" in C - the 
remotes parsing etc is all code that "git fetch" used to share with "git 
push", and "git push" has been a builtin C program for a while now. I 
suspect we should just do the same for "git fetch", which would make all 
these issues just totally go away.

			Linus