Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 6:52 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Oct 11, 2016 at 09:34:28PM -0400, Jeff King wrote:
>
>> > Ok, time to present data... Let's assume a degenerate case first:
>> > "up-to-date with all remotes" because that is easy to reproduce.
>> >
>> > I have 14 remotes currently:
>> >
>> > $ time git fetch --all
>> > real 0m18.016s
>> > user 0m2.027s
>> > sys 0m1.235s
>> >
>> > $ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs \
>> >     -P 14 -I % git fetch %
>> > real 0m5.168s
>> > user 0m2.312s
>> > sys 0m1.167s
>>
>> So first, thank you (and Ævar) for providing real numbers. It's clear
>> that I was talking nonsense.
>>
>> Second, I wonder where all that time is going. Clearly there's an
>> end-to-end latency issue, but I'm not sure where it is. Is it startup
>> time for git-fetch? Is it in getting and processing the ref
>> advertisement from the other side? What I'm wondering is if there are
>> opportunities to speed up the serial case (but nobody really cared
>> before because it doesn't matter unless you're doing 14 of them back to
>> back).
>
> Hmm. I think it really might be just network latency. Here's my fetch
> time:
>
>   $ git config remote.origin.url
>   git://github.com/gitster/git.git
>
>   $ time git fetch origin
>   real    0m0.183s
>   user    0m0.072s
>   sys     0m0.008s
>
> 14 of those in a row shouldn't take more than about 2.5 seconds, which
> is still twice as fast as your parallel case. So what's going on?
>
> One is that I live about a hundred miles from GitHub's data center, and
> my ping time there is ~13ms. The other side of the country, let alone
> Europe, is going to be noticeably slower just for the TCP handshake.
>
> The second is that git:// is really cheap and simple. git-over-ssh is
> over twice as slow:
>
>   $ time git fetch git@xxxxxxxxxx:gitster/git
>   ...
>   real    0m0.432s
>   user    0m0.100s
>   sys     0m0.032s
>
> HTTP fares better than I would have thought, but is also slower:
>
>   $ time git fetch https://github.com/gitster/git
>   ...
>   real    0m0.258s
>   user    0m0.080s
>   sys     0m0.032s
>
> -Peff

Well, 9 of my 14 remotes are https; the rest are git://.
Also, 9 of the 14 (but a different set) are on GitHub; the rest are
either internal or kernel.org.

Fetching from GitHub (https) is only 0.9s from here
(SF Bay Area, I'm not in Europe any more ;) )

I would have expected a speedup of roughly a factor of 2, plus
the latency gains. The factor of 2 is because, in the current
state of affairs, either the client or the remote is working
while the other side is idle/waiting. So I was a bit surprised
to see a higher yield.
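
For reference, here is a quick sanity check of the numbers quoted in this thread. It is only a sketch: the timings are copied from the messages above, and the variable names are mine. It computes the measured speedup of the xargs run over `git fetch --all`, and what 14 back-to-back fetches would cost at Peff's single git:// fetch time.

```shell
# Wall-clock ("real") timings in seconds, copied from the thread.
serial=18.016     # git fetch --all with 14 remotes
parallel=5.168    # xargs -P 14 ... git fetch
single=0.183      # Peff's single git:// fetch
remotes=14

# Measured speedup of the parallel run over the serial one.
speedup=$(awk -v s="$serial" -v p="$parallel" 'BEGIN { printf "%.2f", s / p }')

# Estimated cost of 14 serial fetches at Peff's per-fetch time.
estimate=$(awk -v n="$remotes" -v t="$single" 'BEGIN { printf "%.3f", n * t }')

echo "measured speedup: ${speedup}x"
echo "14 serial fetches at ${single}s each: ${estimate}s"
```

So the measured speedup is about 3.5x, compared with the factor of 2 (plus latency) that I had guessed.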



