Re: Make `git fetch --all` parallel?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 11, 2016 at 3:59 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Oct 11, 2016 at 03:50:36PM -0700, Stefan Beller wrote:
>
>> I agree. Though even for implementing the "dumb" case of fetching
>> objects twice we'd have to take care of some racing issues, I would assume.
>>
>> Why did you put a "sleep 2" below?
>> * a slow start to better spread load locally? (keep the workstation responsive?)
>> * a slow start to have different fetches in a different phase of the
>> fetch protocol?
>> * avoiding some subtle race?
>>
>> At the very least we would need a similar thing as Jeff recently sent for the
>> push case with objects quarantined and then made available in one go?
>
> I don't think so. The object database is perfectly happy with multiple
> simultaneous writers, and nothing impacts the have/wants until actual
> refs are written. Quarantining objects before the refs are written is an
> orthogonal concept.

If a remote advertises its tips, we'd need to look these up (clientside) to
decide if we have them, and I do not think we'd do that via a reachability
check, but via direct lookup in the object data base? So I do not quite
understand, what we gain from the atomic ref writes in e.g. remote/origin/.


> I'm not altogether convinced that parallel fetch would be that much
> faster, though.

Ok, time to present data... Let's assume a degenerate case first:
"up-to-date with all remotes" because that is easy to reproduce.

I have 14 remotes currently:

$ time git fetch --all
real 0m18.016s
user 0m2.027s
sys 0m1.235s

$ time git config --get-regexp remote.*.url |awk '{print $2}' |xargs
-P 14 -I % git fetch %
real 0m5.168s
user 0m2.312s
sys 0m1.167s

A factor of >3, so I suspect there is improvement ;)

Well just as Ævar pointed out, there is some improvement.

>
> I usually just do a one-off fetch of their URL in such a case, exactly
> because I _don't_ want to end up with a bunch of remotes. You can also
> mark them with skipDefaultUpdate if you only care about them
> occasionally (so you can "git fetch sbeller" when you care about it, but
> it doesn't slow down your daily "git fetch").

And I assume you don't want the remotes because it takes time to fetch and not
because your disk space is expensive. ;)

>
> -Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]