Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 3:37 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> So I do think it would be much faster, but I also think patches for this would
>> require some thought and a lot of refactoring of the fetch code.
>> ...
>> During the negotiation phase a client would have to be able to change its
>> mind (add more "haves", or in case of the parallel fetching these become
>> "will-have-soons", although the remote figured out the client did not have it
>> earlier.)
>
> Even though a fancy optimization as you outlined might be ideal, I
> suspect that users would be happier if the network bandwidth is
> utilized to talk to multiple remotes at the same time even if they
> end up receiving the same recent objects from more than one place in
> the end.

I agree, though I assume that even the "dumb" approach, which may fetch the
same objects more than once, would require us to take care of some races.

Why did you put a "sleep 2" in the loop below? Is it
* a slow start to spread the load locally (to keep the workstation responsive)?
* a slow start so that the different fetches are in different phases of the
fetch protocol?
* to avoid some subtle race?

At the very least, wouldn't we need something similar to what Jeff recently
sent for the push case, where incoming objects are quarantined and then made
available in one go?

>
> Is the order in which "git fetch --all" iterates over "all remotes"
> predictable and documented?

It is predictable: it is the same order in which
$ grep "\[remote " .git/config
lists them, i.e. the order in the config file, which in my case happens
to be sorted by importance/history quite naturally. Reordering my config
file would not be a big deal, though.

I don't know whether it is documented, though.
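A slightly more robust way to list remote names in that same order might be
the following rough sketch. The helper name `list_remotes_in_config_order` is
made up for illustration; it only lists remotes that have a url entry, and it
reports them in the order `git config` reads them (file order within
.git/config).

```sh
# Hypothetical helper: print remote names in the order their url entries
# appear in the config, printing each name only once (a remote may have
# several url entries).
list_remotes_in_config_order () {
	# "git config --get-regexp" prints "remote.<name>.url <value>" lines;
	# cut keeps only the key, sed strips the "remote." prefix and ".url"
	# suffix (so names containing dots, e.g. "my.fork", survive), and awk
	# drops repeated names while preserving their first-seen order.
	git config --get-regexp '^remote\..*\.url$' |
	cut -d ' ' -f 1 |
	sed -e 's/^remote\.//' -e 's/\.url$//' |
	awk '!seen[$0]++'
}
```

Unlike splitting on every '.', this keeps remote names that contain dots intact.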

> If so, listing the remotes from more
> powerful and well connected place to slower ones and then doing an
> equivalent of stupid
>
>         for remote in $list_of_remotes_ordered_in_such_a_way

list_of_remotes_ordered_in_such_a_way is roughly (quoting the regexp so the
shell does not try to glob it):
$(git config --get-regexp 'remote\..*\.url' | tr '.' ' ' | awk '{print $2}')

>         do
>                 git fetch "$remote" &
>                 sleep 2
>         done
>
> might be fairly easy thing to bring happiness.
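The loop above never waits for its background jobs, so it returns before the
fetches have finished. A slightly fuller sketch that keeps the staggered start
but waits for every fetch and reports failure if any of them failed could look
like this. The names `parallel_fetch` and `FETCH_STAGGER` are made up for
illustration.

```sh
# Hypothetical helper: start one background "git fetch" per remote given as
# an argument (best-connected remotes first), staggering the start-up, then
# wait for all of them.
parallel_fetch () {
	pids=
	for remote in "$@"
	do
		git fetch "$remote" &          # run this fetch in the background
		pids="$pids $!"                # remember its process id
		sleep "${FETCH_STAGGER:-2}"    # hypothetical knob; defaults to 2s
	done
	rc=0
	for pid in $pids
	do
		wait "$pid" || rc=1            # note if any single fetch failed
	done
	return $rc
}
```

Usage would be e.g. `parallel_fetch origin upstream` (placeholder remote
names). Each `git fetch` also writes .git/FETCH_HEAD, so its contents may be
unreliable after concurrent runs; this sketch does not address that or any
other race.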

I would love to see the implementation, though, as over time I accumulate
a lot of remotes. (When someone publishes patches on the mailing list and
also makes them available on some hosting site, fetching them from there
is easier for me than applying the patches, so I'd rather fetch them...
which is why I have quite a few remotes now.)
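With many remotes, starting one background fetch per remote all at once may
be too much. A sketch that instead caps the number of concurrent fetches
uses `xargs -P`, which POSIX does not require but GNU and BSD xargs support.
The name `fetch_bounded` is made up for illustration.

```sh
# Hypothetical helper: fetch the given remotes with at most $1 fetches
# running at the same time.
fetch_bounded () {
	max_jobs=$1; shift
	[ $# -gt 0 ] || return 0    # nothing to fetch
	# One remote name per line; -n 1 runs one "git fetch <remote>" per
	# name and -P keeps up to $max_jobs of them running concurrently.
	# xargs exits non-zero if any of the fetches failed.
	printf '%s\n' "$@" | xargs -n 1 -P "$max_jobs" git fetch
}
```

Usage would be e.g. `fetch_bounded 4 origin upstream` (placeholder remote
names).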


