Re: Make `git fetch --all` parallel?

On Tue, Oct 11, 2016 at 03:50:36PM -0700, Stefan Beller wrote:

> I agree. Though even for implementing the "dumb" case of fetching
> objects twice, we'd have to take care of some race conditions, I assume.
> 
> Why did you put a "sleep 2" below?
> * a slow start to better spread load locally? (keep the workstation responsive?)
> * a slow start to have different fetches in different phases of the
> fetch protocol?
> * avoiding some subtle race?
> 
> At the very least we would need something similar to what Jeff recently
> sent for the push case, with objects quarantined and then made
> available in one go?

I don't think so. The object database is perfectly happy with multiple
simultaneous writers, and nothing impacts the have/wants until actual
refs are written. Quarantining objects before the refs are written is an
orthogonal concept.
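
Even the naive version should be safe on the object side. A minimal
sketch of the "dumb" parallel case under discussion (just backgrounding
each fetch; remote names are whatever you have configured):

  for r in $(git remote); do
    git fetch "$r" &
  done
  wait

Each fetch writes its objects into the shared object database
independently, and the ref updates at the end take the usual per-ref
locks.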

I'm not altogether convinced that parallel fetch would be that much
faster, though. For most of a fetch the bottleneck is the network,
followed by a brief spike of CPU during delta resolution. You
might get some small benefit from overlapping the fetches so that you
spend CPU on one while you spend network on the other, but I doubt it
would be nearly as beneficial as the parallel submodule clones (which
generally have a bigger CPU segment, and also are generally considered
independent, so there's no real tradeoff of getting duplicate objects).

Sometimes the bottleneck is the server preparing the pack, but if that
is the case, you should probably complain to your server admin to enable
bitmaps. :)
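
On the server side that usually amounts to something like this (a
sketch, assuming a bare repo you administer yourself; hosted services
have their own knobs):

  # write a bitmap index on the next full repack
  git config repack.writeBitmaps true
  git repack -ad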

> I would love to see the implementation though, as over time I accumulate
> a lot of remotes. (Someone publishes patches on the mailing list and
> makes them available somewhere hosted? Grabbing them from their hosting
> site is easier for me than applying patches, so I'd rather fetch
> them... so I have some remotes now.)

I usually just do a one-off fetch of their URL in such a case, exactly
because I _don't_ want to end up with a bunch of remotes. You can also
mark them with skipDefaultUpdate if you only care about them
occasionally (so you can "git fetch sbeller" when you care about it, but
it doesn't slow down your daily "git fetch").
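
For example (URL, remote name, and branch are made up for
illustration):

  # one-off fetch; nothing is added to your remote configuration
  git fetch https://example.com/sbeller/git.git mytopic

  # or keep the remote, but skip it during routine fetches
  git config remote.sbeller.skipDefaultUpdate true
  git fetch sbeller   # only when you actually want it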

-Peff


