Re: Multi-threaded 'git clone'

On Mon, Feb 16, 2015 at 10:43 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Jeff King <peff@xxxxxxxx> writes:
>
>> ... And the whole output is checksummed by a single sha1
>> over the whole stream that comes at the end.
>>
>> I think the most feasible thing would be to quickly spool it to a
>> server on the LAN, and then use an existing fetch-in-parallel tool
>> to grab it from there over the WAN.
>
> One possibility would be for the server to prepare a static bundle
> file to bootstrap all the "clone" clients with and publish it on a
> CDN.  A protocol extension would tell the client where to download
> the bundle from, the client can then grab the bundle to clone from
> it to become "slightly stale but mostly up to date", and then do a
> usual incremental update with the server after that to be complete.
>
> The server would update the bundle used to bootstrap clone clients
> periodically in order to keep the incrementals to the minimum, and
> would make sure their bitmap is anchored at the tips of bundles to
> minimize the object counting load during the incremental phase.
>
> I think "repo" used by folks who do AOSP does something similar to
> that by scripting around "git clone".  I'd imagine that they would
> be happy to see if "git clone" did all that inside.

Yes, the "repo" tool used by Android uses curl to download a
previously cached $URL/clone.bundle over resumable HTTP. For Android
the file is only updated roughly every 6 months, at major releases,
and is easily cached by CDNs and HTTP proxy servers.

The bundle is spooled to a temporary file on disk, then unpacked using
`git fetch $path/clone.bundle refs/heads/*:refs/remotes/origin/*`.
Afterwards a normal git fetch is run to bring the new clone current
with the server, picking up any changes made since the bundle was
created and cached.
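
For anyone who wants to try the same sequence by hand, here is a
rough Python sketch of the steps above. It is not what "repo" actually
runs; the function names are made up for illustration, the download
is a plain one with no resume support (repo uses curl for that), and
it assumes a git new enough to have `-C`.

```python
import subprocess
import tempfile
import urllib.request
from pathlib import Path


def bundle_url(repo_url: str) -> str:
    """Return the $URL/clone.bundle location for a repository URL."""
    return repo_url.rstrip("/") + "/clone.bundle"


def bootstrap_commands(repo_url: str, bundle_path: str, dest: str) -> list[list[str]]:
    """Build the git commands for a bundle-bootstrapped clone.

    Nothing is executed here; the caller decides how to run them.
    bundle_path should be absolute, because the commands run with -C dest.
    """
    return [
        ["git", "init", dest],
        ["git", "-C", dest, "remote", "add", "origin", repo_url],
        # Unpack the cached bundle into remote-tracking refs.
        ["git", "-C", dest, "fetch", bundle_path,
         "refs/heads/*:refs/remotes/origin/*"],
        # Normal incremental fetch to pick up anything newer than the bundle.
        ["git", "-C", dest, "fetch", "origin"],
    ]


def bootstrap_clone(repo_url: str, dest: str) -> None:
    """Spool the bundle to a temporary file, then run the commands above."""
    with tempfile.TemporaryDirectory() as tmp:
        bundle = Path(tmp) / "clone.bundle"
        # Assumption: a simple one-shot download is good enough for a sketch.
        urllib.request.urlretrieve(bundle_url(repo_url), str(bundle))
        for cmd in bootstrap_commands(repo_url, str(bundle), dest):
            subprocess.run(cmd, check=True)
```

Keeping the command list separate from the code that runs it makes
the sequence easy to inspect or log before anything touches the
network.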

The Android Git servers at android.googlesource.com just recognize
*/clone.bundle GET requests and issue 302 redirects to the CDN farm
that actually stores and serves the precreated bundle files.
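
The server-side rule could look roughly like the Python sketch below.
This is only an illustration of the "recognize */clone.bundle and
redirect" idea, not the code running on android.googlesource.com; the
CDN host name and the way repository paths map to bundle file names
are both assumptions made up for the example.

```python
from typing import Optional, Tuple

# Hypothetical CDN location; the real bundle farm is not described here.
CDN_BASE = "https://cdn.example.com/bundles"

BUNDLE_SUFFIX = "/clone.bundle"


def clone_bundle_redirect(method: str, path: str,
                          cdn_base: str = CDN_BASE) -> Optional[Tuple[int, str]]:
    """Return (302, location) for a */clone.bundle GET, else None.

    None means "not a bundle request", so the normal Git handlers
    should process it.
    """
    if method != "GET" or not path.endswith(BUNDLE_SUFFIX):
        return None
    repo = path[: -len(BUNDLE_SUFFIX)].strip("/")
    if not repo:
        # A bare /clone.bundle names no repository.
        return None
    # Assumed layout: one precreated bundle file per repository path.
    return 302, f"{cdn_base}/{repo}.bundle"
```

Because the Git server only issues the redirect, the bundle bytes
never pass through it, and the CDN can cache and serve them with
ordinary HTTP semantics.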

We really want to see this in stock git clone for HTTP transports, as
other projects like Chromium want to use it for their ~3 GiB
repository. Being able to build the bulk of the repo every few months
and serve it out using a CDN to bootstrap new clients would really
help developers on slower or flaky network connections.