Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP

Stefan Beller wrote:
> On Wed, Feb 10, 2016 at 12:11 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:

>> Several of us at $DAY_JOB talked about this more today and thought a
>> variation makes more sense:
>>
>> 1. Clients attempting clone ask for /info/refs?service=git-upload-pack
>> like they do today.
>>
>> 2. Servers that support resumable clone include a "resumable"
>> capability in the advertisement.
>
> like "resumable-token=hash" similar to a push cert advertisement?

It could just be the string 'resumable'.

But I wonder if it would be possible to save a round-trip by getting the
302 response in the initial request.  If the client requests

	/info/refs?service=git-upload-pack&want_resumable=true

then the server could respond with a 302 redirect to its current mostly
whole pack.  Current clients would never send such a request because the
current protocol requires, for smart clients:

	The request MUST contain exactly one query parameter,
	`service=$servicename`, where `$servicename` MUST be the service
	name the client wishes to contact to complete the operation.
	The request MUST NOT contain additional query parameters.

Current http-backend ignores extra query parameters.  I haven't
checked other smart http server implementations, though.
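
A rough sketch of the client side of that idea, in Python.  The
want_resumable parameter is the one proposed above; the helper names
and the example URLs are made up for illustration, and nothing here is
part of the current protocol.  No network access is involved: the
caller is assumed to make the request without following redirects and
hand the status and headers over.

```python
from urllib.parse import urlencode


def info_refs_url(base, want_resumable=False):
    """Build the initial /info/refs URL.

    want_resumable adds the extra query parameter proposed in this
    thread; it is not part of the current smart HTTP protocol.
    """
    params = [("service", "git-upload-pack")]
    if want_resumable:
        params.append(("want_resumable", "true"))
    return base.rstrip("/") + "/info/refs?" + urlencode(params)


def classify_response(status, headers):
    """Decide what the server sent back.

    A 302 with a Location header is treated as "here is the pack to
    download"; anything else is treated as the ordinary ref
    advertisement, which is what an old server would send.
    """
    if status == 302 and "Location" in headers:
        return ("redirect", headers["Location"])
    return ("advertisement", None)
```

A server that ignores the extra parameter would simply fall into the
"advertisement" case, so the client keeps working against it.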

>> 3. Updated clients on clone request GET /info/refs?service=git-resumable-clone.
>
> Or just in the non-http case, they would terminate after the ls-remote
> (including capability advertisement) was done and connect again to
> a different service such as git-upload-stale-pack with the resumable
> token to identify the pack.

HTTP supports range requests and existing CDNs speak HTTP, so I
suspect it would work better if the git-resumable-clone service
printed an HTTP URL from which to grab the packfile.

I think the details are something that could be figured out after
trying out the idea with http first, though.

[...]
>> 5. Clients fetch the file using standard HTTP GET, possibly with
>> byte-ranges to resume.
>
> In the non-http case the git-upload-stale-pack would be rsync with the
> resume token to determine the file name of the pack,
> such that we have resumability.

How do I tunnel rsync over git protocol?

So I think in the non-http case the git-resumable-clone service would
have to print a URL to be served using a possibly different protocol
(e.g., a signed https URL for getting the file from a service like S3,
or an rsync URL for getting the file using the same ssh creds that
were used for the initial request).

[...]
>> 6. Once stored and indexed with .idx, clients run `git fsck
>> --lost-found` to discover the roots of the pack they downloaded. These
>> are saved as temporary references.
>
> jrn:
> > I suspect we can do even faster by making index-pack do the work
>
>     index-pack --check-self-contained-and-connected

--strict + --check-self-contained-and-connected check that the pack
is self-contained.  In the process they mark each object that is
reachable from another object in the pack with FLAG_LINK.

The objects not marked with FLAG_LINK are the roots.
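
To illustrate the idea (this is not index-pack's actual code, just a
small Python model of it): treat the pack as a map from object id to
the ids that object references, mark everything that is referenced
from inside the pack, and report whatever is left unmarked.  The
function name and the toy object ids are made up.

```python
def find_roots(pack_objects):
    """Return the ids of objects that no other object in the pack references.

    pack_objects maps an object id to the list of ids it points at
    (a commit's parents and tree, a tree's entries, and so on).
    The "linked" set plays the role of the FLAG_LINK mark.
    """
    linked = set()
    for refs in pack_objects.values():
        for ref in refs:
            if ref in pack_objects:
                linked.add(ref)
    return sorted(oid for oid in pack_objects if oid not in linked)


# Toy pack: two branch tips (c2 and c3) sharing some history.
pack = {
    "c1": ["t1"],
    "c2": ["c1", "t2"],
    "c3": ["c1", "t1"],
    "t1": ["b1"],
    "t2": ["b1", "b2"],
    "b1": [],
    "b2": [],
}
roots = find_roots(pack)
```

Here roots comes out as the two tips, which are the objects a client
would save as temporary references.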

[...]
>> To make step 4 really resume well, clients may need to save the first
>> Location header they get back from
>> /info/refs?service=git-resumable-clone and use that on resume. Servers
>> are likely to embed the pack SHA-1 in the Location header, and the
>> client wants to use this on subsequent GET attempts to abort early if
>> the server has deleted the pack the client is trying to obtain.

Yes.
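
Roughly, the resume logic on the client could look like the Python
sketch below.  It assumes the client has saved the Location URL and
knows how many bytes of the pack it already has; the helper names and
the choice of which status codes mean "the pack is gone" are my
assumptions, not something specified in this thread.

```python
def range_header(bytes_have):
    """Headers for resuming a download after bytes_have bytes.

    With nothing downloaded yet, no Range header is sent and the
    request is a plain GET.
    """
    if bytes_have <= 0:
        return {}
    return {"Range": "bytes=%d-" % bytes_have}


def next_action(status):
    """Map the response status for the saved pack URL to a client action."""
    if status == 206:
        # Partial Content: the server honored the range; append to the file.
        return "append"
    if status == 200:
        # The server ignored the range and is sending the whole pack again.
        return "restart"
    if status in (404, 410):
        # The pack named in the saved URL no longer exists; abort early
        # and go back to /info/refs for a fresh Location.
        return "abort"
    return "error"
```

Because the saved URL names one specific pack, a 404 there tells the
client right away that its partial download cannot be completed.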

I really like this design.  I'm tempted to implement it (since it
lacks a bunch of the downsides of clone.bundle).

Thanks,
Jonathan