Stefan Beller wrote:
> On Wed, Feb 10, 2016 at 12:11 PM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>> Several of us at $DAY_JOB talked about this more today and thought a
>> variation makes more sense:
>>
>> 1. Clients attempting clone ask for /info/refs?service=git-upload-pack
>> like they do today.
>>
>> 2. Servers that support resumable clone include a "resumable"
>> capability in the advertisement.
>
> like "resumable-token=hash" similar to a push cert advertisement?

It could just be the string 'resumable'.

But I wonder if it would be possible to save a round-trip by getting
the 302 response in the initial request. If the client requests
/info/refs?service=git-upload-pack&want_resumable=true, the server
could respond with a 302 redirect pointing at its current mostly-whole
pack.

Current clients would never send such a request, because the current
protocol requires the following for smart clients:

    The request MUST contain exactly one query parameter,
    `service=$servicename`, where `$servicename` MUST be the service
    name the client wishes to contact to complete the operation.
    The request MUST NOT contain additional query parameters.

The current http-backend ignores extra query parameters. I haven't
checked other smart HTTP server implementations, though.

>> 3. Updated clients on clone request GET /info/refs?service=git-resumable-clone.
>
> Or just in the non-http case, they would terminate after the ls-remote
> (including capability advertisement) was done and connect again to
> a different service such as git-upload-stale-pack with the resumable
> token to identify the pack.

HTTP supports range requests and existing CDNs speak HTTP, so I suspect
it would work better if the git-resumable-clone service printed an HTTP
URL from which to grab the packfile.

I think the details are something that could be figured out after
trying out the idea with HTTP first, though.

[...]
>> 5. Clients fetch the file using standard HTTP GET, possibly with
>> byte-ranges to resume.
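Step 5 relies only on standard HTTP. As a rough illustration (an editorial sketch, not part of the proposal), a client could resume by sending a `Range: bytes=N-` header for the bytes it lacks, appending when the server answers `206 Partial Content` and starting over on a plain `200 OK`. The helper names `resume_headers`, `store_body`, and `fetch_pack` and the chunk size are hypothetical; only `urllib.request` is a real library API. The `url` passed in would presumably be the pack URL the resumable-clone service handed out.

```python
import os
import urllib.request

CHUNK_SIZE = 64 * 1024  # arbitrary read size for streaming the body (hypothetical)


def resume_headers(path: str) -> dict:
    """Request headers that ask only for the bytes not yet on disk."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    # "bytes=N-" asks for everything from offset N to the end of the file.
    return {"Range": f"bytes={have}-"} if have else {}


def store_body(path: str, status: int, chunks) -> int:
    """Write a response body to path and return the resulting file size.

    206 Partial Content: the server honored our Range header, so append.
    200 OK: the server sent the whole file, so start over from scratch.
    """
    if status == 206:
        mode = "ab"
    elif status == 200:
        mode = "wb"
    else:
        raise ValueError(f"unexpected HTTP status {status}")
    with open(path, mode) as f:
        for chunk in chunks:
            f.write(chunk)
    return os.path.getsize(path)


def fetch_pack(url: str, path: str) -> int:
    """Download url into path, resuming from any partial file already there."""
    req = urllib.request.Request(url, headers=resume_headers(path))
    # urlopen raises HTTPError for 4xx/5xx, e.g. 416 if nothing is left to fetch.
    with urllib.request.urlopen(req) as resp:
        chunks = iter(lambda: resp.read(CHUNK_SIZE), b"")
        return store_body(path, resp.status, chunks)
```

A more careful client would also check that the `Content-Range` of a 206 starts at the expected offset, or send `If-Range` with a validator, so it never appends bytes from a different pack.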
>
> In the non-http case the git-upload-stale-pack would be rsync with the
> resume token to determine the file name of the pack,
> such that we have resumeability.

How do I tunnel rsync over git protocol?

So I think in the non-HTTP case the git-resumable-clone service would
have to print a URL to be served using a possibly different protocol
(e.g., a signed https URL for getting the file from a service like S3,
or an rsync URL for getting the file using the same ssh creds that
were used for the initial request).

[...]
>> 6. Once stored and indexed with .idx, clients run `git fsck
>> --lost-found` to discover the roots of the pack it downloaded. These
>> are saved as temporary references.
>
> jrn:
>
> I suspect we can do even faster by making index-pack do the work
>
> index-pack --check-self-contained-and-connected

--strict + --check-self-contained-and-connected check that the pack is
self-contained. In the process they mark each object that is reachable
from another object in the pack with FLAG_LINK.

The objects not marked with FLAG_LINK are the roots.

[...]
>> To make step 4 really resume well, clients may need to save the first
>> Location header it gets back from
>> /info/refs?service=git-resumable-clone and use that on resume. Servers
>> are likely to embed the pack SHA-1 in the Location header, and the
>> client wants to use this on subsequent GET attempts to abort early if
>> the server has deleted the pack the client is trying to obtain.

Yes. I really like this design. I'm tempted to implement it (since it
lacks a bunch of the downsides of clone.bundle).

Thanks,
Jonathan
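The root-finding idea from the step 6 discussion can be modeled in a few lines. This is a hedged sketch, not index-pack's actual code: it assumes the downloaded pack has already been parsed into a hypothetical mapping from each object ID to the IDs it references (commit to tree and parents, tree to entries, tag to target). The `linked` set stands in for FLAG_LINK, and a reference to an object outside the pack is treated as a failed self-containment check.

```python
def find_roots(pack: dict) -> set:
    """Return the objects in pack that no other object in pack references.

    pack maps each object ID to the list of object IDs it points at
    (hypothetical pre-parsed form of the pack's contents).
    """
    linked = set()  # plays the role of FLAG_LINK
    for oid, targets in pack.items():
        for target in targets:
            if target not in pack:
                # Not self-contained: the "roots" would not be meaningful.
                raise ValueError(f"{oid} references {target}, which is not in the pack")
            linked.add(target)
    # Whatever was never marked as linked is a root.
    return set(pack) - linked
```

For example, a pack holding two commits, their trees and blobs, plus an annotated tag pointing at the older commit yields two roots: the newer commit and the tag. Each root would then get a temporary reference, as step 6 describes.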