Re: Resumable clone

Kevin Wern <kevin.m.wern@xxxxxxxxx> · Mon, 7 Mar 2016 19:33:40 -0800

Hey Junio and Duy,

Thank you for your thorough responses! I'm new to git dev, so they're
extremely helpful.

> - The server side endpoint does not have to be, and I think it
> should not be, implemented as an extension to the current
> upload-pack protocol. It is perfectly fine to add a new "git
> prime-clone" program next to existing "git upload-pack" and
> "git receive-pack" programs and drive it through the
> git-daemon, curl remote helper, and direct execution over ssh.

I'd like to work on this, and continue through to implementing the
prime_clone() client-side function.

From what I understand, a pattern exists in clone to download a
packfile when a desired object isn't found as a resource. In this
case, if no alternate is listed in http-alternates, the client
automatically checks the pack index(es) to see which packfile contains
the object it needs.

However, the above is a fallback. What I believe *doesn't* exist is a
way for the server to say, "I have a resource, in this case a
full-history packfile, and I *prefer* you get that file instead of
attempting to traverse the object tree." This should be implemented in
a way that is extensible to other resource types moving forward.
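
To make the "extensible to other resource types" part concrete, here is
a rough client-side sketch. Everything in it is an assumption on my
part: the answer shape (a "type" plus a "url"), the handler table, and
the names plan_clone and handle_pack are all made up for illustration
and are not existing git code.

```python
from typing import Callable, Dict, Optional


def handle_pack(url: str) -> str:
    # A real client would download (and resume) the file, then index it.
    return f"download pack from {url}, then fetch the remainder normally"


# Hypothetical table of handlers keyed by advertised resource type.
# Supporting a new resource type would mean adding one entry here.
HANDLERS: Dict[str, Callable[[str], str]] = {"pack": handle_pack}


def plan_clone(advert: Optional[dict]) -> str:
    """Decide what the client does with the server's (hypothetical) answer."""
    if advert is None:
        # Server offered nothing: clone the usual way.
        return "regular clone"
    handler = HANDLERS.get(advert.get("type"))
    if handler is None:
        # Unknown resource type: ignore the hint rather than fail.
        return "regular clone"
    return handler(advert["url"])
```

The point of the table is that an older client seeing a type it doesn't
know simply falls back to the normal clone path.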

I'm not sure how the server should determine the returned resource. A
packfile alone does not guarantee the full repo history, and I'm not
positive checking the idx file for HEAD's commit hash ensures every
sub-object is in that file (though I feel it should, because it is
delta-compressed). With that in mind, my best guess at the server
logic for packfiles is something like:

Do I have a full-history packfile, and am I configured to return one?
  - If yes, then return an answer specifying the file URL and type (packfile)
  - Otherwise, return some other answer indicating the client must go
    through the original cloning process (or possibly return a different
    kind of file and type, once we expand that capability)
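
Sketched as code, the logic above might look roughly like this. It is
only a sketch under assumptions: PrimeConfig, its two fields, the
answer dictionaries, and the "none" type are names I invented, and how
the server actually finds a full-history pack is left open (it is
passed in as full_pack_name).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimeConfig:
    # Both fields are hypothetical; they do not correspond to real git config keys.
    enabled: bool = False
    base_url: str = ""


def prime_clone_answer(cfg: PrimeConfig, full_pack_name: Optional[str]) -> dict:
    """Return the answer the server would send to the client."""
    if cfg.enabled and full_pack_name is not None:
        # We have a full-history pack and are configured to offer it.
        return {"type": "pack", "url": f"{cfg.base_url}/{full_pack_name}"}
    # No suitable resource: tell the client to clone the usual way.
    return {"type": "none"}
```

For example, prime_clone_answer(PrimeConfig(True, "https://example.com/objects/pack"), "pack-1234.pack")
would return a "pack" answer, while a default PrimeConfig() always
yields {"type": "none"}.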

That leaves me with questions on how to test the above condition. Is
there an expected place, such as config, where the user will specify
the type of alternate resource, and should we assume some default if
it isn't specified? Can the user optionally specify the exact file to
use (I can't see why they would, since it only invites more errors)? Should the
specification of this option change git's behavior on update, such as
making sure the full history is compressed? Does the existence of the
HEAD object in the packfile ensure the repo's entire history is
contained in that file?
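
One way to frame that last question is as a closure check: starting
from HEAD, is every reachable object in the pack? Below is a toy sketch
over a made-up object graph (the names C1, C2, T1, T2 and the function
pack_is_closed are purely illustrative; it does not read real packs or
idx files). I'd expect a pack produced by an incremental repack to hold
the newest commits but not their older ancestors, in which case HEAD
being present would not be enough.

```python
from typing import Dict, List, Set


def pack_is_closed(tip: str, edges: Dict[str, List[str]], in_pack: Set[str]) -> bool:
    """Return True if every object reachable from tip is in in_pack."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in in_pack:
            # A reachable object lives outside this pack.
            return False
        stack.extend(edges.get(oid, []))
    return True


# Toy history: commit C2 has parent C1; each commit points at one tree.
edges = {"C2": ["C1", "T2"], "C1": ["T1"], "T2": [], "T1": []}
print(pack_is_closed("C2", edges, {"C2", "T2"}))              # False
print(pack_is_closed("C2", edges, {"C2", "C1", "T2", "T1"}))  # True
```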

Also, for now I'm assuming the same options should be available for
prime-clone as are available for upload-pack (--strict,
--timeout=<n>). Let me know if any other features are necessary.
And please tell me if I'm headed in completely the wrong direction...

Thank you so much for your help!