Re: Resumable clone

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 08 Mar 2016 09:07:00 -0800

Kevin Wern <kevin.m.wern@xxxxxxxxx> writes:

> From what I understand, a pattern exists in clone to download a
> packfile when a desired object isn't found as a resource. In this
> case, if no alternative is listed in http-alternatives, the client
> automatically checks the pack index(es) to see which packfile contains
> the object it needs.

You sound as if you are describing how a fetch over the dumb
(commit-walker) HTTP transport works.  That has nothing to do with
the discussion of resumable clone, though, so I am not sure where
you are going with this.

> What I believe *doesn't* exist is a
> way for the server to say, "I have a resource, in this case a
> full-history packfile, and I *prefer* you get that file instead of
> attempting to traverse the object tree." This should be implemented in
> a way that is extensible to other resource types moving forward.

Yes, that is very close to what I said in the "what remains?"
section, but with a crucial difference in a detail.  Perhaps reading
the message you are responding to again more carefully will clear
up the confusion.  This is what we want to allow the server to say
(from the message you are responding to, but rephrased slightly,
hoping that it would help unconfuse you):

    I prefer not to serve a full clone to you in the usual route if
    I can avoid it.  You can help me by populating your history first
    with something else (which would bring you to a state as if you
    had cloned a potentially slightly older version of me) and then coming
    back to me for an additional fetch to complete the history.

That "something else" does not have to be, and is not expected to
be, the "full" history of the current state.  As long as it can be
used to bring the cloner to a reasonably recent state, sufficient to
make a follow-up incremental fetch inexpensive enough, it is
appropriate.
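
To make the two-step flow concrete, here is a rough sketch of what a
client might do with such an answer.  Everything in it is an
assumption made for illustration: PrimeAdvice, range_header and
plan_clone are hypothetical names, not anything that exists in Git,
and resuming via an HTTP Range request is only one plausible way to
make the first step restartable.  It only describes the steps; it
does not perform them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrimeAdvice:
    """Hypothetical server answer naming the "something else" to get first."""
    url: str
    kind: str  # e.g. "packfile"; other kinds could be added later


def range_header(bytes_on_disk: int) -> dict:
    """HTTP Range header asking for the rest of a partly downloaded file."""
    if bytes_on_disk > 0:
        return {"Range": f"bytes={bytes_on_disk}-"}
    return {}


def plan_clone(advice: Optional[PrimeAdvice], bytes_on_disk: int = 0) -> List[str]:
    """Return the ordered steps a client would take, as descriptions only."""
    if advice is None:
        # The server expressed no preference: clone the usual way.
        return ["full clone over the usual route"]
    start = "resume" if bytes_on_disk > 0 else "start"
    return [
        f"{start} download of {advice.kind} from {advice.url} "
        f"with headers {range_header(bytes_on_disk)}",
        "index the downloaded data and set up refs as of that (older) state",
        "incremental fetch from the server to complete the history",
    ]
```

The point of the last step is that the downloaded resource may be
stale; the follow-up fetch is what brings the clone up to date.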

> I'm not sure how the server should determine the returned resource. A
> packfile alone does not guarantee the full repo history, and I'm not
> positive checking the idx file for HEAD's commit hash ensures every
> sub-object is in that file (though I feel it should, because it is
> delta-compressed).

The above reasoning does not make much technical sense.  Delta
compression does not ensure connectivity in the commit history and
commit->tree->blob containment.  Again I am not sure where you are
going with this.
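
A toy model may help show why the presence of one object says nothing
about the rest.  The sketch below is not how Git checks connectivity;
it assumes the whole object graph is known up front (refs_of is a
made-up dictionary from an object name to the objects it refers to),
and missing_objects is a hypothetical helper.  It walks everything
reachable from a tip and reports what a given pack lacks.

```python
from typing import Dict, List, Set


def missing_objects(tip: str, refs_of: Dict[str, List[str]],
                    in_pack: Set[str]) -> Set[str]:
    """Walk everything reachable from `tip`; return what the pack lacks."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in in_pack:
            missing.add(oid)
        # Follow parents, trees and blobs referenced by this object.
        stack.extend(refs_of.get(oid, []))
    return missing


# Toy graph: commit c2 has tree t2 and parent c1; trees point at blobs.
refs_of = {
    "c2": ["t2", "c1"],
    "c1": ["t1"],
    "t2": ["b1", "b2"],
    "t1": ["b1"],
}
# This pack holds the tip commit, yet not its parent or all of its blobs.
pack = {"c2", "t2", "b2"}
print(sorted(missing_objects("c2", refs_of, pack)))  # ['b1', 'c1', 't1']
```

Finding the tip commit in the pack's index is only the first
membership test in that walk; the rest of the history has to be
checked separately.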

> With that in mind, my best guess at the server
> logic for packfiles is something like:
>
> Do I have a full history packfile, and am I configured to return one?
> - If yes, then return an answer specifying the file url and type (packfile)
> - Otherwise, return some other answer indicating the client must go
> through the original cloning process (or possibly return a different
> kind of file and type, once we expand that capability)

Roughly speaking, yes.
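
As a sketch only, that decision could look like the following.  The
function name, its parameters and the shape of the returned answer
are all invented for illustration; none of this is an existing Git
protocol message or configuration variable.  Note that the answer
asks for a follow-up fetch, so the advertised resource does not have
to hold the full history.

```python
from typing import Optional


def clone_answer(prime_url: Optional[str], enabled: bool) -> dict:
    """Hypothetical server-side choice between priming and a normal clone."""
    if enabled and prime_url:
        return {
            "action": "prime",      # get this resource first
            "type": "packfile",     # other resource types could be added
            "url": prime_url,
            "then": "fetch",        # come back for an incremental fetch
        }
    # No resource configured: the client clones via the usual route.
    return {"action": "clone"}
```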

> Which leaves me with questions on how to test the above condition. Is
> there an expected place, such as config, where the user will specify
> the type of alternate resource, and should we assume some default if
> it isn't specified? Can the user optionally specify the exact file to
> use (I can't see why because it only invites more errors)? Should the
> specification of this option change git's behavior on update, such as
> making sure the full history is compressed? Does the existence of the
> HEAD object in the packfile ensure the repo's entire history is
> contained in that file?

Apart from your assumption that no follow-up fetch is allowed (which
forces you to limit yourself to "full" history, an unnecessary
requirement), those are good points on which one should make design
decisions when building this part of the system.
