Re: [PATCH 00/11] Resumable clone

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 28 Sep 2016 11:22:04 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> What "git clone" should have been was:
>
>     * Parse command line arguments;
>
>     * Create a new repository and go into it; this step would
>       require us to have parsed the command line for --template,
>       <directory>, --separate-git-dir, etc.
>
>     * Talk to the remote and call get_remote_heads() to obtain the
>       equivalent of the ls-remote output;
>
>     * Decide what fetch refspec to use, which alternate object store
>       to borrow from; this step would require us to have parsed the
>       command line for --reference, --mirror, --origin, etc.;
>
>     --- we'll insert something new here ---
>
>     * Issue "git fetch" with the refspec determined above; this step
>       would require us to have parsed the command line for --depth, etc.
>
>     * Run "git checkout -b" to create an initial checkout; this step
>       would require us to have parsed the command line for --branch,
>       etc.
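The ordering above can be sketched as a pipeline in which each step consumes only the options it needs. This is a minimal Python model, not git's C code. The step functions, the `opts` keys and `run_clone()` are hypothetical, the remote heads are faked, and each step only appends to a log instead of touching a real repository or transport.

```python
from typing import Callable, Dict, List

# Hypothetical model: every step takes a shared state dict and records what it did.
Step = Callable[[Dict], None]


def create_repository(state: Dict) -> None:
    # Would need --template, <directory>, --separate-git-dir.
    state["log"].append("init " + state["opts"]["directory"])


def talk_to_remote(state: Dict) -> None:
    # Stands in for get_remote_heads(), i.e. the ls-remote output (faked here).
    state["heads"] = {"refs/heads/main": "c0ffee"}
    state["log"].append("ls-remote")


def decide_refspec(state: Dict) -> None:
    # Would need --reference, --mirror, --origin.
    origin = state["opts"]["origin"]
    state["refspec"] = "+refs/heads/*:refs/remotes/%s/*" % origin
    state["log"].append("refspec " + state["refspec"])


def run_fetch(state: Dict) -> None:
    # Would need --depth; stands in for "git fetch" with the refspec above.
    state["log"].append("fetch " + state["refspec"])


def initial_checkout(state: Dict) -> None:
    # Would need --branch; stands in for "git checkout -b".
    # Falling back to "main" simplifies following the remote HEAD.
    branch = state["opts"].get("branch") or "main"
    state["log"].append("checkout -b " + branch)


CLONE_STEPS: List[Step] = [
    create_repository,
    talk_to_remote,
    decide_refspec,
    # --- a new step would be inserted here ---
    run_fetch,
    initial_checkout,
]


def run_clone(opts: Dict) -> List[str]:
    # Command-line parsing is assumed to have produced `opts` already.
    state: Dict = {"opts": opts, "log": []}
    for step in CLONE_STEPS:
        step(state)
    return state["log"]
```

For example, `run_clone({"directory": "repo", "origin": "origin"})` runs the five steps in the order listed. The point of the shape is that the marked slot is an ordinary list position, so a new step needs no changes to its neighbours.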
>
> Even though the current code conceptually does the above, these
> steps are not cleanly separated as such.  I think our work to gain
> the "resumable clone" feature on the client side needs to start by
> refactoring the current code to look like the above, before it
> learns "resumable clone".
>
> Once we do that, we can insert an extra step before the step that
> runs "git fetch" to optionally [*1*] grab the extra piece of
> information Kevin's "prime-clone" service produces [*2*], and store
> it in the "new repository" somewhere [*3*].
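With the steps separated, adding the optional priming step amounts to splicing one more function into the list ahead of the fetch. The sketch below is hypothetical throughout: `insert_before()`, `ask_prime_clone()`, the `pack_url` field and the `prime-info` key (standing in for the unspecified "somewhere" of [*3*]) are invented for illustration, and the server's answer is faked.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A named step: (name, function taking the shared state dict).
Step = Tuple[str, Callable[[Dict], None]]


def logging_step(name: str) -> Step:
    # Placeholder for the existing clone steps; it only records that it ran.
    def step_fn(state: Dict) -> None:
        state["log"].append(name)
    return (name, step_fn)


BASE_STEPS: List[Step] = [
    logging_step(n) for n in ("init", "ls-remote", "refspec", "fetch", "checkout")
]


def insert_before(steps: List[Step], anchor: str, new: Step) -> List[Step]:
    # Return a new pipeline with `new` placed immediately before `anchor`.
    names = [name for name, _ in steps]
    idx = names.index(anchor)  # ValueError if there is no such step
    return steps[:idx] + [new] + steps[idx:]


def ask_prime_clone(url: str) -> Optional[Dict[str, str]]:
    # Hypothetical stand-in for querying the "prime-clone" service;
    # here only https URLs "support" it, and the field name is invented.
    if url.startswith("https://"):
        return {"pack_url": url + "/primer.pack"}
    return None


def prime_step(state: Dict) -> None:
    # Optional: if the server offers nothing, nothing is stored.
    info = ask_prime_clone(state["url"])
    if info is not None:
        # "prime-info" is a hypothetical location inside the new repository.
        state["repo_files"]["prime-info"] = info
    state["log"].append("prime")


def run_steps(steps: List[Step], state: Dict) -> Dict:
    for _, fn in steps:
        fn(state)
    return state
```

`insert_before()` returns a new list rather than mutating `BASE_STEPS`, so the unprimed pipeline stays available for servers that do not offer the service.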
>
> And then, as you suggested, an updated "git fetch" can be taught to
> notice the priming information left by the previous step, use it to
> keep trying to download the pack until it succeeds, and index that
> pack to learn the tips that can be used as ".have" entries in the
> request.  From the original server's point of view, this fetch
> request would "want" the same set of objects, but would appear as
> an incremental update.
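The fetch behaviour described above can be modelled in a few lines. The model resumes the pack download from the last byte received until it completes. It then offers the tips recorded for that pack as "have" lines next to the unchanged "want" lines. This is a simulation under stated assumptions: `FlakyServer`, `download_with_resume()` and `build_request()` are invented names, the "pack" is a byte string, the tips are assumed to be already known from indexing, and pkt-line framing and flush packets are omitted.

```python
from typing import List, Tuple


class FlakyServer:
    """Hypothetical static pack host that drops the connection every `chunk` bytes."""

    def __init__(self, pack: bytes, chunk: int) -> None:
        self.pack = pack
        self.chunk = chunk

    def get_range(self, offset: int) -> Tuple[bytes, bool]:
        # Return up to `chunk` bytes starting at `offset`, plus whether the
        # transfer reached the end of the file before the connection "dropped".
        data = self.pack[offset:offset + self.chunk]
        done = offset + len(data) >= len(self.pack)
        return data, done


def download_with_resume(server: FlakyServer, max_attempts: int = 100) -> Tuple[bytes, int]:
    # Keep what was already received and resume from that offset on each retry.
    received = bytearray()
    for attempt in range(1, max_attempts + 1):
        data, done = server.get_range(len(received))
        received += data
        if done:
            return bytes(received), attempt
    raise RuntimeError("gave up after %d attempts" % max_attempts)


def build_request(wants: List[str], pack_tips: List[str]) -> List[str]:
    # The "want" set is what a plain clone would ask for; the tips learned from
    # the primed pack become "have" lines, so the fetch looks incremental.
    lines = ["want " + oid for oid in wants]
    lines += ["have " + oid for oid in pack_tips]
    lines.append("done")
    return lines
```

With a 30-byte pack and a server that drops after every 8 bytes, `download_with_resume()` completes on the fourth attempt without re-downloading any byte.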

Thinking about this even more, it probably makes more sense to
move the new "learn prime info and store it in the repository
somewhere, so that a later re-invocation of 'git fetch' can take
advantage of it" step _into_ "git fetch" itself.  That would allow
"git fetch" in a freshly created empty repository to take advantage
of this feature for free.
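Folding the priming step into "git fetch" could look roughly like the following. Fetch asks for priming information only when the repository has no objects and none is stored yet, and records it so that a re-invoked fetch finds it. It then downloads the primer and proceeds with what looks like an incremental fetch. This is again a hypothetical model: `Repo`, `fetch()`, the `offer_prime` callback and the `pack_url` field are invented names, not git code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Repo:
    # Hypothetical in-memory repository: object ids plus stored priming info.
    objects: List[str] = field(default_factory=list)
    prime_info: Optional[Dict[str, str]] = None
    log: List[str] = field(default_factory=list)


def fetch(repo: Repo, offer_prime: Callable[[], Optional[Dict[str, str]]]) -> None:
    # Learn and store priming info only for an empty repository that has none
    # yet, so a fetch re-invoked after an interruption finds it already there.
    if not repo.objects and repo.prime_info is None:
        repo.prime_info = offer_prime()
        if repo.prime_info is not None:
            repo.log.append("stored prime info")
    if repo.prime_info is not None and not repo.objects:
        repo.log.append("download primer " + repo.prime_info["pack_url"])
        repo.objects.append("primer-tip")  # pretend the primer's tip is now present
    repo.log.append("fetch (incremental)" if repo.objects else "fetch (full)")
```

Because the check lives in fetch, any empty repository benefits, whether "git clone" created it or the user ran "git init" and then "git fetch".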

The step in which "git clone" internally drives "git fetch" would
not actually be done by spawning a separate process with run_command(),
because we would want to reuse the connection to the server that
"git clone" already opened when it first talked to it to learn the
"ls-remote" equivalent (i.e. transport_get_remote_refs()).  I wonder
if we can do without this early "ls-remote"; that would further
simplify things by allowing us to just spawn "git fetch" internally.
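The trade-off in the last paragraph can be made concrete by counting connections. In this hypothetical model, `Transport`, `spawned_fetch()`, `clone_in_process()` and `clone_by_spawning()` are invented names, and "spawning" is simulated by a call that opens its own transport. Driving the fetch in-process reuses the connection opened for the early "ls-remote". Spawning "git fetch" after an early "ls-remote" costs a second connection, and dropping the early "ls-remote" brings the spawned version back to one.

```python
from typing import Callable, Dict, List


class Transport:
    """Hypothetical connection to the server; counts how many were opened."""

    opened = 0

    def __init__(self) -> None:
        Transport.opened += 1
        self.heads = {"refs/heads/main": "c0ffee"}

    def get_remote_refs(self) -> Dict[str, str]:
        # Stands in for the "ls-remote" equivalent.
        return dict(self.heads)

    def fetch(self, refs: Dict[str, str]) -> List[str]:
        return sorted(refs.values())


def spawned_fetch() -> List[str]:
    # A separate "git fetch" process must open its own connection
    # and rediscover the refs itself.
    transport = Transport()
    return transport.fetch(transport.get_remote_refs())


def clone_in_process() -> List[str]:
    transport = Transport()
    refs = transport.get_remote_refs()  # early "ls-remote"
    return transport.fetch(refs)        # same connection reused


def clone_by_spawning(early_ls_remote: bool) -> List[str]:
    if early_ls_remote:
        Transport().get_remote_refs()   # refs learned, connection not reusable
    return spawned_fetch()


def connections_used(clone: Callable[..., List[str]], *args: bool) -> int:
    Transport.opened = 0
    clone(*args)
    return Transport.opened
```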