Re: [PATCH v4 9/9] Documentation/config: add odb.<name>.promisorRemote

On Wed, Sep 26, 2018 at 12:12:22AM -0400, Jeff King wrote:
> On Tue, Sep 25, 2018 at 03:31:36PM -0700, Junio C Hamano wrote:
>
> > Christian Couder <christian.couder@xxxxxxxxx> writes:
> >
> > > The main issue that this patch series tries to solve is that the
> > > extensions.partialclone config option limits the partial clone and
> > > promisor features to only one remote. One related issue is that it
> > > also prevents having other kinds of promisor/partial clone/odb
> > > remotes. By other kinds I mean remotes that would not necessarily be
> > > Git repos, but that could store objects (that's where ODB, for Object
> > > DataBase, comes from) and could provide those objects to Git through a
> > > helper (or driver) script or program.
> >
> > I do not think "sources that are not git repositories" is all that
> > interesting, unless they can also serve as the source for the ext::
> > remote helper.  And if they can serve "git fetch ext::...", I think
> > they can be treated just like a normal Git repository by the
> > backfill code that needs to lazily populate the partial clone.
>
> I don't know about that. Imagine I had a regular Git repo with a bunch
> of large blobs, and then I also stored those large blobs in something
> like S3 that provides caching, geographic locality, and resumable
> transfers.
>
> [ ... ]
>
> Now if you are arguing that the interface to the external-odb helper
> script should be that it _looks_ like upload-pack, but simply advertises
> no refs and will let you fetch any object, that makes more sense to me.
> It's not something you could "git clone", but you can "git fetch" from
> it.
>
> However, that may be an overly constricting interface for the helper.
> E.g., we might want to be able to issue several requests and have them
> transfer in parallel. But I suppose we could teach that trick to
> upload-pack in the long run, as it may be applicable even to fetching
> from "real" git repos.
>
> Hmm. Actually, I kind of like that direction the more I think about it.

Yes, this is an important design decision for Git LFS, and I believe it
is relevant to this series, too. Git LFS allows the caller to issue `n`
parallel object transfers (uploads or downloads) at a time, which is
useful when, say, checking out a repository that has many large objects.

We do this trick with 'filter.lfs.process': we accumulate the Git LFS
objects that Git asks us to provide for checkout into the working copy,
and promise that we will provide their contents later (by sending
status=delayed).
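
To make that concrete, here is a rough sketch of the filter side of that
exchange (a toy in Python, not what Git LFS actually runs): it speaks the
pkt-line handshake from the "Long Running Filter Process" section of
gitattributes(5), advertises the "delay" capability, and answers a single
smudge request with status=delayed.

    import sys

    def write_pkt(out, data):
        # pkt-line: 4 hex digits giving the total length (prefix included),
        # then the payload.
        out.write(('%04x' % (len(data) + 4)).encode() + data)

    def write_flush(out):
        out.write(b'0000')
        out.flush()

    def read_pkts(inp):
        # Read pkt-lines until a flush packet ("0000"); return the payloads.
        pkts = []
        while True:
            size = int(inp.read(4), 16)
            if size == 0:
                return pkts
            pkts.append(inp.read(size - 4))

    stdin, stdout = sys.stdin.buffer, sys.stdout.buffer

    # Handshake: Git announces itself and its capabilities; we answer with
    # ours, including "delay" so we may answer status=delayed later on.
    read_pkts(stdin)                            # git-filter-client, version=2
    write_pkt(stdout, b'git-filter-server\n')
    write_pkt(stdout, b'version=2\n')
    write_flush(stdout)
    read_pkts(stdin)                            # capability=clean/smudge/delay
    write_pkt(stdout, b'capability=smudge\n')
    write_pkt(stdout, b'capability=delay\n')
    write_flush(stdout)

    # One smudge request: Git sends a header (command=smudge, pathname=...,
    # can-delay=1), a flush, the pointer content, and another flush.
    header = read_pkts(stdin)
    pointer = read_pkts(stdin)

    # Instead of producing the real content now, promise it for later.
    write_pkt(stdout, b'status=delayed\n')
    write_flush(stdout)

A real filter keeps servicing requests after this, and eventually answers
Git's command=list_available_blobs with the paths whose contents have
arrived, at which point Git asks for each of them again.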

We then "batch" up all of those requests, issue them all at once (after
which the LFS API will tell us the URLs of where to upload/download each
item), and then we open "N" threads to do that work.

After all of that, we respond to Git with all of the objects that we had
to download, and close the process filter.
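
For illustration, the shape of that flow is roughly the following Python
sketch. The endpoint and JSON fields mimic the Git LFS Batch API, but
treat the details (URL layout, headers, field names) as approximate
rather than authoritative; the real client is written in Go.

    import json
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    LFS_URL = 'https://example.com/repo.git/info/lfs'  # hypothetical remote
    N_WORKERS = 8                                       # the "N" threads above

    def batch(oids):
        # One request that asks the server where every object can be fetched.
        body = json.dumps({
            'operation': 'download',
            'objects': [{'oid': oid, 'size': size} for oid, size in oids],
        }).encode()
        req = urllib.request.Request(
            LFS_URL + '/objects/batch', data=body,
            headers={'Accept': 'application/vnd.git-lfs+json',
                     'Content-Type': 'application/vnd.git-lfs+json'})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)['objects']

    def download(obj):
        # Fetch one object from the URL the batch response handed back.
        with urllib.request.urlopen(obj['actions']['download']['href']) as resp:
            return obj['oid'], resp.read()

    def fetch_all(oids):
        # One batch round trip for metadata, then N parallel transfers.
        with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
            return dict(pool.map(download, batch(oids)))

The point is that the metadata travels in a single round trip, while the
(large) object payloads move over however many connections we care to
open.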

Thanks,
Taylor


