Re: how does "clone --filter=sparse:path" work?

Jeff King <peff@xxxxxxxx> · Fri, 24 May 2019 04:31:42 -0400

On Fri, May 24, 2019 at 10:05:45AM +0200, Christian Couder wrote:

> (Sorry for the late reply to this.)

No problem. I've been meaning to pick it up again, and somehow it's been
6 months. ;)

> > > But mainly I was thinking of a use case on the client of the form:
> > >
> > >     git rev-list
> > >         --objects
> > >         --filter=spec:path=.git/sparse-checkout
> 
> Do you mean "sparse:path" instead of "spec:path"?

Yes, I think so.

> > > and get a list of the blobs that you don't have and would need before
> > > you could checkout <commit> using the current sparse-checkout definition.
> > > You could then have a pre-checkout hook that would bulk
> > > fetch them before starting the actual checkout, since that would be
> > > more efficient than demand-loading blobs individually during the
> > > checkout.  There's more work to do in this area, but that was the idea.
> > >
> > > But back to your point, yes, I think we should restrict this over the
> > > wire.
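The client-side listing can be approximated with existing rev-list options, assuming the spec is available locally as a blob (sparse:oid) rather than as a path. With --missing=print, rev-list reports objects that should be present but are not, prefixing each ID with "?". Git has no pre-checkout hook, so the function name missing_sparse_blobs is hypothetical, and the bulk fetch itself is not shown.

```shell
#!/bin/sh
# Hypothetical helper: list the blobs needed to check out <commit> under a
# sparse spec that are not present locally (e.g. in a partial clone).
# Uses sparse:oid=<blob-ish> in place of the sparse:path form debated here.
missing_sparse_blobs() {
    commit=$1   # commit to be checked out
    spec=$2     # blob-ish holding the sparse patterns, e.g. HEAD:.sparse

    # The filter limits the walk to blobs the sparse spec selects;
    # --missing=print reports absent objects prefixed with "?".
    git rev-list --objects --missing=print \
        --filter="sparse:oid=$spec" "$commit" |
    sed -n 's/^?//p'   # keep only the missing object IDs, without the "?"
}
```

In a partial clone, the IDs it prints are the blobs such a hook would want to fetch in one batch before the checkout starts; in a full clone the output is simply empty.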
> >
> > Thanks for your thorough response, and sorry for the slow reply. I had
> > meant to reply with a patch adding in the restriction, but I haven't
> > quite gotten to it. :)
> 
> One way I see to restrict it is to add a config option on
> the server, maybe called "uploadpack.sparsePathFilter", specifying which
> files can be accessed using "--filter=sparse:path=".
> 
> For example, with uploadpack.sparsePathFilter set to
> "/home/user/git/sparse/*" and "--filter=sparse:path=foo",
> "/home/user/git/sparse/foo" on the server would be used if it exists.
> (Of course care should be taken that things like
> "--filter=sparse:path=bar/../../foo" are rejected.)
> 
> If uploadpack.sparsePathFilter is unset or set to "false", then
> "--filter=sparse:path=<stuff>" would always error out.
> 
> Is this what you had in mind?
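A minimal POSIX shell sketch of the check described above, for illustration only: uploadpack.sparsePathFilter is a proposed option rather than an existing one, the function name resolve_sparse_path is hypothetical, and the glob "/home/user/git/sparse/*" is simplified to a single base directory. Rather than normalizing the name, it rejects absolute paths and any "." or ".." component, which is the conservative way to keep "bar/../../foo" from escaping.

```shell
#!/bin/sh
# Hypothetical sketch: map a client-supplied "sparse:path=<name>" onto a
# server-side base directory taken from a (hypothetical) config value such
# as uploadpack.sparsePathFilter. Prints the resolved path on success and
# returns non-zero if the request must be rejected.
resolve_sparse_path() {
    base=$1   # configured base directory; empty or "false" disables the feature
    name=$2   # the <name> part of --filter=sparse:path=<name>

    # Unset or "false": every sparse:path request errors out.
    case $base in
        ''|false) return 1 ;;
    esac

    # Reject empty and absolute names outright.
    case $name in
        ''|/*) return 1 ;;
    esac

    # Reject any "." or ".." path component instead of normalizing it,
    # so "bar/../../foo" can never climb out of $base.
    case /$name/ in
        */../*|*/./*) return 1 ;;
    esac

    path=$base/$name
    # Only use the file if it actually exists on the server.
    [ -f "$path" ] || return 1
    printf '%s\n' "$path"
}
```

For example, resolve_sparse_path /home/user/git/sparse foo prints /home/user/git/sparse/foo only if that file exists. Symlinks inside the base directory are not checked, so a real implementation would need to handle them as well.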

My plan had been to disallow it entirely, and allow some mechanism by
which the client could specify the actual set of sparse paths itself
(which it might get from a local file, or communicated in some
out-of-band way to the user cloning, etc.).

If we just want a mechanism for the server to provide a pre-made sparse
list, then I think pointing people at sparse:oid=<blob> is simpler
there. I.e., your "foo" becomes "refs/sparse/foo" or even "HEAD:.sparse"
or similar, and the server admin just sticks the content into the repo
instead of dealing with exposing filesystem paths to the client.
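A sketch of the admin side of that approach, assuming a server repository the admin can write to: git hash-object -w stores the patterns as a blob, and git update-ref points a ref at it. The helper name publish_sparse_spec and the refs/sparse/ namespace are illustrative choices, not anything git defines.

```shell
#!/bin/sh
# Hypothetical admin helper: store a file of sparse-checkout patterns as a
# blob in the server's repository and point refs/sparse/<name> at it.
# Prints the filter spec a client would pass to --filter.
publish_sparse_spec() {
    git_dir=$1     # server repository (bare, or a .git directory)
    name=$2        # short name for the spec, e.g. "docs"
    spec_file=$3   # local file containing the patterns

    # Write the file's contents into the object database as a blob.
    blob=$(git --git-dir="$git_dir" hash-object -w --stdin < "$spec_file") || return 1

    # Refs outside refs/heads/ may point directly at a blob.
    git --git-dir="$git_dir" update-ref "refs/sparse/$name" "$blob" || return 1

    printf 'sparse:oid=refs/sparse/%s\n' "$name"
}
```

A client could then run something like git clone --filter=sparse:oid=refs/sparse/docs <url>, with uploadpack.allowFilter enabled on the server; committing the file as .sparse and asking for sparse:oid=HEAD:.sparse avoids the extra ref altogether.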

-Peff