Re: [PATCH 00/10] RFC Partial Clone and Fetch

On 3/22/2017 12:21 PM, Johannes Schindelin wrote:
> Hi Kostis,
>
> On Wed, 22 Mar 2017, ankostis wrote:
>
> > On 8 March 2017 at 19:50, <git@xxxxxxxxxxxxxxxxx> wrote:
> > > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> > >
> > > [RFC] Partial Clone and Fetch
> > > =============================
> > >
> > > This is a WIP RFC for a partial clone and fetch feature wherein the
> > > client can request that the server omit various blobs from the
> > > packfile during clone and fetch.  Clients can later request omitted
> > > blobs (either from a modified upload-pack-like request to the server
> > > or via a completely independent mechanism).
> >
> > Is it foreseen that the server may *decide* which partial objects to
> > serve, and that the cloning client will still work OK?
>
> The foreseeable use case will be to speed up clones of insanely large
> repositories by omitting blobs that are not immediately required, and
> letting the client fetch them later on demand.
>
> That is all, no additional permission model or anything. In fact, we do
> not even need to ensure that blobs are reachable in our use case, as
> only trusted parties are allowed to access the server to begin with.
>
> That does not mean, of course, that there should not be an option to
> limit access to objects that are reachable.
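
To make the reachability point concrete: Git's existing
`uploadpack.allowReachableSHA1InWant` and `uploadpack.allowAnySHA1InWant`
configuration switches are the kind of knob in play when a client comes
back asking for individual omitted objects by ID, e.g.:

    # Serve any object a client asks for by ID, skipping the
    # reachability check (fits the trusted-party setup above):
    git config uploadpack.allowAnySHA1InWant true

    # Or require that requested objects be reachable from a ref tip:
    git config uploadpack.allowReachableSHA1InWant true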

> > The case I have in mind is storing confidential files in Git (on the
> > server) while still publishing the repository to partial-cloning
> > clients, for non-repudiation, by sending out only trees and commits
> > (plus any non-sensitive blobs).
> >
> > A possible UI would be to rely on a `.gitattributes` file to specify
> > which objects are to be withheld.
> >
> > Apologies if I'm intruding with an unrelated feature request.
>
> I think this is a valid use case, and Jeff's design certainly does not
> prevent future patches to that end.
>
> However, given that Jeff's use case does not require any such feature,
> I would expect the people who want those features to do the heavy
> lifting on top of his work. It is too different from the intended use
> case to reasonably ask it of Jeff.

As Johannes said, all I'm proposing is a way to limit the amount of
data the client receives to help git scale to extremely large
repositories.  For example, I probably don't need 20 years of history
or the entire source tree if I'm only working in a narrow subset of
the tree.
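
As a concrete sketch of the workflow I have in mind (option names
illustrative only, nothing here is final, and the URL is just an
example):

    # Truncate history and ask the server to omit all blobs from the
    # initial packfile; omitted blobs are fetched lazily, on demand:
    git clone --depth=50 --filter=blob:none https://example.com/big-repo.git
    cd big-repo

    # Restrict the working tree to the narrow subset being worked on:
    git sparse-checkout set src/narrow/subset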

I'm not sure how you would achieve the confidential file scenario
that you describe, but you might try to build on it and see if you
can make it work.
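
If someone does pursue it, the `.gitattributes` side of the idea could
look something like this sketch (the attribute name is entirely made up
for illustration; no such attribute exists):

    # Hypothetical attributes telling the server which blobs to
    # withhold from partial clones; the attribute name is invented:
    secrets/**   partial-clone-omit
    *.pem        partial-clone-omit
    docs/**      -partial-clone-omit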

Jeff





