Re: How hard would it be to implement sparse fetching/pulling?

Vitaly Arbuzov <vit@xxxxxxxx> · Thu, 30 Nov 2017 19:37:24 -0800

Makes sense; I think this aligns perfectly with our needs too.
Let me dig into the patches and previous discussions that you've
kindly shared above, so that I better understand the details.

I'm very excited about what you guys have already done; it's a big
deal for the community!

On Thu, Nov 30, 2017 at 6:51 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Hi Vitaly,
>
> Vitaly Arbuzov wrote:
>
>> I think it would be great if we agree at a high level on the desired
>> user experience, so let me put a few possible use cases here.
>
> I think one thing this thread is pointing to is a lack of overview
> documentation about how the 'partial clone' series currently works.
> The basic components are:
>
>  1. extending the git protocol to (1) allow fetching only a subset of the
>     objects reachable from the commits being fetched and (2) later,
>     going back and fetching the objects that were left out.
>
>     We've also discussed some other protocol changes, e.g. to allow
>     obtaining the sizes of un-fetched objects without fetching the
>     objects themselves.
>
>  2. extending git's on-disk format to allow having some objects not be
>     present but only be "promised" to be obtainable from a remote
>     repository.  When running a command that requires those objects,
>     the user can choose to have it either (a) error out ("airplane
>     mode") or (b) fetch the required objects.
>
>     It is still possible to work fully locally in such a repo, make
>     changes, get useful results out of "git fsck", etc.  It is kind of
>     similar to the existing "shallow clone" feature, except that there
>     is a more straightforward way to obtain objects that are outside
>     the "shallow" clone when needed on demand.
>
>  3. improving everyday commands to require fewer objects.  For
>     example, if I run "git log -p", then I want to see the history of
>     most files but I don't necessarily want to download large binary
>     files just to print 'Binary files differ' for them.
>
>     And by the same token, we might want to have a mode for commands
>     like "git log -p" to default to restricting to a particular
>     directory, instead of downloading files outside that directory.
>
>     There are some fundamental changes to make in this category ---
>     e.g. modifying the index format to not require entries for files
>     outside the sparse checkout, to avoid having to download the
>     trees for them.
>
> The overall goal is to make git scale better.
>
> The existing patches do (1) and (2), though it is possible to do more
> in those categories. :)  We have plans to work on (3) as well.
>
> These are overall changes that happen at a fairly low level in git.
> They mostly don't require changes command-by-command.
>
> Thanks,
> Jonathan
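
To check my understanding of component 2 in the list above, here is a
toy model of "promised" objects. It is only a sketch and not how git
implements this: the names ToyRepo, MissingObjectError and
fetch_on_demand are made up for illustration, and the "remote" is just
a Python dict. A read of a promised object either errors out
("airplane mode") or fetches the object on demand.

```python
class MissingObjectError(Exception):
    """A promised object is needed but on-demand fetching is disabled."""


class ToyRepo:
    """Hypothetical stand-in for a repository with promised objects."""

    def __init__(self, remote, promised, fetch_on_demand=True):
        self.remote = remote              # object id -> content (fake remote)
        self.local = {}                   # objects actually present locally
        self.promised = set(promised)     # absent locally, obtainable remotely
        self.fetch_on_demand = fetch_on_demand

    def read(self, oid):
        if oid in self.local:
            return self.local[oid]
        if oid not in self.promised:
            # Neither present nor promised: a genuinely missing object.
            raise KeyError(oid)
        if not self.fetch_on_demand:
            # (a) "airplane mode": refuse to contact the remote.
            raise MissingObjectError(oid)
        # (b) fetch the required object and keep it locally from now on.
        self.local[oid] = self.remote[oid]
        self.promised.discard(oid)
        return self.local[oid]
```

With fetch_on_demand=False, reading a promised blob raises
MissingObjectError while objects already present still work; flipping
the flag to True makes the same read succeed and store the blob
locally.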