Re: Inefficiency of partial shallow clone vs shallow clone + "old-style" sparse checkout

Jeff King <peff@xxxxxxxx> · Wed, 1 Apr 2020 07:44:46 -0400

On Wed, Apr 01, 2020 at 04:49:20AM +0300, Konstantin Tokarev wrote:

> > Less efficient use of network bandwidth is one thing, but shallow clones are
> > also more CPU-intensive with the "counting objects" phase on the server. Your
> > link shares the following end-to-end timings:
> >
> > * Shallow-clone: 234s
> > * Partial clone: 286s
> > * Both(???): 1023s
> >
> > The data implies that by asking for both you actually got a full clone (4.1 GB).
> 
> No, this is still a partial clone, full clone takes more than 6 GB

I think that 4GB number is just because of the bug, though. With the fix
I showed earlier, doing clones of linux.git from a local repo yields:

  type       objects (in passes)      size   time
  ----       -----------------------  -----  ----
  shallow      71447 (  71447+  n/a)  188MB   23s
  blob:none  5260567 (5193557+67010)  870MB   99s
  both         71447 (   4437+67010)  188MB   37s

The object counts and sizes make sense. blob:none still fetches the
whole history of commits and trees, which are substantial. The sizes
for "shallow" and "both" are the same, because the checkout grabs all
of the blobs from the tip commit, which were included in the original
"shallow" anyway. "both" does take longer, because those blobs come in
a second follow-up fetch (though I'm surprised it's so _much_ slower).
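
The exact commands behind the three rows aren't shown here, so this is
only a sketch of what they presumably look like. It assumes --depth=1
for the shallow cases and --no-local to force the normal transport, as
in the sparse example. The clone_cmd helper is hypothetical; it only
prints the command line and does not run git.

```shell
# Hypothetical helper: print the clone command for one row of the table.
# The flag choices are assumptions, not taken from the measurements.
clone_cmd() {
  mode=$1 src=$2 dst=$3
  case "$mode" in
    shallow)   flags="--depth=1" ;;
    blob:none) flags="--filter=blob:none" ;;
    both)      flags="--filter=blob:none --depth=1" ;;
    *)         echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "git clone --no-local $flags $src $dst"
}

clone_cmd shallow   /path/to/linux shallow
clone_cmd blob:none /path/to/linux partial
clone_cmd both      /path/to/linux both
```

To actually run one of them, something like
eval "$(clone_cmd both /path/to/linux both)" should do for paths
without whitespace.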

So to me that implies that shallow is strictly better than partial if
you're just going to check out the full tip commit. But doing both
together opens up the possibility of narrowing the sparse checkout.
Doing:

  $ git clone --no-local --no-checkout --filter=blob:none --depth=1 \
      /path/to/linux sparse
  $ cd sparse
  $ git sparse-checkout set arch

fetches 20795 objects (4437+16357+1), consuming only 27MB.
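
As a quick sanity check, the per-pass counts quoted in this mail do add
up to the totals. This is just shell arithmetic over the numbers
already given (nothing is measured here), and the variable names are
made up for illustration.

```shell
# Per-pass object counts, copied from the figures in this mail.
blob_none_total=$((5193557 + 67010))   # blob:none row
both_total=$((4437 + 67010))           # "both" row
sparse_total=$((4437 + 16357 + 1))     # sparse checkout of arch

# Sparse checkout size relative to the shallow clone, in whole percent
# (integer division, so this rounds down).
sparse_pct=$((27 * 100 / 188))

echo "blob:none=$blob_none_total both=$both_total sparse=$sparse_total"
echo "sparse is about ${sparse_pct}% of the shallow clone's size"
```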

-Peff