On 9/1/2022 5:41 AM, ZheNing Hu via GitGitGadget wrote:
> This patch let partial clone have the similar capabilities of the shallow
> clone git clone --depth=<depth>.

...

> Now we can use git clone --filter="depth=<depth>" to omit all commits whose
> depth is >= <depth>. By this way, we can have the advantages of both shallow
> clone and partial clone: Limiting the depth of commits, get other objects on
> demand.

I have several concerns about this proposal.

The first is that "depth=X" doesn't mean anything after the first clone.
What will happen when we fetch the remaining objects?

Partial clone is designed to download a subset of objects, but make the
remaining reachable objects downloadable on demand. By dropping reachable
commits, the normal partial clone mechanism would result in a 'git rev-list'
call asking for a missing commit. Would this inherit the "depth=X" filter
and grossly over-download the trees and blobs in that commit range? Would
it result in downloading commits one-by-one, and then their root trees
(and all objects reachable from those root trees)?

Finally, computing the set of objects to send is just as expensive as if we
had a shallow clone (we can't use bitmaps). However, we get the additional
problem that fetches do not have a shallow boundary, so the server will send
deltas based on objects that are not necessarily present locally, triggering
extra requests to resolve those deltas.

This fallout remains undocumented and unexplored in this series, and I doubt
that investigation would result in positive outcomes.

> Disadvantages of git clone --depth=<depth> --filter=blob:none: we must call
> git fetch --unshallow to lift the shallow clone restriction, it will
> download all history of current commit.

How does your proposal fix this? Instead of unshallowing, users will stumble
across these missing objects and trigger huge downloads by accident.

> Disadvantages of git clone --filter=blob:none with git sparse-checkout: The
> git client needs to send a lot of missing objects' id to the server, this
> can be very wasteful of network traffic.

Asking for a list of blobs (especially one limited to a sparse-checkout) is
much more efficient than what will happen when a user tries to do almost
anything in a repository formed the way you propose here (see the commands
sketched in the P.S. below for the kind of workflow I am comparing against).

Having thought about this idea, I don't think it is viable. I would need to
see a lot of work done to test these scenarios closely before believing that
this type of partial clone leaves the repository in a desirable working
state.

Thanks,
-Stolee
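
P.S. For concreteness, here is the kind of existing workflow I am comparing
against. Everything below uses only options that already exist today; the
URL, branch name, and directory are placeholders:

  # Shallow + blobless clone; the full history can be restored later in a
  # single, well-bounded step.
  git clone --depth=1 --filter=blob:none <url> repo
  cd repo
  git fetch --unshallow

  # Blobless clone restricted by sparse-checkout; the final checkout asks
  # the server for the blobs it needs in one batched request.
  git clone --filter=blob:none --no-checkout <url> repo2
  cd repo2
  git sparse-checkout init --cone
  git sparse-checkout set <dir>
  git checkout <branch>

My concern is that with the proposed depth filter, the equivalent operations
degrade into many small fetches as 'git rev-list' trips over missing commits
one at a time.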