On 3/19/2020 1:44 PM, Jonathan Tan wrote:
> Support for partial clones with filtered trees was added in bc5975d24f
> ("list-objects-filter: implement filter tree:0", 2018-10-07), but
> whenever a lazy fetch of a tree is done, besides the tree itself, some
> other objects that it references are also fetched.
>
> The "blob:none" filter was added to lazy fetches in 4c7f9567ea
> ("fetch-pack: exclude blobs when lazy-fetching trees", 2018-10-04) to
> restrict blobs from being fetched, but it didn't restrict trees.
> ("tree:0", which would restrict all trees as well, wasn't added then
> because "tree:0" was itself new and may not have been supported by Git
> servers, as you can see from the dates of the commits.)
>
> Now that "tree:0" has been supported in Git for a while, teach lazy
> fetches to use "tree:0" instead of "blob:none".
>
> (An alternative to doing this is to teach Git a new filter that only
> returns exactly the objects requested, no more - but "tree:0" already
> does that for us for now, hence this patch. If we were to support
> filtering of commits in partial clones later, I think that specifying a
> depth will work to restrict the commits returned, so we won't need an
> additional filter anyway.)
> ---
> This looks like a good change to me - in particular, it makes Git align
> with the (in my opinion, reasonable) mental model that when we lazily
> fetch something, we only fetch that thing. Some issues that I can think
> about:
>
> - Some hosts like GitHub support some partial clone filters, but not
>   "tree:0".
> - I haven't figured out the performance implications yet. If we want a
>   tree, I think that we typically will want some of its subtrees, but
>   not all.
>
> Any thoughts?

The end result of fetching missing objects one-by-one matches how the
GVFS protocol has handled these tree misses in the past. While there may
be a lot more round trips, it saves on excess data since a missing tree
likely can reach several known trees and blobs.
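
To make that trade-off concrete, here is a toy Python model. It is not
Git code: the tree names, the layout, and the set of trees the client
already has are all made up, and it assumes that a "blob:none"-style
lazy fetch returns every tree reachable from the requested one while a
"tree:0"-style lazy fetch returns only the requested tree.

```python
# Toy model only: the tree names and layout are hypothetical.
TREES = {
    "root": ["src", "docs", "tests"],
    "src": ["src/core", "src/ui"],
    "src/core": [],
    "src/ui": [],
    "docs": ["docs/api"],
    "docs/api": [],
    "tests": [],
}


def reachable_trees(tree):
    """Return the tree and every subtree reachable from it."""
    found = [tree]
    for child in TREES[tree]:
        found.extend(reachable_trees(child))
    return found


def simulate(needed, known, fetch_whole_subtree):
    """Visit `needed` in order, lazily fetching each tree that is missing.

    Returns (round_trips, trees_transferred, trees_already_known).
    """
    have = set(known)
    round_trips = transferred = redundant = 0
    for tree in needed:
        if tree in have:
            continue
        round_trips += 1
        # Assumption: "blob:none"-style sends all reachable trees,
        # "tree:0"-style sends only the tree that was asked for.
        sent = reachable_trees(tree) if fetch_whole_subtree else [tree]
        transferred += len(sent)
        redundant += sum(1 for t in sent if t in have)
        have.update(sent)
    return round_trips, transferred, redundant


# The client walks root -> src -> src/core and already has some trees.
path = ["root", "src", "src/core"]
known = {"docs", "docs/api", "tests"}

blob_none = simulate(path, known, fetch_whole_subtree=True)
tree_zero = simulate(path, known, fetch_whole_subtree=False)
print("blob:none-style:", blob_none)
print("tree:0-style:   ", tree_zero)
```

In this made-up layout the "blob:none"-style fetch needs one round trip
but transfers seven trees, three of which the client already had, while
the "tree:0"-style fetch needs three round trips and transfers exactly
the three trees that were missing. Real repositories will of course
behave differently depending on how deep and wide the missing region is.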

The real unknown here is how the "boundary" of missing trees is created.
In the GVFS protocol, missing trees happen mostly when our pre-computed
"prefetch pack-files" of commits and trees are behind the ref tips. The
usage pattern for depth-limited or path-scoped filters is not quite as
established as the blob-limited patterns (which are similar to the
behavior in VFS for Git and Scalar).

The code seems to be doing what you say, but I highly recommend taking
this for a spin on a real repository with a real remote, if possible.
The more numbers we can get on which situations do better with one
filter or the other, the more confidently this change can be adopted.

Thanks,
-Stolee