Hi Stolee,

On 20-11-20 15 h 36, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>
> The partial clone feature has several modes, but only a few are quick
> for a server to process using reachability bitmaps:
>
> * Blobless: --filter=blob:none downloads all commits and trees and
>   fetches necessary blobs on demand.
>
> * Treeless: --filter=tree:0 downloads all commits and fetches necessary
>   trees and blobs on demand.
>
> This treeless mode is most similar to a shallow clone in the total size
> (it only adds the commit objects for the full history). This makes
> treeless clones an interesting replacement for shallow clones. A user
> can run more commands in a treeless clone than in a shallow clone,
> especially 'git log' (no pathspec).
>
> In particular, servers can still serve 'git fetch' requests quickly by
> calculating the difference between commit wants and haves using bitmaps.
>
> I was testing this feature with this in mind, and I knew that some trees
> would be downloaded multiple times when checking out a new branch, but I
> did not expect to discover a significant issue with 'git fetch', at
> least in repositories with submodules.
>
> I was testing these commands:
>
>   $ git clone --filter=tree:0 --single-branch --branch=master \
>       https://github.com/git/git
>   $ git -C git fetch origin "+refs/heads/*:refs/remotes/origin/*"
>
> This fetch command started downloading several pack-files of trees
> before completing the command. I never let it finish since I got so
> impatient with the repeated downloads. During debugging, I found that
> the stack triggering promisor_remote_get_direct() was going through
> fetch_populated_submodules(). Notice that I did not recurse my
> submodules in the original clone, so the sha1collisiondetection
> submodule is not initialized. Even so, my 'git fetch' was scanning
> commits for updates to submodules.

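For what it's worth, the "submodule is not initialized" state described above can be double-checked with 'git submodule status', which prefixes uninitialized submodules with "-". Below is a rough sketch using throwaway local repositories instead of git.git; the names sub, super and clone are made up for illustration, and protocol.file.allow is only there because recent Git versions refuse local-path submodules by default:

```shell
# Sketch only: build a tiny superproject with one submodule, clone it
# without recursing, and look at the submodule state in the clone.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical submodule repository with a single empty commit.
git init -q sub
git -C sub -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "sub: initial commit"

# Hypothetical superproject that records "sub" as a submodule.
git init -q super
git -C super -c protocol.file.allow=always \
    submodule --quiet add "$tmp/sub" sub
git -C super -c user.name=t -c user.email=t@example.com \
    commit -q -m "super: add submodule"

# Clone without --recurse-submodules, as in the commands quoted above.
git clone -q super clone

# A leading "-" in the output marks a submodule that is not initialized.
status=$(git -C clone submodule status)
echo "$status"
```

If the line printed for the submodule starts with "-", the submodule is not populated in that clone, which is the situation where I would expect the fetch code to have nothing to do.
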
I'm not super familiar with the inner workings of
fetch_populated_submodules(), but it seems weird that this function does
something in that case. It should do nothing, as the submodule is not
populated. Maybe it would be worth investigating what exactly is
happening?

> I decided that even if I did populate the submodules, the nature of
> treeless clones makes me not want to care about the contents of commits
> other than those that I am explicitly navigating to.
>
> This loop of tree fetches can be avoided by adding
> --no-recurse-submodules to the 'git fetch' command or setting
> fetch.recurseSubmodules=no.
>
> To make this as painless as possible for future users of treeless
> clones, automatically set fetch.recurseSubmodules=no at clone time.
>
> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> ---
>     clone: --filter=tree:0 implies fetch.recurseSubmodules=no
>
>     While testing different partial clone options, I stumbled across this
>     one. My initial thought was that we were parsing commits and loading
>     their root trees unnecessarily, but I see that doesn't happen after this
>     change.
>
>     Here are some recent discussions about using --filter=tree:0:
>
>     [1]
>     https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@xxxxxxxxxxx/
>     [2] https://lore.kernel.org/git/cover.1588633810.git.me@xxxxxxxxxxxx/
>     [3]
>     https://lore.kernel.org/git/58274817-7ac6-b6ae-0d10-22485dfe5e0e@xxxxxxxxxxx/
>
>     Thanks, -Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-797%2Fderrickstolee%2Ftree-0-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-797/derrickstolee/tree-0-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/797
>
>  list-objects-filter-options.c | 4 ++++
>  t/t5616-partial-clone.sh      | 6 ++++++
>  2 files changed, 10 insertions(+)

In any case, I think such a change would also need a doc update, probably
in Documentation/fetch-options.txt and Documentation/config/fetch.txt.

Cheers,

Philippe.
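
P.S. For anyone hitting this before a fix lands, the manual workaround from the commit message could look roughly like the following. This is only a sketch: it uses a scratch repository created with mktemp (a stand-in for an existing treeless clone), and sets the option per repository rather than globally:

```shell
# Sketch only: disable submodule recursion for 'git fetch' in one repository.
set -e
repo=$(mktemp -d)          # stand-in for an existing treeless clone
git init -q "$repo"

# Persistent setting, stored in the repository's own config.
git -C "$repo" config fetch.recurseSubmodules no

# Read the value back to confirm it was recorded.
value=$(git -C "$repo" config --get fetch.recurseSubmodules)
echo "fetch.recurseSubmodules=$value"

# One-off alternative for a single fetch (needs a configured remote):
#   git fetch --no-recurse-submodules origin
```

The config route is what the patch proposes to do automatically at clone time; the command-line flag only affects the one fetch it is passed to.
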