Derrick Stolee <derrickstolee@xxxxxxxxxx> wrote on Fri, Sep 2, 2022 at 03:24:
>
> On 9/1/2022 5:41 AM, ZheNing Hu via GitGitGadget wrote:
> > This patch let partial clone have the similar capabilities of the shallow
> > clone git clone --depth=<depth>.
> ...
> > Now we can use git clone --filter="depth=<depth>" to omit all commits whose
> > depth is >= <depth>. By this way, we can have the advantages of both shallow
> > clone and partial clone: Limiting the depth of commits, get other objects on
> > demand.
>
> I have several concerns about this proposal.
>
> The first is that "depth=X" doesn't mean anything after the first
> clone. What will happen when we fetch the remaining objects?
>

According to the current results, yes, it still downloads a large number
of commits. Here is a small test:

$ git clone --filter=depth:2 git.git git
Cloning into 'git'...
remote: Enumerating objects: 4311, done.
remote: Counting objects: 100% (4311/4311), done.
remote: Compressing objects: 100% (3788/3788), done.

Let's see how many objects we have locally...

$ git cat-file --batch-check --batch-all-objects | grep blob | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
4098
$ git cat-file --batch-check --batch-all-objects | grep tree | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
211
$ git cat-file --batch-check --batch-all-objects | grep commit | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
2

$ git checkout HEAD~

This fetches nothing, because depth=2 already includes HEAD~.

$ git checkout HEAD~
remote: Enumerating objects: 198514, done.
remote: Counting objects: 100% (198514/198514), done.
remote: Compressing objects: 100% (68511/68511), done.
remote: Total 198514 (delta 128408), reused 198509 (delta 128406), pack-reused 0
Receiving objects: 100% (198514/198514), 77.07 MiB | 9.58 MiB/s, done.
Resolving deltas: 100% (128408/128408), done.
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 14.35 KiB | 14.35 MiB/s, done.
remote: Enumerating objects: 198014, done.
remote: Counting objects: 100% (198014/198014), done.
remote: Compressing objects: 100% (68362/68362), done.
remote: Total 198014 (delta 128056), reused 198012 (delta 128055), pack-reused 0
Receiving objects: 100% (198014/198014), 76.55 MiB | 14.00 MiB/s, done.
Resolving deltas: 100% (128056/128056), done.
Previous HEAD position was 624a936234 Merge branch 'en/merge-multi-strategies'
HEAD is now at 014a9ea207 Merge branch 'en/t4301-more-merge-tree-tests'

This second checkout fetches a lot of objects (three separate fetches!).

$ git cat-file --batch-check --batch-all-objects | grep blob | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
4099
$ git cat-file --batch-check --batch-all-objects | grep tree | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
130712
$ git cat-file --batch-check --batch-all-objects | grep commit | wc -l
warning: This repository uses promisor remotes. Some objects may not be loaded.
67815

It fetched far too many commits and trees, yet, surprisingly, only one
more blob was downloaded. I admit this is very bad behavior; it happens
because we have almost no commits locally...
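For context, the filter given at clone time is recorded in the promisor
remote configuration, and later full "git fetch" invocations reuse it.
I did not paste the config from the test repository above, but I would
expect something like the following (assuming the proposed depth filter
is recorded the same way blob:none is; the depth:2 value here is my
assumption, not copied from the test):

$ git -C git config remote.origin.promisor
true
$ git -C git config remote.origin.partialclonefilter
depth:2

The on-demand fetches triggered by the second checkout are different:
as far as I understand, they ask the promisor remote for the exact
missing commit ids, and the server then sends the commits and trees
reachable from them, which is where the over-download above comes from.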
Maybe one solution: we could also provide a commit-id parameter inside
the depth filter, like --filter="commit:014a9ea207, depth:1"...

We could then clone with the blob:none filter to download all commits
and trees, and later fetch blobs with this "commit-depth" filter. We
could even provide a more complex filter:

$ git fetch --filter="commit:014a9ea207, depth:1, type=blob"

This may avoid downloading too many unneeded commits and trees. If git
fetch learned this filter, then git checkout or other commands could
use it internally and heuristically, e.g. for git checkout HEAD~:

    if HEAD~ missing | 75% blobs/trees in HEAD~ missing
        -> use "commit-depth" filter
    else
        -> use blob:none filter

We could even make this commit-depth filter support multiple commits
later.

> Partial clone is designed to download a subset of objects, but make
> the remaining reachable objects downloadable on demand. By dropping
> reachable commits, the normal partial clone mechanism would result
> in a 'git rev-list' call asking for a missing commit. Would this
> inherit the "depth=X" but result in a huge amount of over-downloading
> the trees and blobs in that commit range? Would it result in downloading
> commits one-by-one, and then their root trees (and all reachable objects
> from those root trees)?
>

I don't know whether it's possible to let git rev-list notice that
commits are missing and stop downloading them (just like
git cat-file --batch --batch-all-objects does). Similarly, git log or
other commands could be taught to understand this, probably via a
config variable such as fetch.skipmissingcommits...

> Finally, computing the set of objects to send is just as expensive as
> if we had a shallow clone (we can't use bitmaps). However, we get the
> additional problem where fetches do not have a shallow boundary, so
> the server will send deltas based on objects that are not necessarily
> present locally, triggering extra requests to resolve those deltas.
>

Agreed. I think this may be a problem, but there is no good solution
for it.

> This fallout remains undocumented and unexplored in this series, but I
> doubt the investigation would result in positive outcomes.
>
> > Disadvantages of git clone --depth=<depth> --filter=blob:none: we must call
> > git fetch --unshallow to lift the shallow clone restriction, it will
> > download all history of current commit.
>
> How does your proposal fix this? Instead of unshallowing, users will
> stumble across these objects and trigger huge downloads by accident.
>

As mentioned above, I would expect a commit-depth filter to fix this.

> > Disadvantages of git clone --filter=blob:none with git sparse-checkout: The
> > git client needs to send a lot of missing objects' id to the server, this
> > can be very wasteful of network traffic.
>
> Asking for a list of blobs (especially limited to a sparse-checkout) is
> much more efficient than what will happen when a user tries to do almost
> anything in a repository formed the way you did here.
>

Yes. Also, as mentioned above, the idea is to enable this filter only
in specific cases, e.g. when we have the commit but not all of its
trees and blobs.

> Thinking about this idea, I don't think it is viable. I would need to
> see a lot of work done to test these scenarios closely to believe that
> this type of partial clone is a desirable working state.
>

Agreed.

> Thanks,
> -Stolee

Thanks for these reviews and criticisms; they make me think more. :)

ZheNing Hu
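P.S. For completeness, the existing combination discussed in the cover
letter, which the commit-depth filter is meant to improve on, looks
like this with today's git (<url> is a placeholder for the repository
to clone):

$ git clone --depth=2 --filter=blob:none <url> git
$ cd git
$ git fetch --unshallow

The --unshallow step lifts the shallow boundary but downloads the full
history of the current branch, which is exactly the all-or-nothing step
the proposal tries to avoid.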