Josh Steadmon <steadmon@xxxxxxxxxx> writes:

> When cloning a repo with a --filter and with --recurse-submodules
> enabled, the partial clone filter only applies to
> the top-level repo. This can lead to unexpected bandwidth and disk
> usage for projects which include large submodules. For example, a user
> might wish to make a partial clone of Gerrit and would run:
> `git clone --recurse-submodules --filter=blob:5k
> https://gerrit.googlesource.com/gerrit`. However, only the superproject
> would be a partial clone; all the submodules would have all blobs
> downloaded regardless of their size. With this change, the same filter
> applies to submodules, meaning the expected bandwidth and disk savings
> apply consistently.
>
> Plumb the --filter argument from git-clone through git-submodule and
> git-submodule--helper, such that submodule clones also have the filter
> applied.
>
> This applies the same filter to the superproject and all submodules.
> Users who prefer the current behavior (i.e., a filter only on the
> superproject) would need to clone with `--no-recurse-submodules` and
> then manually initialize each submodule.

Two concerns (I do not say "issues", because I honestly do not know
how much this will hurt in the future).

 - Obviously, this changes the end user experience.  To users in
   the scenario that motivated this change (described above), it is
   clearly a change for the better, but I wonder if there are
   workflows that are hurt and actually have to resort to the
   workaround to preserve the current behaviour.

 - Passing the filter down to submodules means that the filter
   settings are universal across projects.  With the current set of
   filters, I do not think such an assumption is too bad.  If 5k
   blob is too large for the top-level superproject, it is OK for
   the superproject to dictate that 5k blob is too large for any of
   the submodules the superproject uses.  But can we forever limit
   the filter vocabulary to the ones that can sensibly be applied
   recursively?

   If we had a filter that goes with pathnames (e.g. "I only want
   src/ and test/ directories and nothing else initially"), such a
   set of pathnames appropriate in the context of the superproject
   is unlikely to apply to its submodules.  Even the existing
   "depth" filter is iffy, if a toplevel superproject is fairly flat
   and one of the submodules has a directory hierarchy that is ultra
   deep.

Will queue and wait for others to comment.

Thanks.
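
For reference, the workaround mentioned in the quoted message (a
filter on the superproject only) might look roughly like the sketch
below.  The helper names `run` and `clone_filter_superproject_only`
and the `DRY_RUN` switch are made up for illustration; only the git
commands themselves are real.  The filter is spelled `blob:limit=5k`
here, which is the form `--filter` accepts, and the sketch assumes
that `git submodule update` applies no filter unless it is given one.

```shell
#!/bin/sh
# Sketch only: keep the partial-clone filter on the superproject and
# fetch the submodules as ordinary, unfiltered clones.
# "run", "clone_filter_superproject_only" and DRY_RUN are hypothetical
# helpers, not part of git.

run() {
    # In dry-run mode, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

clone_filter_superproject_only() {
    url=$1      # repository to clone
    dir=$2      # target directory
    filter=$3   # filter spec, e.g. blob:limit=5k

    # Step 1: partial clone of the superproject; submodules are left alone.
    run git clone --no-recurse-submodules --filter="$filter" "$url" "$dir" || return 1

    # Step 2: initialize and fetch the submodules without passing a filter.
    run git -C "$dir" submodule update --init --recursive
}
```

Calling it with `DRY_RUN=1` set, e.g.
`DRY_RUN=1 clone_filter_superproject_only https://gerrit.googlesource.com/gerrit gerrit blob:limit=5k`,
only prints the two git commands, which makes it easy to check what
would be run before spending any bandwidth.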