Re: [PATCH] clone, submodule: pass partial clone filters to submodules

Josh Steadmon <steadmon@xxxxxxxxxx> writes:

> When cloning a repo with a --filter and with --recurse-submodules
> enabled, the partial clone filter only applies to
> the top-level repo. This can lead to unexpected bandwidth and disk
> usage for projects which include large submodules. For example, a user
> might wish to make a partial clone of Gerrit and would run:
> `git clone --recurse-submodules --filter=blob:limit=5k
> https://gerrit.googlesource.com/gerrit`. However, only the superproject
> would be a partial clone; all the submodules would have all blobs
> downloaded regardless of their size. With this change, the same filter
> applies to submodules, meaning the expected bandwidth and disk savings
> apply consistently.
>
> Plumb the --filter argument from git-clone through git-submodule and
> git-submodule--helper, such that submodule clones also have the filter
> applied.
>
> This applies the same filter to the superproject and all submodules.
> Users who prefer the current behavior (i.e., a filter only on the
> superproject) would need to clone with `--no-recurse-submodules` and
> then manually initialize each submodule.
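The workaround in the patch description (clone without recursing, then initialize the submodules separately) might look like the following shell function. The function name `clone_filter_superproject_only` is hypothetical; the `git clone` and `git submodule update` options it uses are standard ones. As sketched, the submodules get no filter, because none is passed to `git submodule update` (assuming no configuration requests one).

```shell
#!/bin/sh
# Hypothetical helper: apply a partial clone filter to the superproject
# only, then initialize and clone all submodules (recursively) in full.
clone_filter_superproject_only() {
	url=$1 dir=$2 filter=$3
	git clone --no-recurse-submodules --filter="$filter" "$url" "$dir" &&
	git -C "$dir" submodule update --init --recursive
}

# Example:
#   clone_filter_superproject_only \
#       https://gerrit.googlesource.com/gerrit gerrit blob:limit=5k
```

Running the two commands by hand works just as well; the function only keeps the filter and the target directory in one place.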

Two concerns (I do not say "issues", because I honestly do not know
how much this will hurt in the future).

 - Obviously, this changes the end-user experience.  For users in the
   scenario that motivated this change (described above), it is a
   change for the better, but I wonder if there are workflows that
   would be hurt and would actually have to resort to the workaround
   to preserve the current behaviour.

 - Passing the filter down to submodules means that the filter
   settings are universal across projects.  For the current set of
   filters, I do not think such an assumption is too bad.  If a 5k
   blob is too large for the top-level superproject, it is OK for
   the superproject to dictate that a 5k blob is too large for any
   of the submodules it uses.  But can we forever limit the filter
   vocabulary to the ones that can sensibly be applied recursively?
   If we had a filter that goes with pathnames (e.g. "I only want
   src/ and test/ directories and nothing else initially"), a set of
   pathnames appropriate in the context of the superproject is
   unlikely to apply to its submodules.  Even the existing "depth"
   filter (tree:<depth>) is iffy if a top-level superproject is
   fairly flat and one of its submodules has an ultra-deep directory
   hierarchy.
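The second concern can be made concrete with a toy model. Everything below is a hypothetical illustration, not how Git evaluates filters: the filter names, the sample paths and sizes, and the helpers `kept` and `clone_view` are invented for the example. One filter is applied to a flat superproject and to a deep submodule, and each filter is interpreted relative to each repository's own root.

```shell
#!/bin/sh
# Toy model, not Git's implementation. A repository's blobs are listed as
# "path size" lines, with paths relative to that repository's own root.
# kept FILTER MOUNT reads such a listing on stdin and prints the paths the
# hypothetical FILTER keeps, prefixed with MOUNT. FILTER is one of:
#   limit=N    blobs of at most N bytes (cf. blob:limit=<n>)
#   prefix=P   blobs whose path starts with P (a pathname filter)
#   depth=D    blobs fewer than D directories deep (loosely cf. tree:<depth>)
kept() {
	awk -v f="$1" -v mount="$2" '
	BEGIN { kind = f; sub(/=.*/, "", kind); arg = f; sub(/^[^=]*=/, "", arg) }
	{
		path = $1; size = $2 + 0
		if (kind == "limit")       keep = (size <= arg + 0)
		else if (kind == "prefix") keep = (index(path, arg) == 1)
		else if (kind == "depth")  keep = (gsub(/\//, "/", path) < arg + 0)
		else                       keep = 0
		if (keep) print mount $1
	}'
}

# Hypothetical layout: a fairly flat superproject and one submodule
# with a deep directory hierarchy.
super_blobs='src/main.c 4000
test/t.c 1000
docs/huge.pdf 800000'
sub_blobs='include/a.h 2000
lib/core/impl/deep/x.c 3000
assets/big.bin 900000'

# Apply the same FILTER to the superproject and to the submodule mounted
# at third_party/lib/, as a recursive clone with the patch would.
clone_view() {
	printf '%s\n' "$super_blobs" | kept "$1" ""
	printf '%s\n' "$sub_blobs" | kept "$1" "third_party/lib/"
}
```

With `clone_view limit=5120` (5k, since Git's `k` suffix means 1024), both repositories keep their small files and drop their large ones. With `clone_view prefix=src/`, every blob in the submodule is dropped, because its layout has no `src/`. With `clone_view depth=2`, the submodule's deeply nested `x.c` is lost while everything in the flat superproject survives.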

Will queue and wait for others to comment.

Thanks.
