Hi Robert,

Documentation for the config setting is an acceptable solution! Apologies
for the late response -- I wanted to wait and see if anyone else on the
list had any last thoughts.

Also, I noticed you were hoping that Jonathan Tan could take a look at
your patch on the What's Cooking thread. Before I sent my first review
out, I discussed your patch with him, so he has been briefed.

Reviewed-by: Calvin Wan <calvinwan@xxxxxxxxxx>

On Thu, Mar 10, 2022 at 6:29 AM Robert Coup <robert@xxxxxxxxxxx> wrote:
>
> Hi,
>
> On Wed, 9 Mar 2022 at 21:32, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > The way I read Calvin's suggestion was that you won't allow such a
> > random series of "git fetch"es without updating the "this is the
> > filter that is consistent with the contents of this repository"
> > record, which will lead to inconsistencies. I.e.
> >
> >  - we must maintain the "filter that is consistent with the contents
> >    of this repository", which this series does not do, but we should.
>
> I don't think we should strive to keep this "consistency" --
>
> >  - the "--refetch" is unnecessary and redundant, as long as such a
> >    record is maintained; when a filter setting changes, we should
> >    do the equivalent of "--refetch" automatically.
>
> -- we don't know how much data has been pulled in by fetches from
> different promisor and non-promisor remotes (past & present), or
> dynamically faulted in through branch switching or history
> exploration. And I can't see any particular benefit in attempting to
> keep track of that.
>
> Ævar suggested that in future we might be able to figure out which
> commits a user definitively has all the blobs & trees for, and refetch
> could negotiate from that position to improve efficiency: nothing in
> this series precludes such an enhancement.
>
> > ... isn't "git fetch --filter" that does not update the configured
> > filter (and does not do a refetch automatically) a bug that made the
> > "refetch" necessary in the first place?
>
> I don't believe it's a bug. A fairly obvious partial clone example
> I've used before, on repos where I want the commit history but not all
> the associated data (especially when the history is littered with
> giant blobs I don't care about):
>
> git clone example.com/myrepo --filter=blob:none
> # does a partial clone with no blobs
> # checkout faults in the blobs present at HEAD in bulk to populate
> # the working tree
> git config --unset remote.origin.partialclonefilter
> # going forward, future fetches include all associated blobs for new commits
>
> Getting all the blobs for all history is something I'm explicitly
> trying not to do in this example, but if the next fetch from origin
> automatically did a "refetch" after I removed the filter, that's
> exactly what would happen.
>
> We don't expect users to update `diff.algorithm` in config to run a
> minimal diff: using the `--diff-algorithm=` option on the command line
> overrides the config. The same philosophy applies to fetch:
> `remote.<name>.partialclonefilter` provides the default filter for
> fetches, and a user can override it via `git fetch --filter=`. To me
> this is how Git commands are expected to work.
>
> Partial clones are still relatively new and advanced, and I don't
> believe we should try to over-predict what the correct behaviour is
> for a user.
>
> I'd be happy adding something to the documentation for the
> `remote.<name>.partialclonefilter` config setting to explain that
> changing or removing the filter won't backfill the local object DB and
> the user would need `fetch --refetch` for that.
>
> Thanks,
> Rob :)
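
To make that documentation note concrete, here's a rough sketch of the
sequence I'd expect it to cover (reusing your example.com/myrepo example
and assuming the remote is named "origin"; treat the exact commands and
wording as a suggestion only):

git clone example.com/myrepo --filter=blob:none
# partial clone: commit history is present, but most blobs are omitted
git config --unset remote.origin.partialclonefilter
# future fetches now bring in full objects, but only for new commits;
# blobs missing from the existing history are still not fetched
git fetch --refetch origin
# explicitly backfills the local object DB so the existing history is
# complete under the new (unset) filter

Feel free to adjust -- it's only meant as a starting point for the doc
text.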