On 4/8/2020 6:31 PM, Jeff King wrote: > On Wed, Apr 08, 2020 at 04:38:27PM +0000, Derrick Stolee via GitGitGadget wrote: > >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> >> The changed-path Bloom filters record an entry in the filter for >> every path that was changed. This includes every add and delete, >> regardless of whther a rename was detected. Detecting renames >> causes significant performance issues, but also will trigger >> downloading missing blobs in partial clone. >> >> The simple fix is to disable rename detection when computing a >> changed-path Bloom filter. > > Yes, we should be doing as simple a tree-diff as possible. > > I wonder if this might actually be fixing a potential bug, too. The loop > below this code only looks at the "two" half of each queued diff pair. > With renames enabled, we might see the source path only in the "one" > half of the rename pair. > > However, this seems to work already. If I do: > > echo content >file > git add file > git commit -m added > echo change >file > git commit -am changed > git mv file other > git commit -am 'rename away' > git mv other file > git commit -am 'rename back' > git rm file > git commit -am removed > > git commit-graph write --reachable --changed-paths > git log --oneline -- file | cat > > then I see all of the commits. Instrumenting Git like: > > diff --git a/bloom.c b/bloom.c > index c5b461d1cf..fb2a758e1d 100644 > --- a/bloom.c > +++ b/bloom.c > @@ -207,6 +207,10 @@ struct bloom_filter *get_bloom_filter(struct repository *r, > for (i = 0; i < diff_queued_diff.nr; i++) { > const char *path = diff_queued_diff.queue[i]->two->path; > > + warning("queuing touched %c path %s for %s", > + diff_queued_diff.queue[i]->status, > + path, oid_to_hex(&c->object.oid)); > + > /* > * Add each leading directory of the changed file, i.e. for > * 'dir/subdir/file' add 'dir' and 'dir/subdir' as well, so > > results in: > > warning: queuing touched A path file for 2346d88b0cb4bca11c38ee545d007a7a14ca472a > warning: queuing touched M path file for 991cd7f0696ae29fea738ca1b8340c90dae4b201 > warning: queuing touched D path file for d3642c9fb27459ea09f6c967a1e6ad119e265d6f > warning: queuing touched A path other for d3642c9fb27459ea09f6c967a1e6ad119e265d6f > warning: queuing touched A path file for bc908eb29e562d97ebb8cf718e41b69d3aa1d834 > warning: queuing touched D path other for bc908eb29e562d97ebb8cf718e41b69d3aa1d834 > warning: queuing touched D path file for 7433b46bd6aa170ab17a651c10658a5b0c10ba4f > > So we really aren't detecting renames in the first place! And indeed, > checking diffopt.detect_rename shows that it is unset. So I'm curious if > there is a case where that would not be true. I _think_ it would only be > true in a program which ran init_diff_ui_defaults(), but never in > git-commit-graph. So our issue was really that the partial clone prefetch logic was just overly aggressive. > Even if it does nothing in practice, I'm not at all opposed to having it > in there as an explicit documentation of our expectations/requirements > for the loop below. But it's probably worth saying so in the commit > message. I will update the message and send a v2. I'll fix the typo that Philip pointed out, too. Thanks, -Stolee