On 2/20/2020 3:28 PM, Jakub Narebski wrote: > "Garima Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> From: Garima Singh <garima.singh@xxxxxxxxxxxxx> >> >> Add --changed-paths option to git commit-graph write. This option will >> allow users to compute information about the paths that have changed >> between a commit and its first parent, and write it into the commit graph >> file. If the option is passed to the write subcommand we set the >> COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the >> commit-graph logic. > > In the manpage you write that this operation (computing Bloom filters) > can take a while on large repositories. Could you perhaps provide some > numbers: how much longer does it take to write commit-graph file with > and without '--changed-paths' for example for Linux kernel, or some > other large repository? Thanks in advance. > Yes. Will include numbers as appropriate in v3. >> >> Helped-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> Signed-off-by: Garima Singh <garima.singh@xxxxxxxxxxxxx> >> --- >> Documentation/git-commit-graph.txt | 5 +++++ >> builtin/commit-graph.c | 9 +++++++-- >> 2 files changed, 12 insertions(+), 2 deletions(-) > > What is missing is some sanity tests: that bloom index and bloom data > chunks are not present without '--changed-paths', and that they are > added with '--changed-paths'. > > If possible, maybe also check in a separate test that the size of > bloom_index chunk agrees with the number of commits in the commit graph. > > > Also, we can now add those tests I have wrote about in my review of > previous patch, that is: > > 1. If you write commit-graph with --changed-paths, and either add some > commits later or exclude some commits from the commit graph, then: > > a.) commit(s) in commit-graph have Bloom filter > b.) commit(s) not in commit-graph do not have Bloom filter > > 2. If you write commit-graph without --changed-paths as base layer, > and then write next layer with --changed-paths and --split, then: > > a.) commit(s) in top layer have Bloom filter(s) > b.) commit(s) in bottom layer don't have Bloom filter(s) > I will see what more can be done here. >> >> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt >> index bcd85c1976..907d703b30 100644 >> --- a/Documentation/git-commit-graph.txt >> +++ b/Documentation/git-commit-graph.txt >> @@ -54,6 +54,11 @@ or `--stdin-packs`.) >> With the `--append` option, include all commits that are present in the >> existing commit-graph file. >> + >> +With the `--changed-paths` option, compute and write information about the >> +paths changed between a commit and it's first parent. This operation can >> +take a while on large repositories. It provides significant performance gains >> +for getting history of a directory or a file with `git log -- <path>`. >> ++ > > Should we write about limitation that the topmost layer in the split > commit graph needs to be written with '--changed-paths' for Git to use > this information? Or perhaps we should try (in the future) to remove > this limitation?? > Given that this information is going to be used best effort, it would be superfluous to describe every case and conditional that decides whether this information is being used. >> @@ -143,6 +146,8 @@ static int graph_write(int argc, const char **argv) >> flags |= COMMIT_GRAPH_WRITE_SPLIT; >> if (opts.progress) >> flags |= COMMIT_GRAPH_WRITE_PROGRESS; >> + if (opts.enable_changed_paths) >> + flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS; >> >> read_replace_refs = 0; > > All right. This actually turns on calculation Bloom filters for changed > paths, thanks to > > ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0; > > that was added by the "[PATCH v2 04/11] commit-graph: compute Bloom > filters for changed paths" patch. > > Though... should this enabling be split into two separate patches like > this? > The idea is that in 4/11 We compute only if the flag is set. And between that patch and this one: we prepare the foundational code that is now ready for that flag to be set via an opt-in by the user. > > Best, >