"Garima Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Garima Singh <garima.singh@xxxxxxxxxxxxx> > > Add --changed-paths option to git commit-graph write. This option will > allow users to compute information about the paths that have changed > between a commit and its first parent, and write it into the commit graph > file. If the option is passed to the write subcommand we set the > COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the > commit-graph logic. In the manpage you write that this operation (computing Bloom filters) can take a while on large repositories. Could you perhaps provide some numbers: how much longer does it take to write commit-graph file with and without '--changed-paths' for example for Linux kernel, or some other large repository? Thanks in advance. > > Helped-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > Signed-off-by: Garima Singh <garima.singh@xxxxxxxxxxxxx> > --- > Documentation/git-commit-graph.txt | 5 +++++ > builtin/commit-graph.c | 9 +++++++-- > 2 files changed, 12 insertions(+), 2 deletions(-) What is missing is some sanity tests: that bloom index and bloom data chunks are not present without '--changed-paths', and that they are added with '--changed-paths'. If possible, maybe also check in a separate test that the size of bloom_index chunk agrees with the number of commits in the commit graph. Also, we can now add those tests I have wrote about in my review of previous patch, that is: 1. If you write commit-graph with --changed-paths, and either add some commits later or exclude some commits from the commit graph, then: a.) commit(s) in commit-graph have Bloom filter b.) commit(s) not in commit-graph do not have Bloom filter 2. If you write commit-graph without --changed-paths as base layer, and then write next layer with --changed-paths and --split, then: a.) commit(s) in top layer have Bloom filter(s) b.) commit(s) in bottom layer don't have Bloom filter(s) > > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt > index bcd85c1976..907d703b30 100644 > --- a/Documentation/git-commit-graph.txt > +++ b/Documentation/git-commit-graph.txt > @@ -54,6 +54,11 @@ or `--stdin-packs`.) > With the `--append` option, include all commits that are present in the > existing commit-graph file. > + > +With the `--changed-paths` option, compute and write information about the > +paths changed between a commit and it's first parent. This operation can > +take a while on large repositories. It provides significant performance gains > +for getting history of a directory or a file with `git log -- <path>`. > ++ Should we write about limitation that the topmost layer in the split commit graph needs to be written with '--changed-paths' for Git to use this information? Or perhaps we should try (in the future) to remove this limitation?? > With the `--split` option, write the commit-graph as a chain of multiple > commit-graph files stored in `<dir>/info/commit-graphs`. The new commits > not already in the commit-graph are added in a new "tip" file. This file > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c > index e0c6fc4bbf..261dcce091 100644 > --- a/builtin/commit-graph.c > +++ b/builtin/commit-graph.c > @@ -9,7 +9,7 @@ > > static char const * const builtin_commit_graph_usage[] = { > N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"), > - N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"), > + N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] <split options>"), > NULL > }; > > @@ -19,7 +19,7 @@ static const char * const builtin_commit_graph_verify_usage[] = { > }; > > static const char * const builtin_commit_graph_write_usage[] = { > - N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"), > + N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] <split options>"), > NULL > }; > All right. > @@ -32,6 +32,7 @@ static struct opts_commit_graph { > int split; > int shallow; > int progress; > + int enable_changed_paths; Bikeshed painting: should this field be called enable_changed_paths or simply changed_paths? > } opts; > > static int graph_verify(int argc, const char **argv) > @@ -110,6 +111,8 @@ static int graph_write(int argc, const char **argv) > N_("start walk at commits listed by stdin")), > OPT_BOOL(0, "append", &opts.append, > N_("include all commits already in the commit-graph file")), > + OPT_BOOL(0, "changed-paths", &opts.enable_changed_paths, > + N_("enable computation for changed paths")), > OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), > OPT_BOOL(0, "split", &opts.split, > N_("allow writing an incremental commit-graph file")), All right. > @@ -143,6 +146,8 @@ static int graph_write(int argc, const char **argv) > flags |= COMMIT_GRAPH_WRITE_SPLIT; > if (opts.progress) > flags |= COMMIT_GRAPH_WRITE_PROGRESS; > + if (opts.enable_changed_paths) > + flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS; > > read_replace_refs = 0; All right. This actually turns on calculation Bloom filters for changed paths, thanks to ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0; that was added by the "[PATCH v2 04/11] commit-graph: compute Bloom filters for changed paths" patch. Though... should this enabling be split into two separate patches like this? Best, -- Jakub Narębski