"Garima Singh via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Garima Singh <garima.singh@xxxxxxxxxxxxx> > > Add --changed-paths option to git commit-graph write. This option will > soon allow users to compute bloom filters for the paths changed between > a commit and its first significant parent, and write this information > into the commit-graph file. A slightly nitpicky comment. First, I think it is "Bloom filter", not "bloom filter" (from the name of the person that discovered them, Burton Howard Bloom). Second, I would rather that the commit message started with at least one sentence of describing purpose of this new option, not going straight to the technical details (i.e. using Bloom filters). Or in any other way describe that this option would make Git store some helper data that would help find out faster if a given path was changed in given commit. > Note: This commit does not change any behavior. It only introduces > the option and passes down the appropriate flag to the commit-graph. All right. Personally, I don't have strong opinion for or against separating this change into its own patch. > RFC Notes: > 1. We named the option --changed-paths to capture what the option does, > instead of how it does it. The current implementation does this > using bloom filters. We believe using --changed-paths however keeps > the implementation open to other data structures. > All thoughts and suggestions for the name and this approach are > welcome It is all right name. Another option could be for example `git commit-graph write --changeset-info`, or something like that. > > 2. Currently, a subsequent commit in this series will add tests that > exercise this option. I plan to split that test commit across the > series as appropriate. There is another thing, but one that could be left for the followup series, namely the configuration variables for this behavior. In the future it should be possible to switch some configuration variable to have this feature on by default when manually or automatically running `git commit-graph write`. > > Helped-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > Signed-off-by: Garima Singh <garima.singh@xxxxxxxxxxxxx> > --- > Documentation/git-commit-graph.txt | 5 +++++ > builtin/commit-graph.c | 9 +++++++-- > commit-graph.h | 3 ++- > 3 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt > index bcd85c1976..1efe6e5c5a 100644 > --- a/Documentation/git-commit-graph.txt > +++ b/Documentation/git-commit-graph.txt It is nice to have option documented. All right, the 'write' subcommand has the following synopsis: 'git commit-graph write' <options> [--object-dir <dir>] [--[no-]progress] so the is no need to adjust it when adding a new option. > @@ -54,6 +54,11 @@ or `--stdin-packs`.) > With the `--append` option, include all commits that are present in the > existing commit-graph file. > + > +With the `--changed-paths` option, compute and write information about the > +paths changed between a commit and it's first parent. This operation can > +take a while on large repositories. It provides significant performance gains > +for getting file based history logs with `git log` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This might be not entirely clear for someone that is not familiar with Git jargon. Perhaps it would better read as "for getting history of a directory or a file with `git log <path>`", or something like that. Side note: the sentence is missing its finishing full stop. > ++ > With the `--split` option, write the commit-graph as a chain of multiple > commit-graph files stored in `<dir>/info/commit-graphs`. The new commits > not already in the commit-graph are added in a new "tip" file. This file > diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c > index e0c6fc4bbf..9bd1e11161 100644 > --- a/builtin/commit-graph.c > +++ b/builtin/commit-graph.c > @@ -9,7 +9,7 @@ > > static char const * const builtin_commit_graph_usage[] = { > N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"), > - N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"), > + N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] <split options>"), > NULL > }; > > @@ -19,7 +19,7 @@ static const char * const builtin_commit_graph_verify_usage[] = { > }; > > static const char * const builtin_commit_graph_write_usage[] = { > - N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--[no-]progress] <split options>"), > + N_("git commit-graph write [--object-dir <objdir>] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] [--changed-paths] [--[no-]progress] <split options>"), > NULL > }; I was at first wondering why the duplication (not caused by your patch, though), and then realized that it is to have usage for command, and for individual subcommands, separately. > @@ -32,6 +32,7 @@ static struct opts_commit_graph { > int split; > int shallow; > int progress; > + int enable_bloom_filters; Why the field is called `enable_bloom_filters`, while option is called `--changed-paths`? I know it is not user-visible thing, so it would be easy to change if we ever go beyond Bloom filters, though... So I am not against keeping it as it is currently. > } opts; > > static int graph_verify(int argc, const char **argv) > @@ -110,6 +111,8 @@ static int graph_write(int argc, const char **argv) > N_("start walk at commits listed by stdin")), > OPT_BOOL(0, "append", &opts.append, > N_("include all commits already in the commit-graph file")), > + OPT_BOOL(0, "changed-paths", &opts.enable_bloom_filters, > + N_("enable computation for changed paths")), > OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")), > OPT_BOOL(0, "split", &opts.split, > N_("allow writing an incremental commit-graph file")), > @@ -143,6 +146,8 @@ static int graph_write(int argc, const char **argv) > flags |= COMMIT_GRAPH_WRITE_SPLIT; > if (opts.progress) > flags |= COMMIT_GRAPH_WRITE_PROGRESS; > + if (opts.enable_bloom_filters) > + flags |= COMMIT_GRAPH_WRITE_BLOOM_FILTERS; Minor nitpick: are we all right having this ordering of options (not for example having opt.progress last)? Disregarding this, it looks all right. > > read_replace_refs = 0; > > diff --git a/commit-graph.h b/commit-graph.h > index 7f5c933fa2..952a4b83be 100644 > --- a/commit-graph.h > +++ b/commit-graph.h > @@ -76,7 +76,8 @@ enum commit_graph_write_flags { > COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1), > COMMIT_GRAPH_WRITE_SPLIT = (1 << 2), > /* Make sure that each OID in the input is a valid commit OID. */ > - COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3) > + COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3), > + COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4) I wonder if we should add comment describing the flag, like for the one above... > }; > > struct split_commit_graph_opts { Best, -- Jakub Narębski