Re: [PATCH v3 14/14] builtin/commit-graph.c: introduce '--max-new-filters=<n>'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 11, 2020 at 04:52:14PM -0400, Taylor Blau wrote:
> Introduce a command-line flag and configuration variable to fill in the
> 'max_new_filters' variable introduced by the previous patch.
> 
> The command-line option '--max-new-filters' takes precedence over
> 'commitGraph.maxNewFilters', which is the default value.
> '--no-max-new-filters' can also be provided, which sets the value back
> to '-1', indicating that an unlimited number of new Bloom filters may be
> generated. (OPT_INTEGER only allows setting the '--no-' variant back to
> '0', hence a custom callback was used instead).
> 
> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  Documentation/config/commitgraph.txt |  4 +++
>  Documentation/git-commit-graph.txt   |  4 +++
>  bloom.c                              | 15 +++++++++++
>  builtin/commit-graph.c               | 39 +++++++++++++++++++++++++---
>  commit-graph.c                       | 27 ++++++++++++++++---
>  commit-graph.h                       |  1 +
>  t/t4216-log-bloom.sh                 | 19 ++++++++++++++
>  7 files changed, 102 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
> index cff0797b54..4582c39fc4 100644
> --- a/Documentation/config/commitgraph.txt
> +++ b/Documentation/config/commitgraph.txt
> @@ -1,3 +1,7 @@
> +commitGraph.maxNewFilters::
> +	Specifies the default value for the `--max-new-filters` option of `git
> +	commit-graph write` (c.f., linkgit:git-commit-graph[1]).
> +
>  commitGraph.readChangedPaths::
>  	If true, then git will use the changed-path Bloom filters in the
>  	commit-graph file (if it exists, and they are present). Defaults to
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 17405c73a9..9c887d5d79 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -67,6 +67,10 @@ this option is given, future commit-graph writes will automatically assume
>  that this option was intended. Use `--no-changed-paths` to stop storing this
>  data.
>  +
> +With the `--max-new-filters=<n>` option, generate at most `n` new Bloom
> +filters (if `--changed-paths` is specified). If `n` is `-1`, no limit is
> +enforced. Overrides the `commitGraph.maxNewFilters` configuration.
> ++
>  With the `--split[=<strategy>]` option, write the commit-graph as a
>  chain of multiple commit-graph files stored in
>  `<dir>/info/commit-graphs`. Commit-graph layers are merged based on the
> diff --git a/bloom.c b/bloom.c
> index ed54e96e57..8d07209c6b 100644
> --- a/bloom.c
> +++ b/bloom.c
> @@ -51,6 +51,21 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
>  	else
>  		start_index = 0;
>  
> +	if ((start_index == end_index) &&
> +	    (g->bloom_large && !bitmap_get(g->bloom_large, lex_pos))) {
> +		/*
> +		 * If the filter is zero-length, either (1) the filter has no
> +		 * changes, (2) the filter has too many changes, or (3) it
> +		 * wasn't computed (eg., due to '--max-new-filters').
> +		 *
> +		 * If either (1) or (2) is the case, the 'large' bit will be set
> +		 * for this Bloom filter. If it is unset, then it wasn't
> +		 * computed. In that case, return nothing, since we don't have
> +		 * that filter in the graph.
> +		 */
> +		return 0;
> +	}
> +
>  	filter->len = end_index - start_index;
>  	filter->data = (unsigned char *)(g->chunk_bloom_data +
>  					sizeof(unsigned char) * start_index +
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 38f5f57d15..3500a6e1f1 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -13,7 +13,8 @@ static char const * const builtin_commit_graph_usage[] = {
>  	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
>  	N_("git commit-graph write [--object-dir <objdir>] [--append] "
>  	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> -	   "[--changed-paths] [--[no-]progress] <split options>"),
> +	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
> +	   "<split options>"),
>  	NULL
>  };
>  
> @@ -25,7 +26,8 @@ static const char * const builtin_commit_graph_verify_usage[] = {
>  static const char * const builtin_commit_graph_write_usage[] = {
>  	N_("git commit-graph write [--object-dir <objdir>] [--append] "
>  	   "[--split[=<strategy>]] [--reachable|--stdin-packs|--stdin-commits] "
> -	   "[--changed-paths] [--[no-]progress] <split options>"),
> +	   "[--changed-paths] [--[no-]max-new-filters <n>] [--[no-]progress] "
> +	   "<split options>"),
>  	NULL
>  };
>  
> @@ -162,6 +164,23 @@ static int read_one_commit(struct oidset *commits, struct progress *progress,
>  	return 0;
>  }
>  
> +static int write_option_max_new_filters(const struct option *opt,
> +					const char *arg,
> +					int unset)
> +{
> +	int *to = opt->value;
> +	if (unset)
> +		*to = -1;
> +	else {
> +		const char *s;
> +		*to = strtol(arg, (char **)&s, 10);
> +		if (*s)
> +			return error(_("%s expects a numerical value"),
> +				     optname(opt, opt->flags));
> +	}
> +	return 0;
> +}
> +
>  static int graph_write(int argc, const char **argv)
>  {
>  	struct string_list pack_indexes = STRING_LIST_INIT_NODUP;
> @@ -197,6 +216,9 @@ static int graph_write(int argc, const char **argv)
>  			N_("maximum ratio between two levels of a split commit-graph")),
>  		OPT_EXPIRY_DATE(0, "expire-time", &write_opts.expire_time,
>  			N_("only expire files older than a given date-time")),
> +		OPT_CALLBACK_F(0, "max-new-filters", &write_opts.max_new_filters,
> +			NULL, N_("maximum number of changed-path Bloom filters to compute"),
> +			0, write_option_max_new_filters),
>  		OPT_END(),
>  	};
>  
> @@ -205,6 +227,7 @@ static int graph_write(int argc, const char **argv)
>  	write_opts.size_multiple = 2;
>  	write_opts.max_commits = 0;
>  	write_opts.expire_time = 0;
> +	write_opts.max_new_filters = -1;
>  
>  	trace2_cmd_mode("write");
>  
> @@ -270,6 +293,16 @@ static int graph_write(int argc, const char **argv)
>  	return result;
>  }
>  
> +static int git_commit_graph_config(const char *var, const char *value, void *cb)
> +{
> +	if (!strcmp(var, "commitgraph.maxnewfilters")) {
> +		write_opts.max_new_filters = git_config_int(var, value);
> +		return 0;
> +	}
> +
> +	return git_default_config(var, value, cb);
> +}
> +
>  int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>  {
>  	static struct option builtin_commit_graph_options[] = {
> @@ -283,7 +316,7 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
>  		usage_with_options(builtin_commit_graph_usage,
>  				   builtin_commit_graph_options);
>  
> -	git_config(git_default_config, NULL);
> +	git_config(git_commit_graph_config, &opts);
>  	argc = parse_options(argc, argv, prefix,
>  			     builtin_commit_graph_options,
>  			     builtin_commit_graph_usage,
> diff --git a/commit-graph.c b/commit-graph.c
> index 6886f319a5..4aae5471e3 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -954,7 +954,8 @@ struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
>  }
>  
>  static int get_bloom_filter_large_in_graph(struct commit_graph *g,
> -					   const struct commit *c)
> +					   const struct commit *c,
> +					   uint32_t max_changed_paths)
>  {
>  	uint32_t graph_pos = commit_graph_position(c);
>  	if (graph_pos == COMMIT_NOT_FROM_GRAPH)
> @@ -965,6 +966,17 @@ static int get_bloom_filter_large_in_graph(struct commit_graph *g,
>  
>  	if (!(g && g->bloom_large))
>  		return 0;
> +	if (g->bloom_filter_settings->max_changed_paths != max_changed_paths) {
> +		/*
> +		 * Force all commits which are subject to a different
> +		 * 'max_changed_paths' limit to be recomputed from scratch.
> +		 *
> +		 * Note that this could likely be improved, but is ignored since
> +		 * all real-world graphs set the maximum number of changed paths
> +		 * at 512.
> +		 */
> +		return 0;
> +	}
>  	return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base);
>  }
>  
> @@ -1470,6 +1482,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>  	int i;
>  	struct progress *progress = NULL;
>  	int *sorted_commits;
> +	int max_new_filters;
>  
>  	init_bloom_filters();
>  	ctx->bloom_large = bitmap_word_alloc(ctx->commits.nr / BITS_IN_EWORD + 1);
> @@ -1486,10 +1499,15 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>  		ctx->order_by_pack ? commit_pos_cmp : commit_gen_cmp,
>  		&ctx->commits);
>  
> +	max_new_filters = ctx->opts->max_new_filters >= 0 ?
> +		ctx->opts->max_new_filters : ctx->commits.nr;
> +
>  	for (i = 0; i < ctx->commits.nr; i++) {
>  		int pos = sorted_commits[i];
>  		struct commit *c = ctx->commits.list[pos];
> -		if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph, c)) {
> +		if (get_bloom_filter_large_in_graph(ctx->r->objects->commit_graph,
> +						    c,
> +						    ctx->bloom_settings->max_changed_paths)) {
>  			bitmap_set(ctx->bloom_large, pos);
>  			ctx->count_bloom_filter_known_large++;
>  		} else {
> @@ -1497,7 +1515,7 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>  			struct bloom_filter *filter = get_or_compute_bloom_filter(
>  				ctx->r,
>  				c,
> -				1,
> +				ctx->count_bloom_filter_computed < max_new_filters,
>  				ctx->bloom_settings,
>  				&computed);
>  			if (computed) {
> @@ -1507,7 +1525,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
>  					ctx->count_bloom_filter_found_large++;
>  				}
>  			}
> -			ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
> +			if (filter)
> +				ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
>  		}
>  		display_progress(progress, i + 1);
>  	}
> diff --git a/commit-graph.h b/commit-graph.h
> index af08c4505d..75ef83708b 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -114,6 +114,7 @@ struct commit_graph_opts {
>  	int max_commits;
>  	timestamp_t expire_time;
>  	enum commit_graph_split_flags flags;
> +	int max_new_filters;
>  };
>  
>  /*
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 6859d85369..3aab8ffbe3 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -286,4 +286,23 @@ test_expect_success 'Bloom generation does not recompute too-large filters' '
>  	)
>  '
>  
> +test_expect_success 'Bloom generation is limited by --max-new-filters' '
> +	(
> +		cd limits &&
> +		test_commit c2 filter &&
> +		test_commit c3 filter &&
> +		test_commit c4 no-filter &&
> +		test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=2" \
> +			2 0 2
> +	)
> +'
> +
> +test_expect_success 'Bloom generation backfills previously-skipped filters' '
> +	(
> +		cd limits &&
> +		test_bloom_filters_computed "--reachable --changed-paths --split=replace --max-new-filters=1" \
> +			2 0 1
> +	)
> +'
> +
>  test_done
> -- 
> 2.28.0.rc1.13.ge78abce653

Something seems to be wrong in this patch, though I haven't looked
closer.  Consider this test with a bit of makeshift tracing:

  ---  >8  ---

diff --git a/bloom.c b/bloom.c
index 8d07209c6b..1a0dec35cd 100644
--- a/bloom.c
+++ b/bloom.c
@@ -222,6 +222,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	if (!compute_if_not_present)
 		return NULL;
 
+	printf("get_or_compute_bloom_filter(): diff: %s\n", oid_to_hex(&c->object.oid));
 	repo_diff_setup(r, &diffopt);
 	diffopt.flags.recursive = 1;
 	diffopt.detect_rename = 0;
diff --git a/t/t9999-test.sh b/t/t9999-test.sh
new file mode 100755
index 0000000000..0833e6ff7e
--- /dev/null
+++ b/t/t9999-test.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+test_description='test'
+
+. ./test-lib.sh
+
+test_expect_success 'test' '
+	for i in 1 2 3 4 5 6
+	do
+		git commit -q --allow-empty -m $i || return 1
+	done &&
+	git log --oneline &&
+
+	# We have 6 commits and compute 2 Bloom filters per execution,
+	# so 3 executions should be enough...  but, alas, it isnt.
+	for i in 1 2 3 4 5
+	do
+		# No --split=replace!
+		git commit-graph write --reachable --changed-paths --max-new-filters=2 || return 1
+	done &&
+
+	git commit-graph write --reachable --changed-paths --max-new-filters=4
+'
+
+test_done

  ---  >8  ---

It's output looks like:

  [...]
  + git log --oneline
  13fcefa (HEAD -> master) 6
  0c71516 5
  a82a61c 4
  54c6b2c 3
  fc99def 2
  a3a8cd3 1
  + git commit-graph write --reachable --changed-paths --max-new-filters=2
  get_or_compute_bloom_filter(): diff: 0c71516945bf0a23813e80205961d29ebc1020e0
  get_or_compute_bloom_filter(): diff: 13fcefa4bb859a15c4edc6bb01b8b6c91b4f32b6
  + git commit-graph write --reachable --changed-paths --max-new-filters=2
  get_or_compute_bloom_filter(): diff: 54c6b2cd4fb50066683a197cc6d677689618505a
  get_or_compute_bloom_filter(): diff: a3a8cd3c82028671bf51502d77277baf14a2f528
  + git commit-graph write --reachable --changed-paths --max-new-filters=2
  get_or_compute_bloom_filter(): diff: 0c71516945bf0a23813e80205961d29ebc1020e0
  get_or_compute_bloom_filter(): diff: 13fcefa4bb859a15c4edc6bb01b8b6c91b4f32b6
  + git commit-graph write --reachable --changed-paths --max-new-filters=2
  get_or_compute_bloom_filter(): diff: 54c6b2cd4fb50066683a197cc6d677689618505a
  get_or_compute_bloom_filter(): diff: a3a8cd3c82028671bf51502d77277baf14a2f528
  + git commit-graph write --reachable --changed-paths --max-new-filters=2
  get_or_compute_bloom_filter(): diff: 0c71516945bf0a23813e80205961d29ebc1020e0
  get_or_compute_bloom_filter(): diff: 13fcefa4bb859a15c4edc6bb01b8b6c91b4f32b6
  + git commit-graph write --reachable --changed-paths
  get_or_compute_bloom_filter(): diff: 54c6b2cd4fb50066683a197cc6d677689618505a
  get_or_compute_bloom_filter(): diff: a3a8cd3c82028671bf51502d77277baf14a2f528
  get_or_compute_bloom_filter(): diff: a82a61c79b2b07c4440e292613e11a69e33ef7a2
  get_or_compute_bloom_filter(): diff: fc99def8b1df27bcab7d1f4b7ced73239f9bd7ec

See how the third write with '--max-new-filters=2' computes the
filters that have already been computed by the first write instead of
those two that have never been computed?  And then how the fourth
write computes filters that have already been computed by the second
write?  And ultimately we'll need a write without '--max-new-filters' (or
with '--max-new-filters=<large-enough>') to compute all remaining
filters.

With '--split=replace' it appears to work as expected.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux