Re: [RFC PATCH] repo-settings: set defaults even when not in a repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
> > prepare_repo_settings() initializes a `struct repository` with various
> > default config options and settings read from a repository-local config
> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
> > is called by a process whose CWD is not a Git repository. This approach
> > was suggested in [1].
> >
> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
> > fuzzing-engine-provided bytes as a commit graph file.
> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> > since we run the fuzz tests without a valid repository, we are hitting
> > the BUG() from 44c7e62 for every test case.
> >
> > Fix this by refactoring prepare_repo_settings() such that it sets
> > default options unconditionally; if its process is in a Git repository,
> > it will also load settings from the local config. This eliminates the
> > need for a BUG() when not in a repository.
>
> I think you have the right idea and this can work.

Hmmm. To me this feels like bending over backwards in
`prepare_repo_settings()` to accommodate one particular caller. I'm not
necessarily opposed to it, but it does feel strange to make
`prepare_repo_settings()` a noop here, since I would expect that any
callers who do want to call `prepare_repo_settings()` are likely
convinced that they are inside of a repository, and it probably should
be a BUG() if they aren't.

I was initially thinking that Josh's alternative of introducing and
calling a lower-level version of `prepare_commit_graph()` that doesn't
require the use of any repository settings would make sense.

But when I started looking at implementing it, I became confused at how
this is supposed to work at all without using a repository. We depend on
the values parsed from:

  - commitGraph.generationVersion
  - commitGraph.readChangedPaths

to determine which part(s) of the commit graph we do and don't read.

Looking around, I think I probably inadvertently broke this in
ab14d0676c (commit-graph: pass a 'struct repository *' in more places,
2020-09-09). But prior to ab14d0676c, neither of those settings existed,
so parsing the commit graph was a pure function of the commit graph's
contents alone, and didn't rely on the existence of a repository.

We could pretend as if `commitGraph.generationVersion` is always "2" and
`commitGraph.readChangedPaths` is always "true", and I think that would
still give us good-enough coverage.

I assume that libFuzzer doesn't give us a convenient way to stub out a
repository. Maybe we should have a variant of `parse_commit_graph` that
instead takes a `struct repo_settings` filled out with the defaults?

We could use that to teach libFuzzer about additional states that the
commit graph parser can be in, too (though probably outside the scope of
this patch).

I tried to sketch all of this out, which seems to work. Let me know what
you think:

--- 8< ---

diff --git a/commit-graph.c b/commit-graph.c
index adffd020dd..3d5aa536c2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -96,13 +96,6 @@ define_commit_slab(commit_graph_data_slab, struct commit_graph_data);
 static struct commit_graph_data_slab commit_graph_data_slab =
 	COMMIT_SLAB_INIT(1, commit_graph_data_slab);

-static int get_configured_generation_version(struct repository *r)
-{
-	int version = 2;
-	repo_config_get_int(r, "commitgraph.generationversion", &version);
-	return version;
-}
-
 uint32_t commit_graph_position(const struct commit *c)
 {
 	struct commit_graph_data *data =
@@ -335,6 +328,13 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,

 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size)
+{
+	prepare_repo_settings(r);
+	return parse_commit_graph_settings(&r->settings, graph_map, graph_size);
+}
+
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+						 void *graph_map, size_t graph_size)
 {
 	const unsigned char *data;
 	struct commit_graph *graph;
@@ -371,8 +371,6 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 		return NULL;
 	}

-	prepare_repo_settings(r);
-
 	graph = alloc_commit_graph();

 	graph->hash_len = the_hash_algo->rawsz;
@@ -402,7 +400,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);

-	if (get_configured_generation_version(r) >= 2) {
+	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
 			&graph->chunk_generation_data);
 		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
@@ -412,7 +410,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 			graph->read_generation_data = 1;
 	}

-	if (r->settings.commit_graph_read_changed_paths) {
+	if (s->commit_graph_read_changed_paths) {
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
@@ -2299,7 +2297,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
 	ctx->opts = opts;
 	ctx->total_bloom_filter_data_size = 0;
-	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
+	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
 	ctx->num_generation_data_overflows = 0;

 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
diff --git a/commit-graph.h b/commit-graph.h
index 2e3ac35237..1fdca7a927 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -95,6 +95,8 @@ struct commit_graph *read_commit_graph_one(struct repository *r,
 					   struct object_directory *odb);
 struct commit_graph *parse_commit_graph(struct repository *r,
 					void *graph_map, size_t graph_size);
+struct commit_graph *parse_commit_graph_settings(struct repo_settings *s,
+					void *graph_map, size_t graph_size);

 /*
  * Return 1 if and only if the repository has a commit-graph
diff --git a/fuzz-commit-graph.c b/fuzz-commit-graph.c
index e7cf6d5b0f..07ef4643d8 100644
--- a/fuzz-commit-graph.c
+++ b/fuzz-commit-graph.c
@@ -11,7 +11,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	struct commit_graph *g;

 	initialize_the_repository();
-	g = parse_commit_graph(the_repository, (void *)data, size);
+	g = parse_commit_graph_settings(the_repository->settings, (void *)data, size);
 	repo_clear(the_repository);
 	free_commit_graph(g);

diff --git a/repo-settings.c b/repo-settings.c
index b4fbd16cdc..a7e5d10b27 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -10,6 +10,13 @@ static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
 		*dest = def;
 }

+static void repo_cfg_int(struct repository *r, const char *key, int *dest,
+			 int def)
+{
+	if (repo_config_get_int(r, key, dest))
+		*dest = def;
+}
+
 void prepare_repo_settings(struct repository *r)
 {
 	int experimental;
@@ -43,6 +50,7 @@ void prepare_repo_settings(struct repository *r)

 	/* Boolean config or default, does not cascade (simple)  */
 	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
+	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
 	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
 	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
 	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
diff --git a/repository.h b/repository.h
index e29f361703..6c40ea3f1b 100644
--- a/repository.h
+++ b/repository.h
@@ -29,6 +29,7 @@ struct repo_settings {
 	int initialized;

 	int core_commit_graph;
+	int commit_graph_generation_version;
 	int commit_graph_read_changed_paths;
 	int gc_write_commit_graph;
 	int fetch_write_commit_graph;

--- >8 ---

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux