Re: [PATCH v3] commit-graph: refactor to avoid prepare_repo_settings

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 23 Jun 2022 14:59:20 -0700

Josh Steadmon <steadmon@xxxxxxxxxx> writes:

> This series of changes broke fuzz-commit-graph, which attempts to parse
> arbitrary fuzzing-engine-provided bytes as a commit graph file.
> commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
> since we run the fuzz tests without a valid repository, we are hitting
> the BUG() from 44c7e62 for every test case.
>
> Fix this by moving the majority of the implementaiton of
> `parse_commit_graph()` into a new function,
> `parse_commit_graph_settings()` that accepts a repo_settings pointer.
> This allows fuzz-commit-graph to continue to test the commit-graph
> parser implementation without relying on prepare_repo_settings().

It sounds like this is not a "fix" but a workaround to bend the
production code so that a non-production test shim can be inserted
more easily.

I am OK with the idea, but have a huge problem with the new name.

Is it just me who thinks parse_commit_graph_settings() is a function
that parses some kind of settings that affects the way the commit
graph gets used or written?

Stepping back a bit, why can't fuzz-commit-graph prepare a
repository object that looks sufficiently real?  Something along the
lines of...

                struct repository fake_repo;

                fake_repo.settings.initialized = 1;
                fake_repo.gitdir = ".";
                parse_commit_graph(&fake_repo, (void *)data, size);
		...

Also, I feel somewhat uneasy to see these changes:

> -	if (get_configured_generation_version(r) >= 2) {
> +	if (s->commit_graph_generation_version >= 2) {
> -	if (r->settings.commit_graph_read_changed_paths) {
> +	if (s->commit_graph_read_changed_paths) {
> -	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> +	ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2);
>  	ctx->num_generation_data_overflows = 0;

that makes the production code bend over backwards to _avoid_
referencing 'r', only to cater to the test shim.  That's an
artificial limitation we are forcing on our developers who works on
this code.  It might be that what is in the repository settings is
sufficient for today's code to work, but I do not think needs for
fuzz tests should tie the hand of this production code by forbidding
it to look at other things in the repository in the future.  After
all, tests are to serve the production code, not the other way
around.

On the other hand, I think a change that is slightly smaller than
the posted patch, which justifies itself with a completely different
rationale, would be totally acceptable.  You can justify this change
with NO mention of fuzzers.

    The parse_commit_graph() function takes a "struct repository *"
    pointer, but all it cares about is the .settings member of it.

    Update the function and all its existing callers so that it
    takes "struct repo_settings *" instead.

Now, in the future, some developers _might_ find it necessary to
access stuff other than the repository settings to validate the
contents of the graph file, and we may need to change it to take a
full repository structure again.  The test should adjust to such
needs of the production code, not the other way around.  But until
then...

Thanks.