Re: [RFC PATCH] repo-settings: set defaults even when not in a repo

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 29 Mar 2022 11:04:18 +0200

On Wed, Mar 23 2022, Taylor Blau wrote:

> On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote:
>> On 3/23/2022 2:03 PM, Josh Steadmon wrote:
>> > prepare_repo_settings() initializes a `struct repository` with various
>> > default config options and settings read from a repository-local config
>> > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only
>> > in git repos), prepare_repo_settings was changed to issue a BUG() if it
>> > is called by a process whose CWD is not a Git repository. This approach
>> > was suggested in [1].
>> >
>> > This breaks fuzz-commit-graph, which attempts to parse arbitrary
>> > fuzzing-engine-provided bytes as a commit graph file.
>> > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but
>> > since we run the fuzz tests without a valid repository, we are hitting
>> > the BUG() from 44c7e62 for every test case.
>> >
>> > Fix this by refactoring prepare_repo_settings() such that it sets
>> > default options unconditionally; if its process is in a Git repository,
>> > it will also load settings from the local config. This eliminates the
>> > need for a BUG() when not in a repository.
>>
>> I think you have the right idea and this can work.
>
> Hmmm. To me this feels like bending over backwards in
> `prepare_repo_settings()` to accommodate one particular caller. I'm not
> necessarily opposed to it, but it does feel strange to make
> `prepare_repo_settings()` a noop here, since I would expect that any
> callers who do want to call `prepare_repo_settings()` are likely
> convinced that they are inside of a repository, and it probably should
> be a BUG() if they aren't.

I think adding that BUG() was overzealous in the first place, per
https://lore.kernel.org/git/211207.86r1apow9f.gmgdl@xxxxxxxxxxxxxxxxxxx/;

I don't see what purpose it serves to be this overly anal in this code,
and 44c7e62e51e (repo-settings: prepare_repo_settings only in git repos,
2021-12-06) just discusses "what" and not "why".

I think a perfectly fine solution to this is just to revert it:

	diff --git a/repo-settings.c b/repo-settings.c
	index b4fbd16cdcc..e162c1479bf 100644
	--- a/repo-settings.c
	+++ b/repo-settings.c
	@@ -18,7 +18,7 @@ void prepare_repo_settings(struct repository *r)
	 	int manyfiles;

	 	if (!r->gitdir)
	-		BUG("Cannot add settings for uninitialized repository");
	+		return;

	 	if (r->settings.initialized++)
	 		return;

I have that in my local integration branch, because I ended up wanting
to add prepare_repo_settings() to usage.c, which may or may not run
inside a repo (and maybe we'll have that config, maybe not).

But really, in common-main.c we do an initialize_the_repository(), so a
"struct repository" is already a thing we have before we get to the
"RUN_SETUP_GENTLY" or whatever in git.c, and a bunch of things all over
the place assume that it's the equivalent of { 0 }-initialized.

If we actually want to turn repository.[ch] into some strict API where
"Thou Shalt Not Use the_repository unless" we're actually in a repo,
surely we should have it be NULL then, and add that BUG() to the
likes of initialize_the_repository().

Except I think there's no point in that, and it would just lead to
pointless churn, so why do it for the settings in particular? Why can't
they just be { 0 }-init'd too?

If some caller cares about whether r->settings is the way it is because
we actually have a repo, or because we're using the defaults, why can't
they just check r->gitdir themselves?

For the rest, the default of "just provide the defaults then" is a much
saner API.
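
To make the "check r->gitdir yourself" idea concrete, here is a minimal
sketch. It is a toy model, not Git's code: every toy_* name is
hypothetical, and the structs are simplified stand-ins for "struct
repository" and its settings. It assumes the one-line "return" change
from the diff above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Git's structs; the fields are assumptions. */
struct toy_settings {
	int initialized;
	int core_commit_graph;	/* hypothetical setting, 0 is the default */
};

struct toy_repository {
	const char *gitdir;	/* NULL when not in a repository */
	struct toy_settings settings;
};

/* Models the diff above: return quietly instead of calling BUG(). */
static void toy_prepare_repo_settings(struct toy_repository *r)
{
	assert(r != NULL);

	if (!r->gitdir)
		return;		/* settings stay { 0 }-initialized */

	if (r->settings.initialized++)
		return;

	/* Pretend this value was read from the repository's config. */
	r->settings.core_commit_graph = 1;
}

/* A caller that cares about the distinction checks gitdir itself. */
static int toy_settings_from_repo(const struct toy_repository *r)
{
	assert(r != NULL);
	return r->gitdir != NULL;
}
```

With this shape, a caller such as a fuzzer can pass a { 0 }-initialized
struct and simply get the zeroed defaults, while a caller that needs a
real repository asks toy_settings_from_repo() first.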

I think *maybe* what this actually wanted to do was to bridge the gap
between "startup_info->have_repository" and a caller in builtin/ calling
prepare_repo_settings(), i.e. that it was a logic error to have that
RUN_SETUP_GENTLY caller do that.

I can see how that *might* be useful as some sanity assertion, but then
maybe we could add a more narrow BUG() just for that case, even having a
builtin_prepare_repo_settings() wrapper in builtin.h or whatever.
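
A rough sketch of what such a narrow wrapper could look like, again as
a toy model under stated assumptions: the sketch_* names and struct
layouts are hypothetical, and a counter stands in for BUG() so the
example does not abort. A real version would presumably call BUG() and
consult "startup_info->have_repository" directly.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real structs in Git have many more fields. */
struct sketch_startup_info {
	int have_repository;
};

struct sketch_repository {
	const char *gitdir;	/* NULL when not in a repository */
	int settings_initialized;
};

/* Counts the places where real code would call BUG() (an assumption). */
static int sketch_bug_calls;

/* Lenient core function: a no-op outside a repository. */
static void sketch_prepare_repo_settings(struct sketch_repository *r)
{
	assert(r != NULL);

	if (!r->gitdir)
		return;
	r->settings_initialized = 1;
}

/* Strict wrapper meant only for builtins that claim to have a repo. */
static void sketch_builtin_prepare_repo_settings(
	const struct sketch_startup_info *info,
	struct sketch_repository *r)
{
	assert(info != NULL);

	if (!info->have_repository) {
		sketch_bug_calls++;	/* real code would BUG() here */
		return;
	}
	sketch_prepare_repo_settings(r);
}
```

The split keeps the assertion where the logic error would actually be
(a builtin that ran setup gently and then assumed a repository), while
library callers and fuzzers keep using the lenient function.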