Re: [PATCH 0/6] [RFC] config.c: use struct for config reading state

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 07 Mar 2023 19:36:58 +0100

On Tue, Mar 07 2023, Glen Choo wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> On Wed, Mar 01 2023, Glen Choo via GitGitGadget wrote:
>>
>>> This series extracts the global config reading state into "struct
>>> config_reader" and plumbs it through the config reading machinery. It's very
>>> similar to how we've plumbed "struct repository" and other 'context objects'
>>> in the past, except:
>>>
>>>  * The global state (named "the_reader") for the git process lives in a
>>>    config.c static variable, and not on "the_repository". See 3/6 for the
>>>    rationale.
>>
>> I agree with the overall direction, but don't think that rationale in
>> 3/6 is sufficient to go in this "the_reader" direction, as opposed to
>> sticking with and extending "the_repository" approach.
>>
>> For orthagonal reasons (getting rid of some of the API duplication) I've
>> been carrying a patch to get rid of the "configset" part of the *public*
>> API, i.e. to have API users always use the "repo_config_*()" or
>> "git_config_*()" variants, that patch is at:
>> https://github.com/avar/git/commit/0233297a359bbda43a902dd0213aacdca82faa34
>
> Those patches are probably worth sending, even if only as RFC. I found
> it pretty hard to draft a substantial response without effectively doing
> a full review of the patch.

Yes, sorry. It's part of some changes on top of my outstanding config
API changes (just re-rolled at
https://lore.kernel.org/git/cover-v6-0.9-00000000000-20230307T180516Z-avarab@xxxxxxxxx/). Hopefully
those will land soon after the upcoming release, I'll try to submit this
(along with related changes) soon afterwards.

>> It's a bit distasteful, but that change argues that just mocking up a
>> "struct repository" with a "config" member pointing to a new
>> configset is better than maintaining an entirely different API just
>> for those cases where we need to parse a one-off file or whatever.
>>
>> I think that going in that direction neatly solves the issues you're
>> noting here and in your 3/6, i.e. we'd always have this in the "repo"
>> object, so we'd just stick the persistent "reader" variables in the
>> "struct repository"'s "config" member.
>
> If I understand your proposal correctly, we would move the config
> variables to the_repository. Then, any time a caller would like to work
> with an individual file, it would init a new "struct repository" with a
> clean set of config members (using repo_init_repo_blank_config() or
> something) and reuse the repo_config_* API?

It's certainly a hack, but so is introducing a new "the_reader"
singleton whose lifetime we need to manage seperately from
"the_repository" in the common case :)

I think a better argument for this is probably that if you try to change
repository.h so that we define "struct repository" thusly (The
"hash_algo" field being still there due to a very common macro):

	struct repository {
	        const struct git_hash_algo *hash_algo;
	        struct config_set *config;
	};

And then try to:

	make config.o

You'll get:

	$ make config.o
	    CC config.o
	config.c: In function ‘include_by_branch’:
	config.c:311:46: error: ‘struct repository’ has no member named ‘gitdir’
	  311 |         const char *refname = !the_repository->gitdir ?
	      |                                              ^~
	config.c: In function ‘repo_read_config’:
	config.c:2523:30: error: ‘struct repository’ has no member named ‘commondir’
	 2523 |         opts.commondir = repo->commondir;
	      |                              ^~
	config.c:2524:28: error: ‘struct repository’ has no member named ‘gitdir’
	 2524 |         opts.git_dir = repo->gitdir;
	      |                            ^~
	make: *** [Makefile:2719: config.o] Error 1

I.e. almost all of the config code doesn't care about the repository at
all, it just needs the "struct config_set" that's in the repository
struct.

With the linked-to change I'm arguing that just mocking it up sucks less
than carrying a duplicate set of API functions just for those rare cases
where we need to one-off read a config file.

But...

> It is a workable solution, e.g. that approach would work around the
> failures in test-tool and scalar that I observed. In the spirit of
> libification, this feels like a kludge, though, since we'd be reverting
> to using "struct repository" for more things instead of using more
> well-scoped interfaces. IMO a better future for the config_set API would
> be to move it into configset.c or something, where only users who want
> the low level API would use it and everyone else would just pretend it
> doesn't exist. This would be a little like libgit2's organization, where
> 'general config', 'config parsing' and 'in-memory config value
> representations' are separate files, e.g.
>
>   https://github.com/libgit2/libgit2/blob/main/src/libgit2/config.h
>   https://github.com/libgit2/libgit2/blob/main/src/libgit2/config_parse.h
>   https://github.com/libgit2/libgit2/blob/main/src/libgit2/config_entries.h
>
> I also hesitate to put the config variables on the_repository, because
> in the long term, I think "struct config_reader" can and should be
> purely internal to config.c. But if we start advertising its existence
> via the_repository, that might be an invitation to (ab)use that API and
> make that transition harder.

...yes it's a kludge, but I'd think if you're interested in more
generally fixing it that a better use of time would be to narrowly focus
on those cases where we don't have a "repository" now, i.e. the
configset API users my linked-to patch shows & amends.

Everything else that uses git_config_get_*(), repo_config_get_*() will
either use an already set up "the_repository", or lazily init it, see
the git_config_check_init() API users.

Or, we could have repo_config() and friends not take a "struct
repository", but another config-specific object which has the subset of
the three fields from that struct which it actually needs (config,
gitdir & commondir).

The mocking function I added in the linked-to commit was just a way of
getting to that via the shortest means.