Re: [PATCH 0/6] [RFC] config.c: use struct for config reading state

Glen Choo <chooglen@xxxxxxxxxx> · Wed, 08 Mar 2023 15:09:52 -0800

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:

> On Mon, Mar 06 2023, Jonathan Tan wrote:
>
>> Glen Choo <chooglen@xxxxxxxxxx> writes:
>>> By configset interface, I believe you mean the O(1) lookup functions
>>> like git_config_get_int() (which rely on the value being cached, but
>>> don't necessarily accept "struct config_set" as an arg)? I think that
>>> makes sense both from a performance and maintenance perspective.
>>
>> Ah, yes. (More precisely, not the one where you call something like
>> repo_config(), passing a callback function that.)
>>
>>> Given how painful it is to change the config_fn_t signature, I think it
>>> is important to get as right as possible the first time. After I sent
>>> this out, I thought of yet another possible config_fn_t signature
>>> (since most callbacks only need diagnostic information, we could pass
>>> "struct key_value_info" instead of the more privileged "struct
>>> config_reader"), but given how many functions we'd have to change, it
>>> seemed extremely difficult to even begin experimenting with this
>>> different signature.
>>
>> Yeah, the first change is the hardest. I think passing it a single
>> struct (so, instead of key, value, data, reader, and then any future
>> fields we would need) to which we can add fields later would mean that
>> we wouldn't need any changes beyond the first, though.
>
> For the configset API users we already have the line number, source
> etc. in the "util" member, i.e. when we have an error in any API user
> that uses the configset they can error about the specific line that
> config came from.

Yeah, and I think we should plumb the "util" (actually "struct
key_value_info") back to the users who need that diagnostic info, which
achieves the same purpose as plumbing the "config_reader" to the
callback functions (since the callback functions only need to read
diagnostic information), but it's better since it exposes fewer gory
details and we can refactor those using coccinelle. The difficult part
is replacing the callbacks with the configset API in the first place.

> I think this may have been conflated because e.g. for the configset to
> get the "scope" we need to go from do_git_config_sequence(), which will
> currently set "current_parsing_scope", all the way down to
> configset_add_value(), and there we'll make use of the
> "config_set_callback", which is a config_fn_t.

I think this is half-right (at least from a historical perspective). We
started by just reading auxiliary info like line number and file name
from the "current config source", but when we started caching values in
config sets, we had to find another way to share this auxiliary info.
The solution that 0d44a2dacc (config: return configset value for
current_config_ functions, 2016-05-26) gave us is to also cache the
auxiliary info, and then when we iterate the configset, we set the
global "current_config_kvi" to the cached value.

So they aren't conflated per se, since the reason for their existence is
so that API users don't have to care whether or not they are iterating a
file or iterating a configset. But as you've observed...

> But that's all internal "static" functions, except
> git_config_from_file() and git_config_from_file_with_options(), but
> those have only a handful of callers.
>
> But that's *different* than the user callbacks, which will be invoked
> through a loop in configset_iter(), i.e. *after* we've parsed the
> config, and are just getting the line number, scope etc. from the
> configset.

in practice, very few users read (and should be reading) config directly
from files. Most want the 'config for the whole repo' (which is handled
by the repo_* and git_* functions) and others will explicitly call
git_config_from_file*() so I don't think the config API needs to keep
users ignorant of 'whether the current config is cached or not'.

We could convert callbacks to the configset API, and if we then we plumb
the key_value_info to the configset API users, maybe we can retire
"current_config_kvi".

> There's other edge cases, e.g. current_config_line() will access the
> global, but it only has two callers (one if we exclude the test
> helper). But I think the answer there is to change the
> config_read_push_default() code, not to give every current "config_fn_t"
> implementation an extra parameter.

I agree that we should rewrite config_read_push_default(), but wouldn't
we still need to expose auxiliary information (line number, scope) to
users of the callback API? e.g. config.c:die_bad_number() uses the file
name [*], and builtin/config.c definitely needs it for 'git config -l'.
Also, an express purpose for this series is to prepare
git_config_from_file() to be used by callers out-of-tree, which would
definitely need that information.

I'm not sure if you're proposing to move state from "the_reader" to
something like "the_repository.config_state". I'd hesitate to take that
approach, since we're just swapping one global for another.

[*] config.c:die_bad_number() is actually a bit broken because it
doesn't use the cached kvi info from the config set. I'll probably send
a fixup patch to fix this.