Re: [PATCH v4 0/5] config: introduce discovery.bare and protected config

Glen Choo <chooglen@xxxxxxxxxx> · Mon, 27 Jun 2022 10:34:36 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Glen Choo <chooglen@xxxxxxxxxx> writes:
>
>> Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
>>
>>> "Glen Choo via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>>>> Glen Choo (5):
>>>>   Documentation/git-config.txt: add SCOPES section
>>>>   Documentation: define protected configuration
>>>
>>> Forgot to mention when I was sending my comments on patch 2: we should
>>> standardize on "protected config" and not use "protected configuration"
>>> anywhere.
>>
>> Makes sense.
>
> Using a single word consistently does make sense, but why favor a
> non-word over a proper word ;-)?

Hm, I guess there's an argument that "config" is a term of art that
specifically refers to things from "git config". From that lens, it's much less
confusing to see the CONFIGURATION section in
Documentation/git-config.txt. But the argument is a little flimsy
because I don't think that's something we've stuck to anywhere.

I'll use "configuration" if it's not too unwieldy.

>> I suppose that the idea behind this is that we only parse and store each
>> config file exactly once. It's a good goal, but the whole point of the
>> configset is that we can query a single struct to figure out the value
>> of a config variable. Having multiple configsets starts to shift more of
>> the burden to the callers because they now have to query multiple
>> configsets to find their desired config value, and we already start to
>> see some of this unpleasantness in this series.
>
> Yes, I was worried about this, too.  "parse and store exactly once"
> may merely be a performance thing, but it still matters, even though
> it is not worse than making duplicate callbacks to overwrite globals
> that have been already set earlier, which will affect correctness ;-)

Exactly.

>> An alternative that I'd been thinking about is to make a few changes to
>> the git_config_* + configset API to allow us to use a single configset
>> for all of our needs:
>>
>> 1. Keep track of what config we've read when reading into
>>    the_repository->config, i.e. instead of a boolean "all config has
>>    been [un]read", we can express "system and global config has been
>>    read, but not local or command config". Then, use this information to
>>    load config from sources as they become available. This will allow us
>>    to read incomplete config for trace2 and setup.c (discovery.bare and
>>    safe.directory), and only read what we need later on.
>
> That is not a bad direction to go, but are we sure that we always
> read in the right order (and there is one single right order) and
> stop at the right step?
>
> config.c::do_git_config_sequence() reads the system and then the
> global before the local, the worktree, and the command line.  We
> would allow the values of "protected" configuration variables to be
> inspected by stopping after the first two and inspecting the result
> before the local and the rest overrides them, but will we need
> *only* that kind of partial configuration reading that stops exactly
> there?  Even with the proposed "protected" scheme, I thought we plan
> to honor the command line ones, so we may need to read
> system+global+command without reading anything else to grab the
> values only from the protected sources (ah, I like the application
> of the adjective "protected" to the source, not variables, because
> that is what we are really talking about---alternatively we could
> call it "safe").  But if we later read local and worktree ones
> lazily, unless we _insert_ them before what we read from the command
> line, we'll break the last-one-wins property, so we need to be
> careful.  I guess each configuration value in the configset knows
> where it came from, so it probably is possible to insert the ones
> you read lazily later in the right spot.

Yeah, last-one-wins makes this a lot trickier. I thought that it would
be nice to have insert-with-priority because that also eliminates some
of the correctness concerns in this series, i.e. that ensures protected
config has the same priority as regular config, but that's a bigger
undertaking and I'm not certain about the performance.

>> 2. Add an additional argument that specifies what scopes to respect when
>>    reading config (maybe as a set of flags). This gives us extra
>>    specificity when using the git_config*() functions, so we could get
>>    rid of git_protected_config() like so:
>>
>>     /* Change enum config_scope into flags first... */
>>
>>     #define WIP_SCOPES_PROTECTED = CONFIG_SCOPE_SYSTEM & \
>>       CONFIG_SCOPE_GLOBAL & CONFIG_SCOPE_COMMAND
>>
>>     static enum discovery_bare_allowed get_discovery_bare(void)
>>     {
>>       enum discovery_bare_allowed result = DISCOVERY_BARE_ALWAYS;
>>       git_config(discovery_bare_cb, &result, WIP_SCOPES_PROTECTED);
>>       return result;
>>     }
>
> Alternatively, we could make the callback aware of the scope for
> each var-value it is called and have it filter, but that would be a
> bigger surgery.
>
> I think a new iterator git_config_in_scope(), instead of updating
> git_config(), would make sense.  By definition, all existing
> git_config() callers do not need the scope specifiers, and
> "protected" may be the first one but will not be the last one that
> needs to read from particular scopes.

Makes sense. The signature of git_config() could stay the same, but we
could refactor it to use git_config_in_scope().