Re: [PATCH 4/5] config: return an empty list, not NULL

Derrick Stolee <derrickstolee@xxxxxxxxxx> · Wed, 28 Sep 2022 09:46:48 -0400

On 9/27/22 3:18 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Sep 27 2022, Derrick Stolee wrote:
> 
>> On 9/27/2022 12:21 PM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> On Tue, Sep 27 2022, Derrick Stolee via GitGitGadget wrote:
>>
>>>>  /**
>>>>   * Finds and returns the value list, sorted in order of increasing priority
>>>>   * for the configuration variable `key`. When the configuration variable
>>>> - * `key` is not found, returns NULL. The caller should not free or modify
>>>> - * the returned pointer, as it is owned by the cache.
>>>> + * `key` is not found, returns an empty list. The caller should not free or
>>>> + * modify the returned pointer, as it is owned by the cache.
>>>>   */
>>>>  const struct string_list *git_config_get_value_multi(const char *key);
>>>
>>> Aside from the "DWIM API" aspect of this (which I don't mind) I think
>>> this is really taking the low-level function in the wrong direction, and
>>> that we should just add a new simple wrapper instead.
>>>
>>> I.e. both the pre-image API docs & this series gloss over the fact that
>>> we'd not just return NULL here if the config wasn't there, but also if
>>> git_config_parse_key() failed.
>>>
>>> So it seems to me that a better direction would be starting with
>>> something like the WIP below (which doesn't compile the whole code, I
>>> stopped at config.[ch] and pack-bitmap.c). I.e. the same "int" return
>>> and "dest" pattern that most other things in the config API have.
>>
>> Do you have an example where a caller would benefit from this
>> distinction? Without such an example, I don't think it is worth
>> creating such a huge change for purity's sake alone.
> 
> Not initially, I started poking at this because the CL/series/commits
> says that we don't care about the case of non-existing keys, without
> being clear as to why we want to conflate that with other errors we
> might get from this API.
> 
> But after some digging I found:
> 
> 	$ for k in a a.b. "'x.y"; do ./git for-each-repo --config=$k;  echo $?; done
> 	error: key does not contain a section: a
> 	0
> 	error: key does not contain variable name: a.b.
> 	0
> 	error: invalid key: 'x.y
> 	0
> 	
> I.e. the repo_config_get_value_multi() you added in for-each-repo
> doesn't distinguish between bad keys and non-existing keys, and returns
> 0 even though it printed an "error".

I can understand wanting to inform the user that they provided an
invalid key using a nonzero exit code. I can also see the argument that
the command did what was asked: it did nothing because the given
key has no values (and cannot have any). I think printing an "error"
message tips the balance towards a nonzero exit code.
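
To make the distinction concrete, here is a minimal sketch of a caller
that exits nonzero only for a malformed key and treats a valid but unset
key as success. Everything named toy_* is hypothetical and the key check
is far simpler than git_config_parse_key(); it only illustrates the
shape of the idea, not how for-each-repo is or should be written.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result codes, not part of git's config API. */
enum toy_lookup_result {
	TOY_BAD_KEY = -1,	/* key could not be parsed */
	TOY_FOUND = 0,		/* key is valid and has values */
	TOY_MISSING = 1		/* key is valid but has no values */
};

/*
 * Toy key check: needs "section.variable" with a non-empty section,
 * a non-empty variable name, and an alphanumeric first character.
 */
static int toy_key_is_valid(const char *key)
{
	const char *last_dot = strrchr(key, '.');

	if (!last_dot || last_dot == key)
		return 0;	/* no section */
	if (last_dot[1] == '\0')
		return 0;	/* no variable name */
	return isalnum((unsigned char)key[0]) != 0;
}

/* Pretend the only configured key is "maintenance.repo". */
static enum toy_lookup_result toy_lookup(const char *key)
{
	if (!toy_key_is_valid(key))
		return TOY_BAD_KEY;
	if (!strcmp(key, "maintenance.repo"))
		return TOY_FOUND;
	return TOY_MISSING;
}

/* Exit code a command could choose: nonzero only for a bad key. */
static int toy_exit_code(const char *key)
{
	if (toy_lookup(key) == TOY_BAD_KEY)
		return 1;
	return 0;	/* found, or nothing to do */
}
```

With this shape, the three bad keys from the example above would each
yield a nonzero exit code, while a well-formed key with no values would
still exit with 0.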

>> I'm pretty happy that the diff for this series is an overall
>> reduction in code, while also not being too large in the interim:
>>
>>  12 files changed, 39 insertions(+), 57 deletions(-)
>>
>> If all callers that use the *_multi() methods would only use the
>> wrapper, then what is the point of doing the low-level manipulations?
> 
> I hacked up something that's at least RFC-quality based on this
> approach, but CI is running etc., so not submitting it
> now:
> 
> 	https://github.com/git/git/compare/master...avar:git:avar/have-git_configset_get_value-use-dest-and-int-pattern
> 
> I think the resulting diff is more idiomatic API use, i.e. you ended up
> with:
> 
> 	        /* submodule.active is set */
> 	        sl = repo_config_get_value_multi(repo, "submodule.active");
> 	-       if (sl) {
> 	+       if (sl && sl->nr) {

You're right that I forgot to change this one to "if (sl->nr)"
in patch 5.

> But I ended up doing:
> 
> 	        /* submodule.active is set */
> 	-       sl = repo_config_get_value_multi(repo, "submodule.active");
> 	-       if (sl) {
> 	+       if (!repo_config_get_const_value_multi(repo, "submodule.active", &sl)) {
> 
> Note the "const" in the function name, i.e. there's wrappers that handle
> the case where we have a hardcoded key name, in which case we can BUG()
> out if we'd return < 0, so all we have left is just "does key exist".

The problem here is that the block actually cares that the list is non-empty
and should not run if the list is empty. In that case, you would need to add
"&& sl->nr" to the condition.

I'm of course assuming that an empty list is different from an error. In
your for-each-repo example, we would not want to return a non-zero exit
code on an empty list, only on a bad key (or other I/O problem).

If we return a negative value on an error and the number of matches on
success, then this change could instead be "if (repo_config....() > 0)".
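
A rough sketch of that convention, assuming a hypothetical
toy_get_value_multi() that returns a negative value for a malformed key
and the number of values otherwise. The struct, the function, and the
hardcoded "submodule.active" values are all made up for illustration and
are not the signatures in config.h.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct string_list. */
struct toy_list {
	const char **items;
	size_t nr;
};

static const char *toy_active_values[] = { "lib/", "src/" };
static const struct toy_list toy_active_list = { toy_active_values, 2 };
static const struct toy_list toy_empty_list = { NULL, 0 };

/*
 * Hypothetical API: returns -1 for a malformed key, otherwise the
 * number of values. *dest always points at a valid (possibly empty)
 * list, so callers never see NULL.
 */
static int toy_get_value_multi(const char *key, const struct toy_list **dest)
{
	const char *last_dot = strrchr(key, '.');

	*dest = &toy_empty_list;
	if (!last_dot || last_dot == key || last_dot[1] == '\0')
		return -1;	/* malformed key */
	if (!strcmp(key, "submodule.active"))
		*dest = &toy_active_list;
	return (int)(*dest)->nr;
}

/* The caller's condition collapses to a single "> 0" check. */
static int toy_has_values(const char *key)
{
	const struct toy_list *sl;

	return toy_get_value_multi(key, &sl) > 0;
}
```

The "> 0" test covers both the error and the empty-list case, so the
block runs only when there is at least one value to look at.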

> In any case, I'm all for having some simple wrapper for the common cases

A simple wrapper would be nice, and it would be exactly the method as
updated in this series. The error-result version could be adopted when
there is a reason to do so.

> But I didn't find a single case where we actually needed this "never
> give me a NULL list" behavior, it could just be generalized to
> "let's have the API tell us if the key exists".

Most cases want to feed the result into the for_each_string_list_item()
macro. Based on the changes in patch 5, I think the empty list is a
better pattern and leads to prettier code in almost all cases.
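
As an illustration of why the empty list reads better at call sites,
here is a self-contained sketch with a simplified stand-in for the
string_list types and a loop macro modeled on
for_each_string_list_item(). The toy_* names are hypothetical; the point
is only that an empty list makes the loop body run zero times without
any NULL check in the caller.

```c
#include <stddef.h>

/* Simplified stand-ins for string_list_item and string_list. */
struct toy_item {
	const char *string;
};

struct toy_list {
	struct toy_item *items;
	size_t nr;
};

/* Loop macro in the spirit of for_each_string_list_item(). */
#define toy_for_each_item(item, list) \
	for (item = (list)->items; \
	     item && item < (list)->items + (list)->nr; \
	     ++item)

static struct toy_item toy_two_items[] = { { "a" }, { "b" } };
static const struct toy_list toy_two = { toy_two_items, 2 };
static const struct toy_list toy_empty = { NULL, 0 };

/* No "if (list)" guard is needed when the list is never NULL. */
static size_t toy_count_items(const struct toy_list *list)
{
	const struct toy_item *item;
	size_t n = 0;

	toy_for_each_item(item, list)
		n++;
	return n;
}
```

If the lookup returned NULL instead, every caller like
toy_count_items() would need its own guard before the loop.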

Thanks,
-Stolee