Re: [PATCH 4/5] config: return an empty list, not NULL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 28 2022, Derrick Stolee wrote:

> On 9/28/2022 10:37 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Wed, Sep 28 2022, Derrick Stolee wrote:
>
>>> If we return a negative value on an error and the number of matches on
>>> success, then this change could instead be "if (repo_config....() > 0)".
>> 
>> Hrm, I think you're confusing the worldview your series here is
>> advocating for, and what I'm suggesting as an alternative.
>> 
>> There isn't any way on "master" to have "an empty list", that's a
>> worldview you're proposing. In particular your 1/5 here removes:
>> 
>> 	assert(values->nr > 0);
>
> Yes, I think the lack of a key should be the same to an empty list
> because it is logically equivalent and an empty list is safer to use
> than a possibly-NULL list. That's what started this whole investigation.
>
> By no longer returning NULL, we can eliminate an entire class of bugs
> from happening.

This saga was started because this code had a segfault:

	const struct string_list *list = git_config_get_value_multi(key);
	for_each_string_list_item(item, list) { ... found = 1 ... }

In general I completely agree with you that APIs should be safe by
default, especially when it comes to NULL. E.g. it's *very handy* that
the "buf" member of "struct strbuf" is never NULL.

But I just don't see it in this case, i.e. the alternative once we make
*_multi() act like the rest of the config API is:

	if (!git_config_get_const_value_multi(key, &list))
		for_each_string_list_item(item, list) { ... found = 1 ... }

Or, well, in my version:

	if (!git_config_get_const_value_multi(key, &list))
		found = unsorted_string_list_has_string(list, maintpath);

But that unsorted_string_list_has_string() pattern could also be used in
the former.

I've found this "if I have this data, then" behavior of the config API
to be very handy & safe to use, we just haven't used it for this one API
entry point.

>> More generally the config format has no notion of "an empty list", if
>> you have a valid key-value pair at all you have a list of ".nr >= 1".
>
> The critical part is that you have a return code that has three modes:
>
>  1. Negative: some kind of parsing error happened, such as a malformed
>     'key' parameter.

Critically not "some kind of parsing error", but "malformed key error",
we've already parsed the config itself at this point.

>  2. Zero: The 'key' was found with multiple values.

Well, found at all, maybe it has just one value.

>  3. Positive: The 'key' was not found. (Are there other innocuous
>     errors that fit in this case?)

It's just "not found", nothing else (there's no "return 2;" etc).

But in terms of simple API use if your key is a constant string
(i.e. the ones where we hardcode the key in the C source) we just need
to handle cases #2 and #3, hence why I added that _*const_value_multi()
wrapper.

Of course it's possible to get an error on
git_config_get_const_value_multi("mal.formed.key.", but we'd BUG() out
on that in development, and once we test it once we know the hardcoded
key is valid. So as an API it makes sense not to worry about it, but
just BUG() out.

You only need to handle this "malformed key" case on runtime in
git-config(1) itself, and e.g. "git for-each-repo --config=<key>", where
we get the key from the user

> This "positive means not found" is very non-obvious.

Is it? I find it quite idiomatic in C, but yeah, it would be odd in some
other contexts and/or languages.

In this case though every other function in the config API works that
way, i.e.:

	git grep 'if \(!.*config_get_'

So whatever our desires if it were a clean-room API (where a key
existing might return 1), surely consistency would win out at this
point.

> Even with your goal of exposing those parsing errors when the 'key' is
> user-supplied, I still think it would be better to provide a non-NULL
> empty string_list and only care about these return values:
>
>  1. Negative: some kind of parsing error happened.
>
>  2. Nonnegative: parsing succeeded. The string_list has all found values
>     (might be empty) and the return value is equal to the string_list's
>     'nr' member.

Can you show me how this sort of thing would look in your proposal:

	int bool;
        const struct string_list *list;
        int found = 0;

	if (!git_config_get_const_value_multi("a.list", &list))
		found = unsorted_string_list_has_string(list, "needle");
	if (!git_config_get_bool("a.bool", &bool))
		if (bool) ...;

AFAICT that would be:

	int bool;
        const struct string_list *list = git_config_get_value_multi("a.list");
        int found = unsorted_string_list_has_string(list, "needle");

	if (!git_config_get_bool("a.bool", &bool))
		if (bool) ...;

I find it unintuitive to make it act differently just because it's a
collection, so we can get away with basically stuffing the "existed?" in
the value itself (as .nr == 0).

> In these cases, I see two modes of use.
>
> First, check for an error and exit early (empty list no-ops naturally):
>
> 	if (git_config_get_const_value_multi(key, &list) < 0)
> 		return -1;

No, the _*const_value_multi() never returns < 0, that's a BUG(). So we
don't worry about that for the common case.

At the tip of my branch for this gc, log, versioncmp etc. all use that.

Which means that this right after:

> 	for_each_string_list_item(item, list) {
> 		...
> 	}

Would be an error, we'd have a NULL list if we got a return value of 1,
but the common case is:

	if (!git_config_get_const_value_multi(key, &list) {
		struct string_list_item *item;

		for_each_string_list_item(item, list) ...
	}

> Second, ignore errors. Care about non-empty list:
>
> 	if (git_config_get_const_value_multi("known.key", &list) > 0) {
> 		...
> 	}

Since that's the *_const_value_multi() not > 0, just "!fn()" will do,
since it's 0 or 1.

> But this is just a matter of taste at this point, and I'm getting the
> feeling that my taste for reducing the chances of NULL references is
> not very popular.

It's very popular with me :)

I just don't see how it's needed in *this case*, since we can entirely
sidestep the need for a dangerous NULL value by making it act like the
rest of the API.

 think the reason the bug in [1] happened is exactly because
git_config_get_value_multi(key) currently acts in a way where you need
to explicitly remember to handle the NULL, *and* if you forget to do so
it's easy to have it work.

I.e. in your (initial buggy) version:

	const struct string_list *list = git_config_get_value_multi(key);

	for_each_string_list_item(item, list) { /* boom, "list" can be null */

The problem being that you can easily write that idiomatically, have it
work in development (because you test with that in your config), and
then it blows up if you don't have that config.

But that's exactly why I think the rest of the config API is well
designed, because as soon as you need to write that as:

	const struct string_list *list;

	if (!git_config_get_const_value_multi(key, &list))
		for_each_string_list_item(item, list) { /* "list" can't be null /*

You're forced to structure your code in such a way that you can't really
forget the implicit NULL check, because to make use of the API at all
you need an "if I have the data, then..." block.

1. https://lore.kernel.org/git/8a8bffaec89e55da0c5bcac2f04331e0d4e69790.1664218087.git.gitgitgadget@xxxxxxxxx/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux