Re: [PATCH 0/5] [RFC] config API: return empty list, not NULL

Derrick Stolee <derrickstolee@xxxxxxxxxx> · Wed, 28 Sep 2022 14:38:21 -0400

On 9/27/2022 10:40 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> This work changes the behavior of asking for a multi-valued config key to
>> return an empty list instead of a NULL value. This simplifies the handling
>> of the result and is safer for development in the future.
>>
>> This is based on v4 of my unregister series [1]
>>
>> [1]
>> https://lore.kernel.org/git/pull.1358.v4.git.1664287021.gitgitgadget@xxxxxxxxx/
>>
>> This idea came about due to a bug in the git maintenance unregister work
>> where the result from git_config_get_value_multi() was sent directly to
>> for_each_string_list_item() without checking for a NULL value first.
>>
>> I'm sending this as an RFC mostly because I'm not 100% sure this shift is
>> worth the refactoring pain and effort. I personally think getting an empty
>> list is a safer choice, but I can also understand if someone has a different
>> opinion.
> 
> Thanks.
> 
> I actually am in favor of the idea that a NULL can be passed around
> to signal the lack of a string_list (or the lack of a instance of
> any "collection" type), and the current code is structured as such,
> and it gives us extra flexibility.  Of course, we need to see if
> that extra flexibility is worth it.
> 
> With a colleciton col, "if (col && col->nr)" checks if we have
> something to work on.  But a code like this (which is a longhand for
> the for_each_string_list_item() issue we just reencountered):
> 
>     col = git_get_some_collection(...);
>     if (!col)
> 	return; /* no collection */
>     if (!col->nr)
> 	git_add_to_some_collection(col, the default item);
>     for (i = 0; i < col->nr; i++)
> 	do things on col.stuff[i];
> 
> can react differently to cases where we have an empty collection
> and where we do not have any collection to begin with.  
> 
> The other side of the coin is that it would make it harder to treat
> the lack of collection itself and the collection being empty the
> same way.  The above code might need to become
> 
>     col = git_get_some_collection(...);
>     if (!col)
> 	col = git_get_empty_collection();
>     if (!col->nr)
> 	git_add_to_some_collection(col, the default item);
>     for (i = 0; i < col->nr; i++)
> 	do things on col.stuff[i];
> 
> but if the "get the collection" thing returns an empty collection
> when there is actually no collection, we can lose two lines from
> there.

I'm all for conveying more information when possible, but how can
the config API provide a distinction between an empty list and a
NULL list? The only thing I can think about is a case where the
empty value clears the list and no new values are added, such as

	[bogus "key"]
		item = one
		item = two
		item =

With this, the key exists in the config file, but the multi-valued
list is empty. Is that an important distinction? I don't think so.

> Between these two positions, I am in favor of the flexibility that
> we can use NULL to signal the lack of collection, not a presence of
> a collection with zero items, as it feels conceptually cleaner.
> 
> Counting the hunks in [2/5] and [5/5], it seems that "when no
> variable with given key is defined, we return an empty set" makes us
> to have more code in 7 places in [PATCH 2/5], while allowing us to
> lose code in 10 places in [PATCH 5/5].

Outside of the "if (sl && sl->nr) {" that I forgot to convert into
"if (sl->nr) {" in patch 5, all of those 7 places that have "more code"
end up with only that extra "->nr". The code looks uglier temporarily
in patch 2 to create the compatibility mode so patch 4 can change only
the config API without test failures.

Thanks,
-Stolee