Re: [RFC PATCH 2/3] Make ce_compare_gitlink thread-safe

Atneya Nair <atneya@xxxxxxxxxx> · Tue, 12 Mar 2024 15:13:16 -0700

On Tue, Mar 5, 2024 at 5:23 PM Jeff King <peff@xxxxxxxx> wrote:

> And I think that may be the case here. The "kvi" struct itself is local
> to do_config_from(). But when we load the caching configset, we do so in
> configset_add_value() which makes a shallow copy of the kvi struct. And
> then that shallow copy may live on and be accessed with further calls to
> git_config().

I missed the shallow copy of kvi in do_config_from. Unfortunately, we also
pass this back into a callback function, and although it is often not used,
changing the code to be more precise about string ownership here seems
non-trivial.

>
> though probably an actual kvi_copy() function would be less horrible.
>
> A few other things to note, looking at this code:
>
>   - isn't kvi->path in the same boat? We do not duplicate it at all, so
>     it seems like the shallow copy made in the configset could cause a
>     user-after-free.

The situation with path seems to indicate that indeed ownership is correctly
handled up the call-stack (i.e. the config_source owns the file/path strings),
and these strings are valid for the duration of the shallow copies through the
call-graph. But, the fact that filename in particular is interned may indicate
that the behavior of leaking the string to static lifetime is used by
a caller at
some point.

>     But it possibly could make sense to have the configset own a single
>     duplicate string, and then let the kvi structs it holds point to
>     that string. But IMHO all of this should be details of the configset
>     code, and the main config-iteration code should not have to worry
>     about this at all. I.e., I think kvi_from_source() should not be
>     duplicating anything in the first place.
>
> -Peff

Based on this discussion, I think the cleanest solution is a lock on
the strintern hashmap. I originally wanted to avoid adding locking, since most
call-paths are single threaded, and I thought we were only using the
string locally.
However, copying the string is more expensive and potentially intrusive to many
call-sites, and it is much more difficult to reason about the correctness of the
change, so I think adding a lock is preferable. I'll switch to that in
v2 along with the
more descriptive commit message.