On Tue, Mar 5, 2024 at 5:23 PM Jeff King <peff@xxxxxxxx> wrote: > And I think that may be the case here. The "kvi" struct itself is local > to do_config_from(). But when we load the caching configset, we do so in > configset_add_value() which makes a shallow copy of the kvi struct. And > then that shallow copy may live on and be accessed with further calls to > git_config(). I missed the shallow copy of kvi in do_config_from. Unfortunately, we also pass this back into a callback function, and although it is often not used, changing the code to be more precise about string ownership here seems non-trivial. > > though probably an actual kvi_copy() function would be less horrible. > > A few other things to note, looking at this code: > > - isn't kvi->path in the same boat? We do not duplicate it at all, so > it seems like the shallow copy made in the configset could cause a > user-after-free. The situation with path seems to indicate that indeed ownership is correctly handled up the call-stack (i.e. the config_source owns the file/path strings), and these strings are valid for the duration of the shallow copies through the call-graph. But, the fact that filename in particular is interned may indicate that the behavior of leaking the string to static lifetime is used by a caller at some point. > But it possibly could make sense to have the configset own a single > duplicate string, and then let the kvi structs it holds point to > that string. But IMHO all of this should be details of the configset > code, and the main config-iteration code should not have to worry > about this at all. I.e., I think kvi_from_source() should not be > duplicating anything in the first place. > > -Peff Based on this discussion, I think the cleanest solution is a lock on the strintern hashmap. I originally wanted to avoid adding locking, since most call-paths are single threaded, and I thought we were only using the string locally. However, copying the string is more expensive and potentially intrusive to many call-sites, and it is much more difficult to reason about the correctness of the change, so I think adding a lock is preferable. I'll switch to that in v2 along with the more descriptive commit message.