Re: [PATCH 0/6] [RFC] config.c: use struct for config reading state

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Mon, 6 Mar 2023 11:57:56 -0800

"Glen Choo via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>  * We could add "struct config_reader" to "config_fn_t", i.e.
>    
>    -typedef int (*config_fn_t)(const char *var, const char *val, void
>    *data); +typedef int (*config_fn_t)(const struct config_reader *reader,
>    const char *var, const char *val, void *data);
>    
>    which isn't complex at all, except that there are ~100 config_fn_t
>    implementations [3] and a good number of them may never reference
>    "reader". If the churn is tolerable, I think this a good way forward.
> 
>  * We could create a new kind of "config_fn_t" that accepts "struct
>    config_reader", e.g.
>    
>    typedef int (*config_fn_t)(const char *var, const char *val, void *data);
>    +typedef int (*config_state_fn_t)(const struct config_reader *reader,
>    const char *var, const char *val, void *data);
>    
>    and only adjust the callers that would actually reference "reader". This
>    is less churn, but I couldn't find a great way to do this kind of
>    'switching between config callback types' elegantly.

To reduce churn, one thing that could be done alongside is to convert
config-using code (which is...practically the rest of Git) to start
using the configset interface (we seem to be using configsets internally
anyway, as evidenced by repo_config()). That way, we would reduce the
number of implementations of config_fn_t.

>  * We could smuggle "struct config_reader" to callback functions in a way
>    that interested callers could see it, but uninterested callers could
>    ignore. One trick that Jonathan Tan came up with (though not necessarily
>    endorsed) would be to allocate a struct for the config value + "struct
>    config_reader", then, interested callers could use "offset_of" to recover
>    the "struct config_reader". It's a little hacky, but it's low-churn.

Indeed, although we should probably use this as a last resort.

> = Questions
> 
>  * Is this worth merging without the extra work? There are some cleanups in
>    this series that could make it valuable, but there are also some hacks
>    (see 4/6) that aren't so great.

I'm leaning towards merging it now, but can go either way, since the
cost of churn is limited to one single file, but so are the benefits.
If it was up to me to decide, I would merge it now, because this opens
up a lot of work that other contributors could individually do (in
particular, converting individual config code paths so that we don't
need to reference the_reader as a global anymore).

I don't see 4/6 as a hack. It is true that the nature of the config_fn_t
callback could change so that passing the reader would end up being
done in yet another different way, but firstly, I don't think that will
happen for quite some time, and secondly, it might not happen at all
(right now, I think what's most likely to happen is that the rest of the
Git code moves to configsets and only a fraction of the Git code would
need to do low-level parsing, which would not have a problem passing the
reader through the data object since they would probably need to pass
other context anyway).

>  * Is the extra work even worth it?

Depends on which extra work, but I think that eliminating the the_reader
global would really be useful (and, as far as I know, the whole reason
for this effort). From the Git codebase's perspective, doing this would
(as far as I know) eliminate the need for pushing and popping cf, and
make multithreaded multi-repo operations less error-prone (e.g. we
can spawn a thread operating on a submodule and that thread can itself
read the configs of nested submodules without worrying about clobbering
global state...well, there is thread-local storage, but as far as I know
this is not portable).

>  * Do any of the ideas seem more promising than the others? Are there other
>    ideas I'm missing?

Hopefully I answered this in my answers to the other questions.