Re: [PATCH 2/3] config: fix case sensitive subsection names on writing

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 01 Aug 2018 15:51:16 -0700

Stefan Beller <sbeller@xxxxxxxxxx> writes:

> A use reported a submodule issue regarding strange case indentation
> issues, but it could be boiled down to the following test case:

Perhaps

s/use/user/
s/case indentation issues/section mix-up/

> ... However we do not have a test for writing out config correctly with
> case sensitive subsection names, which is why this went unnoticed in
> 6ae996f2acf (git_config_set: make use of the config parser's event
> stream, 2018-04-09)

s/unnoticed in \(.*04-09)\)/unnoticed when \1 broke it./

This is why I asked if the patch is a "FIX" for an issue introduced
by the cited commit.

> Unfortunately we have to make a distinction between old style configuration
> that looks like
>
>   [foo.Bar]
>         key = test
>
> and the new quoted style as seen above. The old style is documented as
> case-agnostic, hence we need to keep 'strncasecmp'; although the
> resulting setting for the old style config differs from the configuration.
> That will be fixed in a follow up patch.
>
> Reported-by: JP Sugarbroad <jpsugar@xxxxxxxxxx>
> Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx>
> ---
>  config.c          | 12 +++++++++++-
>  t/t1300-config.sh |  1 +
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/config.c b/config.c
> index 7968ef7566a..1d3bf9b5fc0 100644
> --- a/config.c
> +++ b/config.c
> @@ -37,6 +37,7 @@ struct config_source {
>  	int eof;
>  	struct strbuf value;
>  	struct strbuf var;
> +	unsigned section_name_old_dot_style : 1;
>  
>  	int (*do_fgetc)(struct config_source *c);
>  	int (*do_ungetc)(int c, struct config_source *conf);
> @@ -605,6 +606,7 @@ static int get_value(config_fn_t fn, void *data, struct strbuf *name)
>  
>  static int get_extended_base_var(struct strbuf *name, int c)
>  {
> +	cf->section_name_old_dot_style = 0;
>  	do {
>  		if (c == '\n')
>  			goto error_incomplete_line;
> @@ -641,6 +643,7 @@ static int get_extended_base_var(struct strbuf *name, int c)
>  
>  static int get_base_var(struct strbuf *name)
>  {
> +	cf->section_name_old_dot_style = 1;
>  	for (;;) {
>  		int c = get_next_char();
>  		if (cf->eof)

OK, let me rephrase.  The basic parse structure is that

 * upon seeing '[', we call get_base_var(), which stuffs the
   "section" (including subsection, if exists) in the strbuf.

 * get_base_var() upon seeing a space after "[section ", calls
   get_extended_base_var().  This space can never exist in an
   old-style three-level names, where it is spelled as
   "[section.subsection]".  This space cannot exist in two-level
   names, either.  The closing ']' is eaten by this function before
   it returns.

 * get_extended_base_var() grabs the "double quoted" subsection name
   and eats the closing ']' before it returns.

So you set the new bit (section_name_old_dot_style) at the beginning
of get_base_var(), i.e. declare that you assume we are reading old
style, but upon entering get_extended_base_var(), unset it, because
now you know we are parsing a modern style three-level name(s).

Feels quite sensible way to keep track of old/new styles.

When parsing two-level names, old-style bit is set, which we may
need to be careful, thoguh.

> @@ -2355,14 +2358,21 @@ static int store_aux_event(enum config_event_t type,
>  	store->parsed[store->parsed_nr].type = type;
>  
>  	if (type == CONFIG_EVENT_SECTION) {
> +		int (*cmpfn)(const char *, const char *, size_t);
> +
>  		if (cf->var.len < 2 || cf->var.buf[cf->var.len - 1] != '.')
>  			return error("invalid section name '%s'", cf->var.buf);
>  
> +		if (cf->section_name_old_dot_style)
> +			cmpfn = strncasecmp;
> +		else
> +			cmpfn = strncmp;
> +
>  		/* Is this the section we were looking for? */
>  		store->is_keys_section =
>  			store->parsed[store->parsed_nr].is_keys_section =
>  			cf->var.len - 1 == store->baselen &&
> -			!strncasecmp(cf->var.buf, store->key, store->baselen);
> +			!cmpfn(cf->var.buf, store->key, store->baselen);

OK.  Section names should still be case insensitive (only the case
sensitivity of subsection names is special), but presumably that's
already normalized by the caller so we do not have to worry when we
use strncmp()?  Can we add a test to demonstrate that it works
correctly?

Thanks.