Re: [PATCH v3] Sanity-check config variable names

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 27 Jan 2011 14:45:16 -0800

Libor Pechacek <lpechacek@xxxxxxx> writes:

> Sanity-check config variable names when adding and retrieving them.  As a side
> effect code duplication between git_config_set_multivar and get_value (in
> builtin/config.c) was removed and the common functionality was placed in
> git_config_parse_key.
>
> This breaks a test in t1300 which used invalid section-less keys in the tests
> for "git -c". However, allowing such names there was useless, since there was
> no way to set them via config file, and no part of git actually tried to use
> section-less keys. This patch updates the test to use more realistic examples.
>
> Signed-off-by: Libor Pechacek <lpechacek@xxxxxxx>
> Cc: Jeff King <peff@xxxxxxxx>
> ---
>
> On Fri 21-01-11 11:23:19, Jeff King wrote:
>> Typo: s/ckeck/check/
>> 
>> Other than that, this version looks good to me.
>
> Fixed the typo and return values from get_value and git_config_set_multivar.
> We have changed git_config_parse_key to return negative values on error, but
> forgot to negate the numbers when returning them as exit codes.

Earlier get_value() returned:

 -1: when it saw an uncompilable regexp (either in key or value);
  0: when a value was available (under --all) or unique (without --all);
  1: when the requested variable is missing; and
  2: when the requested variable is multivalued under --all.

As the new error condition you are adding is to detect and signal a broken
input, doesn't it fall into the same category as "broken regexp", which in
turn would mean that it should return the same -1?

Or to distinguish between "invalid character in the key" and "no section
name in the key", you might want to add -2 to the mix, but personally I
don't think it is worth it.  The advantage of being able to tell between
the two kinds of breakage feels much smaller than the cost of having to
worry about breaking existing callers that try to catch broken user input
by checking the return value from get_value() explicitly against -1, not
just any negative value.

Note that I am not suggesting to change the return value from
config-parse-key in the above; I am only talking about get_value().

>  	if (use_key_regexp) {
> +		char *tl;
> +
> +		/* TODO - this naive pattern lowercasing obviously does not
> +		 * work for more complex patterns like `^[^.]*Foo.*bar' */
> +		for (tl = key+strlen(key)-1; tl >= key && *tl != '.'; --tl)
> +			*tl = tolower(*tl);
> +		for (tl = key; *tl && *tl != '.'; ++tl)
> +			*tl = tolower(*tl);

When moving an existing code segment around like this, I would not mind to
see style fixes thrown in to the patch, as long as the moved code is small
enough.  Perhaps like this:

	/*
         * NEEDSWORK: this naive pattern lowercasing obviously does
	 * not work for more complex patterns like "^[^.]*Foo.*bar".
	 * Perhaps we should deprecate this altogether someday.
         */
	for (tl = key + strlen(key) - 1;
	     tl >= key && *tl != '.';
	     tl--)
		*tl = tolower(*tl);
	for (tl = key; *tl && *tl != '.'; tl++)
		*tl = tolower(*tl);

The style rules applied above are...

  - We want to see SP around binary operators;

  - However, we don't want to see a line that is too long;

  - When there is no reason to choose between (pre/post)-(in/de)crement,
    post-(in/de)crement is easier to read.

> diff --git a/config.c b/config.c
> index 625e051..c976544 100644
> --- a/config.c
> +++ b/config.c
> @@ -1098,6 +1098,72 @@ int git_config_set(const char *key, const char *value)
>  	return git_config_set_multivar(key, value, NULL, 0);
>  }
>  
> +/* Auxiliary function to sanity-check and split the key into the section

	/*
         * Style. We write our multi-line comments
	 * like this.
         */

> +int git_config_parse_key(const char *key, char **store_key, int *baselen_)
> +{
> +	int i, dot, baselen;
> +	const char *last_dot = strrchr(key, '.');
> +
> +	/*
> +	 * Since "key" actually contains the section name and the real
> +	 * key name separated by a dot, we have to know where the dot is.
> +	 */
> +
> +	if (last_dot == NULL) {
> +		error("key does not contain a section: %s", key);
> +		return -2;
> +	}
> +
> +	baselen = last_dot - key;
> +	if (baselen_)
> +		*baselen_ = baselen;
> +
> +	/*
> +	 * Validate the key and while at it, lower case it for matching.
> +	 */
> +	if (store_key)
> +		*store_key = xmalloc(strlen(key) + 1);

Does it make sense for this function to be prepared to get called with
NULL in store_key like this (and in the remainder of your patch)?

The primary reason somebody calls this function is so that the caller can
then use the resulting lowercased form for matching, but "if store-key is
not NULL" feels as if this code is optimized for a special case caller
that only wants to validate without using the resulting lowercased key for
matching, which does not exist yet.

> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index d0e5546..3e79c37 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh
> @@ -876,11 +876,10 @@ test_expect_success 'check split_cmdline return' "
>  	"
>  
>  test_expect_success 'git -c "key=value" support' '
> -	test "z$(git -c name=value config name)" = zvalue &&
>  	test "z$(git -c core.name=value config core.name)" = zvalue &&
> -	test "z$(git -c CamelCase=value config camelcase)" = zvalue &&
> -	test "z$(git -c flag config --bool flag)" = ztrue &&
> -	test_must_fail git -c core.name=value config name
> +	test "z$(git -c foo.CamelCase=value config foo.camelcase)" = zvalue &&
> +	test "z$(git -c foo.flag config --bool foo.flag)" = ztrue &&
> +	test_must_fail git -c name=value config core.name
>  '

Don't you want to make sure that your sanity check triggers in new tests?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html