Re: [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 09 Mar 2018 11:10:24 -0800

lars.schneider@xxxxxxxxxxxx writes:

> +static const char *default_encoding = "UTF-8";
> +
> ...
> +static const char *git_path_check_encoding(struct attr_check_item *check)
> +{
> +	const char *value = check->value;
> +
> +	if (ATTR_UNSET(value) || !strlen(value))
> +		return NULL;
> +
> +	if (ATTR_TRUE(value) || ATTR_FALSE(value)) {
> +		error(_("working-tree-encoding attribute requires a value"));
> +		return NULL;
> +	}

Hmph, so we decide to be loud but otherwise ignore an undefined
configuration?  Shouldn't we rather die instead to avoid touching
the user data in unexpected ways?

> +
> +	/* Don't encode to the default encoding */
> +	if (!strcasecmp(value, default_encoding))
> +		return NULL;

Is this an optimization to avoid "recode one encoding to the same
encoding" no-op overhead?  We already have the optimization in the
same spirit in may existing codepaths that has nothing to do with
w-t-e, and I think we should share the code.  Two pieces of thought
comes to mind.

One is a lot smaller in scale: Is same_encoding() sufficient for
this callsite instead of strcasecmp()?

The other one is a lot bigger: Looking at all the existing callers
of same_encoding() that call reencode_string() when it returns false,
would it make sense to drop same_encoding() and move the optimization
to reencode_string() instead?

I suspect that the answer to the smaller one is "yes, and even if
not, it should be easy to enhance/extend same_encoding() to make it
do what we want it to, and such a change will benefit even existing
callers."  The answer to the larger one is likely "the optimization
is not about skipping only reencode_string() call but other things
are subtly different among callers of same_encoding(), so such a
refactoring would not be all that useful."

The above still holds for the code after 10/10 touches this part.