Re: [PATCH v4 5/6] convert: add 'working-tree-encoding' attribute

Lars Schneider <larsxschneider@xxxxxxxxx> · Mon, 22 Jan 2018 13:35:25 +0100

> On 21 Jan 2018, at 15:22, Simon Ruderich <simon@xxxxxxxxxxxx> wrote:
> 
> On Sat, Jan 20, 2018 at 04:24:17PM +0100, lars.schneider@xxxxxxxxxxxx wrote:
>> +static struct encoding *git_path_check_encoding(struct attr_check_item *check)
>> +{
>> +	const char *value = check->value;
>> +	struct encoding *enc;
>> +
>> +	if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value) ||
>> +	    !strlen(value))
>> +		return NULL;
>> +
>> +	for (enc = encoding; enc; enc = enc->next)
>> +		if (!strcasecmp(value, enc->name))
>> +			return enc;
>> +
>> +	/* Don't encode to the default encoding */
>> +	if (!strcasecmp(value, default_encoding))
>> +		return NULL;
>> +
>> +	enc = xcalloc(1, sizeof(struct convert_driver));
> 
> I think this should be "sizeof(struct encoding)" but I prefer
> "sizeof(*enc)" which prevents these kind of mistakes.

Great catch! Thank you!

Other code paths are at risk of this problem too. Consider this:

$ git grep 'sizeof(\*' | wc -l
     303
$ git grep 'sizeof(struct ' | wc -l
     208

E.g. even in the same file (likely where I got the code from):
https://github.com/git/git/blob/59c276cf4da0705064c32c9dba54baefa282ea55/convert.c#L780

@Junio: 
Would you welcome a patch that replaces "struct foo" with "*foo"
if applicable?

>> +	enc->name = xstrdup_toupper(value);  /* aways use upper case names! */
> 
> "aways" -> "always" and I think the comment should say why
> uppercase is important.

Would that be better?

	/* Aways use upper case names to simplify subsequent string comparison. */
	enc->name = xstrdup_toupper(value);

AFAIK uppercase and lowercase names are both valid. I just wanted to
ensure that we use one consistent casing. That reads better in error messages
and I don't need to check for the letter case in has_prohibited_utf_bom()
and friends in utf8.c

>> +test_expect_success 'ensure UTF-8 is stored in Git' '
>> +	git cat-file -p :test.utf16 >test.utf16.git &&
>> +	test_cmp_bin test.utf8.raw test.utf16.git &&
>> +	rm test.utf8.raw test.utf16.git
>> +'
>> +
>> +test_expect_success 're-encode to UTF-16 on checkout' '
>> +	rm test.utf16 &&
>> +	git checkout test.utf16 &&
>> +	test_cmp_bin test.utf16.raw test.utf16 &&
>> +
>> +	# cleanup
>> +	rm test.utf16.raw
> 
> Micro-nit: For consistency with the previous test, remove the
> empty line and comment (or just keep the files generated from the
> "setup test repo" phase and don't explicitly delete them)?

I would rather add a new line and a comment to the previous test 
to be consistent.

I know we could leave the files but these lingering files could
always surprise writers of future tests (at least they surprised
me in other tests).

Thank you very much for the review,
Lars