Re: [PATCH v9 6/8] convert: check for detectable errors in UTF encodings

Lars Schneider <larsxschneider@xxxxxxxxx> · Tue, 6 Mar 2018 18:03:29 +0100

> On 06 Mar 2018, at 02:23, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> 
> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
> 
>>> On 05 Mar 2018, at 22:50, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> 
>>> lars.schneider@xxxxxxxxxxxx writes:
>>> 
>>>> +static int validate_encoding(const char *path, const char *enc,
>>>> +		      const char *data, size_t len, int die_on_error)
>>>> +{
>>>> +	if (!memcmp("UTF-", enc, 4)) {
>>> 
>>> Does the caller already know that enc is sufficiently long that
>>> using memcmp is safe?
>> 
>> No :-(
>> 
>> Would you be willing to squash that in?
>> 
>>    if (strlen(enc) > 4 && !memcmp("UTF-", enc, 4)) {
>> 
>> I deliberately used "> 4" as plain "UTF-" is not even valid.
> 
> I'd rather not.  The code does not even have to look at the 6th and
> later bytes in enc[] even if it wanted to reject "UTF-" followed
> by nothing, but use of strlen() forces it to look at everything.
> 
> Stepping back, shouldn't
> 
> 	if (starts_with(enc, "UTF-"))
> 
> be sufficient?  If you really care about the case where "UTF-" alone
> comes here, you could write
> 
> 	if (starts_with(enc, "UTF-") && enc[4])
> 
> but I do not think "&& enc[4]" is even needed.  The functions called
> from this block would not consider "UTF-" alone as something valid
> anyway, so with that "&& enc[4]" we would be piling more code only
> for invalid/rare case.

Agreed, "if (starts_with(enc, "UTF-"))" is sufficient. Can you squash
that in? Thanks for pointing me to starts_with(), as I had forgotten about
this function!

- Lars