Re: [PATCH 1/3] utf8-bom: introduce skip_utf8_bom() helper

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 16 Apr 2015 11:23:31 -0700

Jeff King <peff@xxxxxxxx> writes:

> On Thu, Apr 16, 2015 at 10:52:52AM -0700, Junio C Hamano wrote:
>
>> @@ -576,10 +576,8 @@ int add_excludes_from_file_to_list(const char *fname,
>>  
>>  	el->filebuf = buf;
>>  
>> -	if (size >= 3 && !memcmp(buf, utf8_bom, 3))
>> -		entry = buf + 3;
>> -	else
>> -		entry = buf;
>> +	entry = buf;
>> +	skip_utf8_bom(&entry, size);
>>  
>>  	for (i = 0; i < size; i++) {
>>  		if (buf[i] == '\n') {
>
> I'm surprised that in both yours and the original that we do not need to
> subtract 3 from "size".

Or we start scanning from the beginning of "buf", i.e.

	for (i = 0; i < size; i++)

After you pointed it out, I wondered why we do not adjust the
initial value of "i" (without futzing with "size").  But...

> It looks like we advance "entry" here, not "buf", and then iterate over
> "buf". But I think that makes the later logic weird:
>
>    if (entry != buf + i && entry[0] != '#')
>
> because if there is a BOM, we end up with "entry > buf + i", which I
> think this code isn't expecting. I'm not sure it does anything bad, but
> I think it might be simpler as just:
>
>   /* save away the "real" copy for later, as we do now */
>   el->filebuf = buf;
>
>   /*
>    * now pretend as if the BOM was not there at all by advancing
>    * the pointer and shrinking the size
>    */
>   skip_utf8_bom(&buf, &size);
>
>   /*
>    * and now we do our usual magic with "entry"
>    */
>   entry = buf;
>   for (i = 0; i < size; i++)
>      ...

... this would work much better for this caller.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html