Re: [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 08 Jul 2013 12:36:07 -0700

Peter Krefting <peter@xxxxxxxxxxxxxxxx> writes:

> brian m. carlson:
>
>> +		/* U+FFFE and U+FFFF are guaranteed non-characters. */
>> +		if ((codepoint & 0x1ffffe) == 0xfffe)
>> +			return bad_offset;
>
> I missed this the first time around: all Unicode code points whose
> lower 16 bits are FFFE or FFFF are non-characters, so you can rewrite
> that to:
>
>   /* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
>   if ((codepoint & 0xfffe) == 0xfffe)
>     return bad_offset;
>
> Also, the code points in the range U+FDD0--U+FDEF are non-characters,
> if you wish to be really pedantic.

Yeah, while we are at it, doing this may not hurt.  I think Brian's
two patches are in fairly good shape otherwise, so perhaps you can
do this as a follow-up patch on top of the tip of the topic,
e82bd6cc (commit: reject overlong UTF-8 sequences, 2013-07-04)?
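
For reference, the two checks discussed above could be combined
roughly as follows.  This is only a sketch: is_noncharacter() is a
hypothetical helper name, not something in the posted patches, and it
assumes that code points above U+10FFFF have already been rejected by
the caller.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the posted patches): returns 1 if
 * codepoint is a Unicode non-character, 0 otherwise.  Assumes the
 * caller has already rejected values above U+10FFFF.
 */
static int is_noncharacter(uint32_t codepoint)
{
	/* U+xxFFFE and U+xxFFFF are non-characters in every plane. */
	if ((codepoint & 0xfffe) == 0xfffe)
		return 1;
	/* U+FDD0..U+FDEF are non-characters as well. */
	if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
		return 1;
	return 0;
}
```

The validation loop would then return bad_offset whenever this
returns non-zero.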
