Re: [PATCH v2 1/2] commit: reject invalid UTF-8 codepoints

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Peter Krefting <peter@xxxxxxxxxxxxxxxx> writes:

> brian m. carlson:
>
>> +		/* U+FFFE and U+FFFF are guaranteed non-characters. */
>> +		if ((codepoint & 0x1ffffe) == 0xfffe)
>> +			return bad_offset;
>
> I missed this the first time around: All Unicode characters whose
> lower 16-bits are FFFE or FFFF are non-characters, so you can re-write
> that to:
>
>   /* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
>   if ((codepoint & 0xfffe) == 0xfffe)
>    return bad_offset;
>
> Also, the range U+FDD0--U+FDEF are also non-characters, if you wish to
> be really pedantic.

Yeah, while we are at it, doing this may not hurt.  I think Brian's
two patches are in fairly good shape otherwise, so perhaps you can
do this as a follow-up patch on top of the tip of the topic,
e82bd6cc (commit: reject overlong UTF-8 sequences, 2013-07-04)?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]