Peter Krefting <peter@xxxxxxxxxxxxxxxx> writes:

> brian m. carlson:
>
>> +	/* U+FFFE and U+FFFF are guaranteed non-characters. */
>> +	if ((codepoint & 0x1ffffe) == 0xfffe)
>> +		return bad_offset;
>
> I missed this the first time around: All Unicode characters whose
> lower 16-bits are FFFE or FFFF are non-characters, so you can re-write
> that to:
>
>     /* U+xxFFFE and U+xxFFFF are guaranteed non-characters. */
>     if ((codepoint & 0xfffe) == 0xfffe)
>         return bad_offset;
>
> Also, the range U+FDD0--U+FDEF are also non-characters, if you wish to
> be really pedantic.

Yeah, while we are at it, doing this may not hurt.

I think Brian's two patches are in fairly good shape otherwise, so
perhaps you can do this as a follow-up patch on top of the tip of
the topic, e82bd6cc (commit: reject overlong UTF-8 sequences,
2013-07-04)?
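
For illustration only, the two checks discussed above (the per-plane U+xxFFFE/U+xxFFFF test and the U+FDD0..U+FDEF range) could be combined roughly as in the sketch below. The helper name is_unicode_noncharacter() is hypothetical and does not come from Brian's patches; the sketch also assumes the caller has already rejected values above U+10FFFF.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch series: returns 1 if
 * the code point is one of the 66 Unicode non-characters, 0 otherwise.
 * Assumes codepoint <= 0x10FFFF has been checked by the caller.
 */
static int is_unicode_noncharacter(uint32_t codepoint)
{
	/* U+xxFFFE and U+xxFFFF: the last two code points of every plane. */
	if ((codepoint & 0xfffe) == 0xfffe)
		return 1;
	/* U+FDD0..U+FDEF: the contiguous block of 32 in the BMP. */
	if (codepoint >= 0xfdd0 && codepoint <= 0xfdef)
		return 1;
	return 0;
}
```

Masking with 0xfffe rather than 0x1ffffe ignores the plane bits, so U+1FFFE and U+10FFFF are caught as well as U+FFFE and U+FFFF; with the wider mask only the two BMP values match.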