On 02/17/16 18:15, H. Peter Anvin wrote:
> On February 16, 2016 7:48:56 AM PST, tip-bot for Jason Andryuk <tipbot@xxxxxxxxx> wrote:
>> Commit-ID:  a68075908a37850918ad96b056acc9ac4ce1bd90
>> Gitweb:     http://git.kernel.org/tip/a68075908a37850918ad96b056acc9ac4ce1bd90
>> Author:     Jason Andryuk <jandryuk@xxxxxxxxx>
>> AuthorDate: Fri, 12 Feb 2016 23:13:33 +0000
>> Committer:  Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
>> CommitDate: Tue, 16 Feb 2016 12:49:05 +0000
>>
>> lib/ucs2_string: Correct ucs2 -> utf8 conversion
>>
>> The comparisons should be >= since 0x800 and 0x80 require an additional
>> bit to store.
>>
>> For the 3 byte case, the existing shift would drop off 2 more bits than
>> intended.
>>
>> For the 2 byte case, there should be 5 bits in byte 1, and 6 bits in
>> byte 2.
>>
>> Signed-off-by: Jason Andryuk <jandryuk@xxxxxxxxx>
>> Reviewed-by: Laszlo Ersek <lersek@xxxxxxxxxx>
>> Cc: Peter Jones <pjones@xxxxxxxxxx>
>> Cc: Matthew Garrett <mjg59@xxxxxxxxxx>
>> Cc: "Lee, Chun-Yi" <jlee@xxxxxxxx>
>> Signed-off-by: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
>> ---
>>  lib/ucs2_string.c | 14 +++++++-------
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/lib/ucs2_string.c b/lib/ucs2_string.c
>> index 17dd74e..f0b323a 100644
>> --- a/lib/ucs2_string.c
>> +++ b/lib/ucs2_string.c
>> @@ -59,9 +59,9 @@ ucs2_utf8size(const ucs2_char_t *src)
>>  	for (i = 0; i < ucs2_strlen(src); i++) {
>>  		u16 c = src[i];
>>
>> -		if (c > 0x800)
>> +		if (c >= 0x800)
>>  			j += 3;
>> -		else if (c > 0x80)
>> +		else if (c >= 0x80)
>>  			j += 2;
>>  		else
>>  			j += 1;
>> @@ -88,19 +88,19 @@ ucs2_as_utf8(u8 *dest, const ucs2_char_t *src, unsigned long maxlength)
>>  	for (i = 0; maxlength && i < limit; i++) {
>>  		u16 c = src[i];
>>
>> -		if (c > 0x800) {
>> +		if (c >= 0x800) {
>>  			if (maxlength < 3)
>>  				break;
>>  			maxlength -= 3;
>>  			dest[j++] = 0xe0 | (c & 0xf000) >> 12;
>> -			dest[j++] = 0x80 | (c & 0x0fc0) >> 8;
>> +			dest[j++] = 0x80 | (c & 0x0fc0) >> 6;
>>  			dest[j++] = 0x80 | (c & 0x003f);
>> -		} else if (c > 0x80) {
>> +		} else if (c >= 0x80) {
>>  			if (maxlength < 2)
>>  				break;
>>  			maxlength -= 2;
>> -			dest[j++] = 0xc0 | (c & 0xfe0) >> 5;
>> -			dest[j++] = 0x80 | (c & 0x01f);
>> +			dest[j++] = 0xc0 | (c & 0x7c0) >> 6;
>> +			dest[j++] = 0x80 | (c & 0x03f);
>>  		} else {
>>  			maxlength -= 1;
>>  			dest[j++] = c & 0x7f;
>
> I also believe there is no such thing as a "ucs2 string". This code
> will produce invalid utf8 if utf16 surrogates are present; this is how
> the abortion called cesu8 ended up happening.

I raised the same concern; please see the sub-thread at:

http://thread.gmane.org/gmane.linux.kernel.efi/7366/focus=7493

If I understand correctly, the decision was that the caller would be
responsible for not passing in surrogates.

Thanks
Laszlo
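
For readers who want to check the corrected bit layout outside the kernel tree, here is a minimal userspace sketch. It is not the kernel's ucs2_as_utf8(): the names ucs2_unit_to_utf8() and ucs2_is_surrogate() are hypothetical, it converts one 16-bit code unit at a time, and it leaves out the maxlength bookkeeping. The masks and shifts follow the "+" lines of the patch; the surrogate range check is an assumption about how a caller might honor the "no surrogates" contract discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel function: encode one 16-bit
 * code unit as UTF-8 into out[] and return the byte count (1, 2 or 3).
 * Surrogates are NOT rejected here, mirroring the patched kernel code.
 */
static size_t ucs2_unit_to_utf8(uint16_t c, uint8_t out[3])
{
	assert(out != NULL);

	if (c >= 0x800) {
		/* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (4 + 6 + 6 bits) */
		out[0] = (uint8_t)(0xe0 | ((c & 0xf000) >> 12));
		out[1] = (uint8_t)(0x80 | ((c & 0x0fc0) >> 6));
		out[2] = (uint8_t)(0x80 | (c & 0x003f));
		return 3;
	}
	if (c >= 0x80) {
		/* 2 bytes: 110xxxxx 10xxxxxx (5 + 6 bits) */
		out[0] = (uint8_t)(0xc0 | ((c & 0x07c0) >> 6));
		out[1] = (uint8_t)(0x80 | (c & 0x003f));
		return 2;
	}
	/* 1 byte: 0xxxxxxx (7 bits) */
	out[0] = (uint8_t)(c & 0x7f);
	return 1;
}

/*
 * Hypothetical caller-side check: a lone UTF-16 surrogate
 * (0xD800..0xDFFF) has no valid UTF-8 encoding of its own.
 */
static int ucs2_is_surrogate(uint16_t c)
{
	return c >= 0xd800 && c <= 0xdfff;
}
```

With these masks, 0x80 encodes as c2 80 and 0x800 as e0 a0 80, which are the two boundary values the old ">" comparisons misclassified. A surrogate such as 0xd83d would come out as ed a0 bd, which is the CESU-8 style byte sequence and not valid UTF-8, so a caller following the decision above could test ucs2_is_surrogate() before converting.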