On February 16, 2016 7:48:56 AM PST, tip-bot for Jason Andryuk <tipbot@xxxxxxxxx> wrote:
>Commit-ID:  a68075908a37850918ad96b056acc9ac4ce1bd90
>Gitweb:     http://git.kernel.org/tip/a68075908a37850918ad96b056acc9ac4ce1bd90
>Author:     Jason Andryuk <jandryuk@xxxxxxxxx>
>AuthorDate: Fri, 12 Feb 2016 23:13:33 +0000
>Committer:  Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
>CommitDate: Tue, 16 Feb 2016 12:49:05 +0000
>
>lib/ucs2_string: Correct ucs2 -> utf8 conversion
>
>The comparisons should be >= since 0x800 and 0x80 require an additional
>bit to store.
>
>For the 3 byte case, the existing shift would drop off 2 more bits than
>intended.
>
>For the 2 byte case, there should be 5 bits bits in byte 1, and 6 bits
>in byte 2.
>
>Signed-off-by: Jason Andryuk <jandryuk@xxxxxxxxx>
>Reviewed-by: Laszlo Ersek <lersek@xxxxxxxxxx>
>Cc: Peter Jones <pjones@xxxxxxxxxx>
>Cc: Matthew Garrett <mjg59@xxxxxxxxxx>
>Cc: "Lee, Chun-Yi" <jlee@xxxxxxxx>
>Signed-off-by: Matt Fleming <matt@xxxxxxxxxxxxxxxxxxx>
>---
> lib/ucs2_string.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
>diff --git a/lib/ucs2_string.c b/lib/ucs2_string.c
>index 17dd74e..f0b323a 100644
>--- a/lib/ucs2_string.c
>+++ b/lib/ucs2_string.c
>@@ -59,9 +59,9 @@ ucs2_utf8size(const ucs2_char_t *src)
> 	for (i = 0; i < ucs2_strlen(src); i++) {
> 		u16 c = src[i];
> 
>-		if (c > 0x800)
>+		if (c >= 0x800)
> 			j += 3;
>-		else if (c > 0x80)
>+		else if (c >= 0x80)
> 			j += 2;
> 		else
> 			j += 1;
>@@ -88,19 +88,19 @@ ucs2_as_utf8(u8 *dest, const ucs2_char_t *src, unsigned long maxlength)
> 	for (i = 0; maxlength && i < limit; i++) {
> 		u16 c = src[i];
> 
>-		if (c > 0x800) {
>+		if (c >= 0x800) {
> 			if (maxlength < 3)
> 				break;
> 			maxlength -= 3;
> 			dest[j++] = 0xe0 | (c & 0xf000) >> 12;
>-			dest[j++] = 0x80 | (c & 0x0fc0) >> 8;
>+			dest[j++] = 0x80 | (c & 0x0fc0) >> 6;
> 			dest[j++] = 0x80 | (c & 0x003f);
>-		} else if (c > 0x80) {
>+		} else if (c >= 0x80) {
> 			if (maxlength < 2)
> 				break;
> 			maxlength -= 2;
>-			dest[j++] = 0xc0 | (c & 0xfe0) >> 5;
>-			dest[j++] = 0x80 | (c & 0x01f);
>+			dest[j++] = 0xc0 | (c & 0x7c0) >> 6;
>+			dest[j++] = 0x80 | (c & 0x03f);
> 		} else {
> 			maxlength -= 1;
> 			dest[j++] = c & 0x7f;

I also believe there is no such thing as a "ucs2 string". This code will produce invalid UTF-8 if UTF-16 surrogates are present, because each half of a surrogate pair is encoded on its own as a three-byte sequence instead of the pair being combined into a single four-byte sequence; this is how the abortion called CESU-8 ended up happening.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
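The corrected rules in the patch can be illustrated with a small standalone C sketch. It is not the kernel code: it uses <stdint.h> types in place of the kernel's u8/u16, and the names ucs2_to_utf8_sketch() and utf8_len_sketch() are hypothetical. It mirrors the patched thresholds (>= 0x80, >= 0x800) and the 4 + 6 + 6 and 5 + 6 bit splits and, like the patched loop, it does not combine surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: UTF-8 length of one 16-bit code unit, using the
 * corrected >= thresholds from the patch. */
static size_t utf8_len_sketch(uint16_t c)
{
	if (c >= 0x800)
		return 3;	/* 12 to 16 significant bits: 4 + 6 + 6 */
	if (c >= 0x80)
		return 2;	/* 8 to 11 significant bits: 5 + 6 */
	return 1;		/* 7 bits: plain ASCII */
}

/* Hypothetical sketch of the patched ucs2_as_utf8() logic: encodes n
 * code units from src into dest, writing at most maxlength bytes, and
 * returns the byte count. No NUL terminator is written, and surrogate
 * pairs are NOT combined (each half is encoded on its own). */
static size_t ucs2_to_utf8_sketch(uint8_t *dest, const uint16_t *src,
				  size_t n, size_t maxlength)
{
	size_t i, j = 0;

	assert(dest != NULL || maxlength == 0);

	for (i = 0; i < n; i++) {
		uint16_t c = src[i];
		size_t len = utf8_len_sketch(c);

		if (maxlength < len)
			break;	/* stop rather than emit a partial sequence */
		maxlength -= len;

		if (len == 3) {
			/* 1110xxxx 10xxxxxx 10xxxxxx */
			dest[j++] = (uint8_t)(0xe0 | (c & 0xf000) >> 12);
			dest[j++] = (uint8_t)(0x80 | (c & 0x0fc0) >> 6);
			dest[j++] = (uint8_t)(0x80 | (c & 0x003f));
		} else if (len == 2) {
			/* 110xxxxx 10xxxxxx */
			dest[j++] = (uint8_t)(0xc0 | (c & 0x7c0) >> 6);
			dest[j++] = (uint8_t)(0x80 | (c & 0x03f));
		} else {
			dest[j++] = (uint8_t)(c & 0x7f);
		}
	}
	return j;
}
```

For comparison, the pre-patch `c > 0x80` test sent U+0080 down the one-byte branch, emitting 0x00; with the corrected thresholds it becomes C2 80, and U+0800 becomes E0 A0 80.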
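The surrogate objection can be made concrete. A supplementary character such as U+1F600 is stored in UTF-16 as the pair D83D DE00. Encoding each half separately, as the patched loop does, gives ED A0 BD ED B8 80 (the CESU-8 form), while valid UTF-8 is F0 9F 98 80. Below is a minimal C sketch, not the kernel's code, of a surrogate-aware encoder. The names utf16_to_utf8_sketch() and put_utf8() are hypothetical, and replacing an unpaired surrogate with U+FFFD is one possible policy chosen here for illustration (rejecting the input would be another).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write code point cp as UTF-8 into d if it fits
 * in room bytes; returns the byte count, or 0 if it does not fit. */
static size_t put_utf8(uint8_t *d, size_t room, uint32_t cp)
{
	assert(cp <= 0x10FFFF);

	if (cp < 0x80) {
		if (room < 1)
			return 0;
		d[0] = (uint8_t)cp;
		return 1;
	}
	if (cp < 0x800) {
		if (room < 2)
			return 0;
		d[0] = (uint8_t)(0xc0 | (cp >> 6));
		d[1] = (uint8_t)(0x80 | (cp & 0x3f));
		return 2;
	}
	if (cp < 0x10000) {
		if (room < 3)
			return 0;
		d[0] = (uint8_t)(0xe0 | (cp >> 12));
		d[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3f));
		d[2] = (uint8_t)(0x80 | (cp & 0x3f));
		return 3;
	}
	if (room < 4)
		return 0;
	d[0] = (uint8_t)(0xf0 | (cp >> 18));
	d[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3f));
	d[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3f));
	d[3] = (uint8_t)(0x80 | (cp & 0x3f));
	return 4;
}

/* Hypothetical surrogate-aware encoder: a high surrogate followed by a
 * low surrogate becomes one 4-byte sequence; an unpaired surrogate is
 * replaced with U+FFFD (one possible policy). Returns bytes written. */
static size_t utf16_to_utf8_sketch(uint8_t *dest, const uint16_t *src,
				   size_t n, size_t maxlength)
{
	size_t i = 0, j = 0;

	while (i < n) {
		uint32_t cp = src[i];
		size_t used = 1;	/* UTF-16 code units consumed */
		size_t len;

		if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
		    src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
			/* Combine the pair into one supplementary code point. */
			cp = 0x10000 + ((cp - 0xD800) << 10) +
			     (uint32_t)(src[i + 1] - 0xDC00);
			used = 2;
		} else if (cp >= 0xD800 && cp <= 0xDFFF) {
			cp = 0xFFFD;	/* unpaired surrogate */
		}

		len = put_utf8(dest + j, maxlength - j, cp);
		if (len == 0)
			break;	/* out of room: stop on a character boundary */
		j += len;
		i += used;
	}
	return j;
}
```

Because a surrogate pair is consumed as a unit here, a short output buffer can never split it; a loop that handles one 16-bit code unit at a time can emit the first half and then run out of room.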