On Tue, May 23, 2023 at 6:53 AM Olivier Gayot <olivier.gayot@xxxxxxxxxxxxx> wrote:
>
> The utf16_le_to_7bit function claims to, naively, convert a UTF-16
> string to a 7-bit ASCII string. By naively, we mean that it:
>  * drops the first byte of every character in the original UTF-16 string
>  * checks if all characters are printable, and otherwise replaces them
>    by exclamation mark "!".
>
> This means that theoretically, all characters outside the 7-bit ASCII
> range should be replaced by another character. Examples:
>
>  * lower-case alpha (ɒ) 0x0252 becomes 0x52 (R)
>  * ligature OE (œ) 0x0153 becomes 0x53 (S)
>  * hangul letter pieup (ㅂ) 0x3142 becomes 0x42 (B)
>  * upper-case gamma (Ɣ) 0x0194 becomes 0x94 (not printable) so gets
>    replaced by "!"
>
> The result of this conversion for the GPT partition name is passed to
> user-space as PARTNAME via udev, which is confusing and feels questionable.
>
> However, there is a flaw in the conversion function itself. By dropping
> one byte of each character and using isprint() to check if the remaining
> byte corresponds to a printable character, we do not actually guarantee
> that the resulting character is 7-bit ASCII.
>
> This happens because we pass 8-bit characters to isprint(), which
> in the kernel returns 1 for many values > 0x7f - as defined in ctype.c.
>
> This results in many values which should be replaced by "!" to be kept
> as-is, despite not being valid 7-bit ASCII. Examples:
>
>  * e with acute accent (é) 0x00E9 becomes 0xE9 - kept as-is because
>    isprint(0xE9) returns 1.
>  * euro sign (€) 0x20AC becomes 0xAC - kept as-is because isprint(0xAC)
>    returns 1.
>
> Fixed by using a mask of 7 bits instead of 8 bits before calling
> isprint.
>
> Signed-off-by: Olivier Gayot <olivier.gayot@xxxxxxxxxxxxx>
> ---
>  block/partitions/efi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/partitions/efi.c b/block/partitions/efi.c
> index 5e9be13a56a8..7acba66eed48 100644
> --- a/block/partitions/efi.c
> +++ b/block/partitions/efi.c
> @@ -682,7 +682,7 @@ static void utf16_le_to_7bit(const __le16 *in, unsigned int size, u8 *out)
>  	out[size] = 0;
>
>  	while (i < size) {
> -		u8 c = le16_to_cpu(in[i]) & 0xff;
> +		u8 c = le16_to_cpu(in[i]) & 0x7f;
>
>  		if (c && !isprint(c))
>  			c = '!';

Hello Olivier,

It looks like you didn't Cc the linux-block mailing list and Jens. Can you
re-send the patch to linux-block for review?

Thanks,
Ming
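
For anyone who wants to reproduce the behaviour described in the commit message outside the kernel, here is a minimal user-space sketch. It is not the kernel code: kernel_like_isprint() and convert_masked() are hypothetical names, the isprint() model is an approximation that assumes the kernel ctype table treats 0x20-0x7e and 0xa0-0xff as printable, and the input is assumed to already be in host byte order, so le16_to_cpu() is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's isprint().
 * Assumption: 0x20-0x7e and 0xa0-0xff count as printable,
 * while 0x00-0x1f and 0x7f-0x9f do not.
 */
static int kernel_like_isprint(uint8_t c)
{
	return (c >= 0x20 && c <= 0x7e) || c >= 0xa0;
}

/*
 * Same loop shape as utf16_le_to_7bit(), with the mask passed in so
 * the old (0xff) and new (0x7f) behaviour can be compared.
 * "in" holds host-order UTF-16 code units; "out" needs size + 1 bytes.
 */
static void convert_masked(const uint16_t *in, size_t size,
			   uint8_t mask, uint8_t *out)
{
	size_t i;

	out[size] = 0;

	for (i = 0; i < size; i++) {
		uint8_t c = in[i] & mask;

		/* Keep NUL as-is, replace anything unprintable. */
		if (c && !kernel_like_isprint(c))
			c = '!';
		out[i] = c;
	}
}
```

Under these assumptions, feeding { 0x0041, 0x0252, 0x0194, 0x00E9, 0x20AC } through the 0xff mask leaves the bytes 0xE9 and 0xAC in the output, while the 0x7f mask yields only 7-bit values. Note that the 7-bit mask still maps non-ASCII input onto unrelated ASCII characters (0xE9 becomes 'i', 0xAC becomes ','), which matches the aliasing described in the first half of the commit message.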