On Wed, Sep 18, 2013 at 09:48:44PM -0700, Roy Franz wrote:
> On Wed, Sep 18, 2013 at 8:44 PM, Adam Borowski <kilobyte@xxxxxxxxxx> wrote:
> > [UCS2 truncation]
>
> I stuck to re-arranging the code that was there, as I don't know enough
> about character encodings to propose changes.

I on the other hand don't know the kernel (lurking because of my first
patch), but I'm on a crusade against mangled Unicode (so far in the
userland).  Can't let such a blatant error slip through on my watch :)

> Also, this code is running as part of the kernel decompressor, rather than
> the kernel itself, so it doesn't have access to any kernel facilities, and
> it also needs to be position independent.

Ok, so it can't reuse common libraries.  No problem, a simplified, sanitized
and optimized copy of utf16s_to_utf8s() can be done in quite a bit less code
than the original.

> It's running in a quite limited environment - the decompressor has
> its own copy of strstr(), and other string functions.

I'd need nothing but a way to alloc the new string.  And I see this is
already done (efi_{low,high}_alloc()).

> I checked the UEFI specification, and it states that all 16 bit strings
> are UCS-2, unless otherwise noted.

... which means it will either get upgraded to UTF-16 in a subsequent
version, or some Unicode strings get mangled.  I'd ignore this bit and
implement full UTF-16 from the start: every legal UCS-2 string can be
decoded as UTF-16 so it's a strict superset.

> The load options that the command line is provided through a void pointer
> specified as:
[snip]

Either a null pointer or a 16-bit string, that sounds clear enough.  I see
not a word about endianness (does anything do EFI on big endian?), but "same
as host" seems to be a reasonable assumption.

> Would it be acceptable to fix the naming/comments, and convert values
> above 126 to '?' in the current patchset, and address a more thorough fix
> in another patch set?  The ARM and ARM64 EFI stub patchsets that are
> mostly complete depend on this one, so getting this merged soon would be
> helpful.

I don't want to hinder your work, so what about putting in your version
as-is and fixing it later?

> > There's just one problem: which encoding to use, but
> > these days, most distributions have either dropped non-UTF8 or hardly pay
> > lip service, so we could get away with hard-coding UTF-8: those few who
> > use ancient charsets can stick to ASCII.

Not being able to use regular kernel facilities makes supporting ancient
charsets a lost cause.  I'm so weeping about them... not.

> I would certainly appreciate your help improving this

Are we on the same page so far?  If so, I can make a patch atop yours.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
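
For illustration only, here is a minimal sketch of the kind of standalone
UTF-16 to UTF-8 conversion discussed in this message.  It is neither the
patch under discussion nor the kernel's utf16s_to_utf8s().  The function
name utf16_to_utf8() and the caller-supplied output buffer are assumptions
made for the example.  The input is assumed to be NUL-terminated and in
host byte order, and unpaired surrogates are replaced with U+FFFD, which is
only one possible policy.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a NUL-terminated, host-endian UTF-16
 * string to UTF-8.  At most dst_size bytes are written, including the
 * terminating NUL.  Returns the number of bytes written, not counting
 * the NUL.  Output is truncated at a character boundary if it does not
 * fit.  Unpaired surrogates become U+FFFD (an assumed policy).
 */
static size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
	size_t n = 0;

	if (dst_size == 0)
		return 0;

	while (*src) {
		uint32_t c = *src++;
		size_t len;

		if (c >= 0xD800 && c <= 0xDBFF &&
		    *src >= 0xDC00 && *src <= 0xDFFF) {
			/* Valid surrogate pair: combine into one code point. */
			c = 0x10000 + ((c - 0xD800) << 10) + (*src++ - 0xDC00);
		} else if (c >= 0xD800 && c <= 0xDFFF) {
			/* Lone surrogate: substitute the replacement character. */
			c = 0xFFFD;
		}

		len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
		if (n + len >= dst_size)	/* keep room for the NUL */
			break;

		switch (len) {
		case 1:
			dst[n++] = (char)c;
			break;
		case 2:
			dst[n++] = (char)(0xC0 | (c >> 6));
			dst[n++] = (char)(0x80 | (c & 0x3F));
			break;
		case 3:
			dst[n++] = (char)(0xE0 | (c >> 12));
			dst[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
			dst[n++] = (char)(0x80 | (c & 0x3F));
			break;
		default:
			dst[n++] = (char)(0xF0 | (c >> 18));
			dst[n++] = (char)(0x80 | ((c >> 12) & 0x3F));
			dst[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
			dst[n++] = (char)(0x80 | (c & 0x3F));
			break;
		}
	}

	dst[n] = '\0';
	return n;
}
```

Since one UTF-16 code unit expands to at most three UTF-8 bytes (a
surrogate pair is two units and yields four bytes), a buffer of
3 * units + 1 bytes is always large enough, so a caller could size the
allocation up front and never hit the truncation path.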