[PATCH 0/3] use UTF-8 encoding

Clemens Ladisch <clemens@xxxxxxxxxx> · Fri, 24 Apr 2009 10:10:46 +0200

Alan Stern wrote:
> It is feasible, but there are a couple of things to watch out for:
> 
> 	With latin-1 encoding we know that each character occupies
> 	only one byte; therefore any descriptor string will fit into a 
> 	128-byte buffer (since the total descriptor length can't be 
> 	larger than 255).  But with UTF-8 encoding, a character can 
> 	occupy more than one byte.  Hence the callers may need to 
> 	allocate larger buffers than they do now.  For instance, you 
> 	would definitely want to change usb_cache_string().

That one is the only caller of usb_string() in the kernel that uses a
buffer larger than 64 bytes, so I didn't bother about the others.
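
For reference, the worst case can be worked out from the descriptor
layout.  This is a minimal sketch, assuming the usual 2-byte header
(bLength, bDescriptorType) followed by UTF-16LE code units.  The macro
and function names are made up for illustration and are not kernel
identifiers:

```c
#include <assert.h>
#include <stddef.h>

#define USB_DESC_MAX_LEN    255	/* bLength is a single byte */
#define USB_DESC_HEADER_LEN 2	/* bLength + bDescriptorType */

/*
 * Hypothetical helper: worst-case number of UTF-8 bytes, including the
 * terminating NUL, for a string descriptor of desc_len bytes.
 *
 * A BMP code point takes one UTF-16 code unit and at most 3 UTF-8
 * bytes.  A surrogate pair takes two code units and 4 UTF-8 bytes,
 * i.e. only 2 bytes per unit, so 3 bytes per unit is the upper bound.
 */
static size_t utf8_worst_case(size_t desc_len)
{
	size_t units;

	assert(desc_len <= USB_DESC_MAX_LEN);
	if (desc_len < USB_DESC_HEADER_LEN)
		return 1;	/* only the NUL terminator */

	units = (desc_len - USB_DESC_HEADER_LEN) / 2;
	return units * 3 + 1;
}
```

A full 255-byte descriptor holds 126 code units, so the bound is 379
bytes, compared with 127 bytes for latin-1.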

> 	Translation from UTF-16LE to latin-1 is easy.  Translation
> 	to UTF-8 is harder because it requires you to check for
> 	invalid code points.  Furthermore, if you write your own code
> 	to do the translation then you are almost certainly duplicating 
> 	code that already exists somewhere else in the kernel, which is 
> 	a bad idea.

The only existing code I've found is utf8_wcstombs(), and it doesn't
check for invalid code points.
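
To illustrate what such a check involves, here is a userspace sketch of
a UTF-16LE to UTF-8 conversion that handles surrogates.  It is not the
code from the patches and not utf8_wcstombs(); the function names are
hypothetical, and replacing unpaired surrogates with U+FFFD is an
assumption about the desired policy (rejecting the string would be
another option):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point; returns bytes written, or 0 if it does not fit. */
static size_t put_utf8(uint32_t cp, char *dst, size_t room)
{
	if (cp < 0x80) {
		if (room < 1)
			return 0;
		dst[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) {
		if (room < 2)
			return 0;
		dst[0] = (char)(0xC0 | (cp >> 6));
		dst[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		if (room < 3)
			return 0;
		dst[0] = (char)(0xE0 | (cp >> 12));
		dst[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
		dst[2] = (char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (room < 4)
		return 0;
	dst[0] = (char)(0xF0 | (cp >> 18));
	dst[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
	dst[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
	dst[3] = (char)(0x80 | (cp & 0x3F));
	return 4;
}

/*
 * Convert src_len bytes of UTF-16LE into a NUL-terminated UTF-8 string.
 * Output is truncated at a character boundary if dst is too small.
 * Returns the length of the result, not counting the NUL.
 */
static size_t utf16le_to_utf8(const uint8_t *src, size_t src_len,
			      char *dst, size_t dst_size)
{
	size_t i = 0, out = 0;

	assert(dst != NULL);
	if (dst_size == 0)
		return 0;

	while (i + 1 < src_len) {
		uint32_t cp = src[i] | ((uint32_t)src[i + 1] << 8);
		size_t n;

		i += 2;
		if (cp >= 0xD800 && cp <= 0xDBFF) {
			/* High surrogate: only valid if a low one follows. */
			uint32_t lo = 0;

			if (i + 1 < src_len)
				lo = src[i] | ((uint32_t)src[i + 1] << 8);
			if (lo >= 0xDC00 && lo <= 0xDFFF) {
				cp = 0x10000 + ((cp - 0xD800) << 10) +
				     (lo - 0xDC00);
				i += 2;
			} else {
				cp = 0xFFFD;	/* unpaired high surrogate */
			}
		} else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			cp = 0xFFFD;		/* stray low surrogate */
		}

		n = put_utf8(cp, dst + out, dst_size - 1 - out);
		if (n == 0)
			break;			/* no room left */
		out += n;
	}
	dst[out] = '\0';
	return out;
}
```

Note that an unpaired high surrogate does not consume the code unit
that follows it, so a valid character after the bad one is still
converted.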

I've included the NLS patches here because there doesn't seem to be an
NLS maintainer, and you wouldn't want to use the USB patch without those
fixes.

This is not much tested, because I don't have a USB device with
non-ASCII strings.  I'm also not quite sure how applications will handle
the encoding change ...

 drivers/usb/Kconfig        |    1 
 drivers/usb/core/message.c |   41 +++++++++++++++----------------------
 fs/nls/nls_base.c          |    2 -
 3 files changed, 19 insertions(+), 25 deletions(-)