Re: [PATCH 0/3] use UTF-8 encoding

Greg KH <greg@xxxxxxxxx> · Fri, 24 Apr 2009 15:32:10 -0700

On Fri, Apr 24, 2009 at 10:10:46AM +0200, Clemens Ladisch wrote:
> Alan Stern wrote:
> > It is feasible, but there are a couple of things to watch out for:
> > 
> > 	With latin-1 encoding we know that each character occupies
> > 	only one byte; therefore any descriptor string will fit into a 
> > 	128-byte buffer (since the total descriptor length can't be 
> > 	larger than 255).  But with UTF-8 encoding, a character can 
> > 	occupy more than one byte.  Hence the callers may need to 
> > 	allocate larger buffers than they do now.  For instance, you 
> > 	would definitely want to change usb_cache_string().
> 
> That one is the only caller of usb_string() in the kernel that uses a
> buffer larger than 64 bytes, so I didn't bother about the others.
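
For reference, the buffer arithmetic above can be written out as a rough
sketch. The constant and function names below are made up for illustration
and are not taken from the kernel headers. It assumes a 2-byte descriptor
header (bLength, bDescriptorType) followed by UTF-16LE code units:

```c
#include <stddef.h>

/* Illustrative constants only; these names are assumptions, not kernel macros. */
#define DESC_MAX_LEN    255	/* bLength is a single byte */
#define DESC_HEADER_LEN 2	/* bLength + bDescriptorType */

/* Largest number of UTF-16 code units that fit in one string descriptor. */
size_t max_utf16_units(void)
{
	return (DESC_MAX_LEN - DESC_HEADER_LEN) / 2;
}

/* Worst-case Latin-1 output: one byte per code unit, plus the NUL. */
size_t latin1_buf_size(void)
{
	return max_utf16_units() + 1;
}

/*
 * Worst-case UTF-8 output: a single BMP code unit needs at most 3 bytes.
 * A surrogate pair uses 2 code units for 4 bytes, which is only 2 bytes
 * per unit, so 3 bytes per unit is the upper bound.  Plus the NUL.
 */
size_t utf8_buf_size(void)
{
	return max_utf16_units() * 3 + 1;
}
```

So the Latin-1 case fits in 128 bytes as described, while a UTF-8 caller
that wants to hold any possible descriptor would need roughly three times
that.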
> 
> > 	Translation from UTF-16LE to latin-1 is easy.  Translation
> > 	to UTF-8 is harder because it requires you to check for
> > 	invalid code points.  Furthermore, if you write your own code
> > 	to do the translation then you are almost certainly duplicating 
> > 	code that already exists somewhere else in the kernel, which is 
> > 	a bad idea.
> 
> The only existing code I've found is utf8_wcstombs(), and it doesn't
> bother about invalid code points.
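
To make the "invalid code points" concern concrete, here is a minimal
user-space sketch of a UTF-16LE to UTF-8 conversion that rejects unpaired
surrogates. utf16le_to_utf8() is a hypothetical helper written for this
example; it is not utf8_wcstombs() and not the code from the patches:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Converts 'len' bytes of UTF-16LE from 'src' into NUL-terminated UTF-8
 * in 'dst'.  Returns the number of bytes written (not counting the NUL),
 * or -1 on an odd length, an unpaired surrogate, or a too-small buffer.
 */
int utf16le_to_utf8(const uint8_t *src, size_t len, char *dst, size_t dst_size)
{
	size_t i = 0, out = 0;

	if (len % 2 != 0 || dst_size == 0)
		return -1;

	while (i < len) {
		uint32_t cp = src[i] | ((uint32_t)src[i + 1] << 8);
		size_t need;

		i += 2;
		if (cp >= 0xD800 && cp <= 0xDBFF) {
			/* High surrogate: a low surrogate must follow. */
			uint32_t lo;

			if (i + 2 > len)
				return -1;
			lo = src[i] | ((uint32_t)src[i + 1] << 8);
			if (lo < 0xDC00 || lo > 0xDFFF)
				return -1;
			i += 2;
			cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
		} else if (cp >= 0xDC00 && cp <= 0xDFFF) {
			/* Low surrogate without a preceding high surrogate. */
			return -1;
		}

		need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
		if (out + need >= dst_size)	/* keep room for the NUL */
			return -1;

		switch (need) {
		case 1:
			dst[out++] = (char)cp;
			break;
		case 2:
			dst[out++] = (char)(0xC0 | (cp >> 6));
			dst[out++] = (char)(0x80 | (cp & 0x3F));
			break;
		case 3:
			dst[out++] = (char)(0xE0 | (cp >> 12));
			dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
			dst[out++] = (char)(0x80 | (cp & 0x3F));
			break;
		default:
			dst[out++] = (char)(0xF0 | (cp >> 18));
			dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
			dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
			dst[out++] = (char)(0x80 | (cp & 0x3F));
			break;
		}
	}

	dst[out] = '\0';
	return (int)out;
}
```

Whether a bad surrogate should abort the conversion, as it does here, or
be replaced with a placeholder character is a policy choice; the sketch
just picks the simpler of the two.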
> 
> I've included the NLS patches here because there doesn't seem to be an
> NLS maintainer, and you wouldn't want to use the USB patch without those
> fixes.
> 
> Not much tested, because I don't have a USB device with non-ASCII
> strings.  And I'm not quite sure how applications will handle the
> encoding change ...

Hm, I have a device with an extended ASCII string:

$ cat /sys/kernel/debug/usb/devices  | grep Track
S:  Product=Microsoft Trackball Optical�

so I'll try them out.

thanks,

greg k-h