Re: ALPHA: advice for a patch to CIFS

Jeff Layton <jlayton@xxxxxxxxx> · Sat, 28 Jul 2012 08:06:51 -0400

On Fri, 27 Jul 2012 20:13:42 +0100
Frediano Ziglio <frediano.ziglio@xxxxxxxxxx> wrote:

> Hi,
>   I'm currently trying to support utf-16 with characters not in plane 0.
> 
> I'm currently end up with this patch. Currently is not against latest
> kernel but the problem still reside in last git kernel.
> 
> wchar_t is currently 16bit so converting a utf8 encoded characters not
> in plane 0 (>= 0x10000) to wchar_t (that is calling char2uni) lead to a
> -EINVAL return. This patch detect utf8 in cifs_strtoUCS and add special
> code calling directly utf8_to_utf32.
> 
> Does it sound a good patch or just a bad hack. Perhaps would be better
> to change char2uni converting to unicode_t (32bit) instead of wchar_t
> but probably many code have to be checked in order to make sure it does
> not lead to wrong conversions, overflows or other bad stuff.
> 
> Is it worth working in this hacking way? I'd like to upstream this
> patch.
> 
> 
> diff -r c2325d754e8d fs/cifs/cifs_unicode.c
> --- a/fs/cifs/cifs_unicode.c	Fri Jul 27 15:12:23 2012 +0100
> +++ b/fs/cifs/cifs_unicode.c	Fri Jul 27 19:09:04 2012 +0100
> @@ -192,22 +192,40 @@ cifs_strtoUCS(__le16 *to, const char *fr

That function doesn't exist anymore. You should base this on a more
recent upstream tree.

>  {
>  	int charlen;
>  	int i;
> -	wchar_t *wchar_to = (wchar_t *)to; /* needed to quiet sparse */
> +	int is_utf8 = !strcmp(codepage->charset, "utf8");

Gross...there must be a better way to do that?

> +	wchar_t wchar_to; /* needed to quiet sparse */
> +	unicode_t uni;
>  
>  	for (i = 0; len && *from; i++, from += charlen, len -= charlen) {
>  
>  		/* works for 2.4.0 kernel or later */
> -		charlen = codepage->char2uni(from, len, &wchar_to[i]);
> +		if (is_utf8) {
> +			charlen = utf8_to_utf32(from, len, &uni);
> +		} else {
> +			charlen = codepage->char2uni(from, len, &wchar_to);
> +			uni = wchar_to;
> +		}
> +
>  		if (charlen < 1) {
>  			cERROR(1,
>  			       ("strtoUCS: char2uni of %d returned %d",
>  				(int)*from, charlen));
>  			/* A question mark */
> -			to[i] = cpu_to_le16(0x003f);
> +			wchar_to = 0x003f;
>  			charlen = 1;
> -		} else
> -			to[i] = cpu_to_le16(wchar_to[i]);
> -
> +		} else if (uni < 0x10000) {

	"uni" will be unintialized here if is_utf8 is false.

> +			wchar_to = uni;
> +		} else if (uni < 0x110000) {
> +			uni -= 0x10000;
> +			to[i++] = cpu_to_le16(0xD800 | (uni >> 10));
> +			wchar_to = 0xDC00 | (uni & 0x3FF);
> +		} else {
> +			cERROR(1,
> +			       ("strtoUCS: char2uni of %d returned %d",
> +				(int)*from, charlen));
> +			wchar_to = 0x003f;
> +		}
> +		to[i] = cpu_to_le16(wchar_to);
>  	}
>  
>  	to[i] = 0;
> 
> Signed-off-by: "Frediano Ziglio" <frediano.ziglio@xxxxxxxxxx>
> 
> Regards,
>   Frediano
> 

The basic idea looks ok, but I agree that this could use some more
commenting and/or some #define'd constants. Does the conversion of on
the wire characters to the local charset need similar work?

-- 
Jeff Layton <jlayton@xxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html