Re: [PATCH v9 10/13] exfat: add nls operations

On Friday 03 January 2020 10:40:30 Pali Rohár wrote:
> On Thursday 02 January 2020 16:20:33 Namjae Jeon wrote:
> > This adds the implementation of nls operations for exfat.
> > 
> > Signed-off-by: Namjae Jeon <namjae.jeon@xxxxxxxxxxx>
> > Signed-off-by: Sungjong Seo <sj1557.seo@xxxxxxxxxxx>
> > ---
> >  fs/exfat/nls.c | 809 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 809 insertions(+)
> >  create mode 100644 fs/exfat/nls.c
> > 
> > diff --git a/fs/exfat/nls.c b/fs/exfat/nls.c
> > new file mode 100644
> > index 000000000000..af52328e28ff
> > --- /dev/null
> > +++ b/fs/exfat/nls.c
> 
> ...
> 
> > +static int exfat_convert_uni_to_ch(struct nls_table *nls, unsigned short uni,
> > +		unsigned char *ch, int *lossy)
> > +{
> > +	int len;
> > +
> > +	ch[0] = 0x0;
> > +
> > +	if (uni < 0x0080) {
> > +		ch[0] = uni;
> > +		return 1;
> > +	}
> > +
> > +	len = nls->uni2char(uni, ch, MAX_CHARSET_SIZE);
> > +	if (len < 0) {
> > +		/* conversion failed */
> > +		if (lossy != NULL)
> > +			*lossy |= NLS_NAME_LOSSY;
> > +		ch[0] = '_';
> > +		return 1;
> > +	}
> > +	return len;
> > +}
> 
> Hello! This function takes one UCS-2 character in host endianness and
> converts it to one character in the specified 8-bit encoding (which may
> take up to MAX_CHARSET_SIZE bytes).
> 
> > +static int __exfat_nls_uni16s_to_vfsname(struct super_block *sb,
> > +		struct exfat_uni_name *p_uniname, unsigned char *p_cstring,
> > +		int buflen)
> > +{
> > +	int i, j, len, out_len = 0;
> > +	unsigned char buf[MAX_CHARSET_SIZE];
> > +	const unsigned short *uniname = p_uniname->name;
> > +	struct nls_table *nls = EXFAT_SB(sb)->nls_io;
> > +
> > +	i = 0;
> > +	while (i < MAX_NAME_LENGTH && out_len < (buflen - 1)) {
> > +		if (*uniname == '\0')
> > +			break;
> > +
> > +		len = exfat_convert_uni_to_ch(nls, *uniname, buf, NULL);
> > +		if (out_len + len >= buflen)
> > +			len = buflen - 1 - out_len;
> > +		out_len += len;
> > +
> > +		if (len > 1) {
> > +			for (j = 0; j < len; j++)
> > +				*p_cstring++ = buf[j];
> > +		} else { /* len == 1 */
> > +			*p_cstring++ = *buf;
> > +		}
> > +
> > +		uniname++;
> > +		i++;
> > +	}
> > +
> > +	*p_cstring = '\0';
> > +	return out_len;
> > +}
> > +
> 
> This function takes a UCS-2 buffer in host endianness and converts it
> to a string in the specified 8-bit encoding.
> 
> > +
> > +int exfat_nls_uni16s_to_vfsname(struct super_block *sb,
> > +		struct exfat_uni_name *uniname, unsigned char *p_cstring,
> > +		int buflen)
> > +{
> 
> Looking at the code, this function is called from dir.c to translate an
> exFAT filename buffer stored on the filesystem into the format expected
> by the VFS layer.
> 
> On an exFAT filesystem, file names are always stored in UTF-16LE...
> 
> > +	if (EXFAT_SB(sb)->options.utf8)
> > +		return __exfat_nls_utf16s_to_vfsname(sb, uniname, p_cstring,
> > +				buflen);
> > +	return __exfat_nls_uni16s_to_vfsname(sb, uniname, p_cstring, buflen);
> 
> ... and therefore the "__exfat_nls_uni16s_to_vfsname" function above
> must expect a UTF-16LE buffer, not just a UCS-2 buffer in host
> endianness.
> 
> So two more things need to be done: convert each character from little
> endian to host endianness, and then process the buffer as UTF-16 rather
> than only as UCS-2.
> 
> I see that the kernel NLS module is missing a function for converting a
> UTF-16 string to UTF-32 (an encoding in which every code point is
> represented by a single u32 value). For decoding UTF-16 the kernel only
> has utf16s_to_utf8s(), plus utf8_to_utf32() for single UTF-8 sequences.
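
For illustration, the missing UTF-16 to UTF-32 step could look roughly
like the following minimal userspace sketch. utf16_to_utf32() is a
hypothetical name, not an existing kernel API; the sketch assumes the
input is already in host byte order and, as its own assumption, maps
unpaired surrogates to U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): decode a host-endian UTF-16
 * buffer into UTF-32 code points. A high surrogate followed by a low
 * surrogate is combined into one code point above U+FFFF; any unpaired
 * surrogate becomes U+FFFD. Returns the number of code points written. */
static size_t utf16_to_utf32(const uint16_t *in, size_t inlen,
			     uint32_t *out, size_t outlen)
{
	size_t i = 0, o = 0;

	while (i < inlen && o < outlen) {
		uint32_t c = in[i++];

		if (c >= 0xD800 && c <= 0xDBFF) {
			/* High surrogate: valid only if a low one follows. */
			if (i < inlen && in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
				c = 0x10000 + ((c - 0xD800) << 10) +
				    (uint32_t)(in[i] - 0xDC00);
				i++;
			} else {
				c = 0xFFFD;
			}
		} else if (c >= 0xDC00 && c <= 0xDFFF) {
			/* Low surrogate without a preceding high one. */
			c = 0xFFFD;
		}
		out[o++] = c;
	}
	return o;
}
```

For example, the pair D83D DE00 decodes to the single code point
U+1F600, which no existing NLS table could then map to a charset byte.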

What about simply filtering surrogate pairs (two u16 values)? Existing NLS
modules do not support code points above U+FFFF, so each surrogate pair
just needs to be converted to one replacement character.

diff --git a/fs/exfat/nls.c b/fs/exfat/nls.c
index 81d75aed9..f626a0a89 100644
--- a/fs/exfat/nls.c
+++ b/fs/exfat/nls.c
@@ -545,7 +545,11 @@ static int __exfat_nls_vfsname_to_utf16s(struct super_block *sb,
 	return unilen;
 }
 
+#define SURROGATE_MASK		0xfffff800
+#define SURROGATE_PAIR		0x0000d800
+#define SURROGATE_LOW		0x00000400
+
 static int __exfat_nls_uni16s_to_vfsname(struct super_block *sb,
 		struct exfat_uni_name *p_uniname, unsigned char *p_cstring,
 		int buflen)
 {
@@ -559,7 +563,23 @@ static int __exfat_nls_uni16s_to_vfsname(struct super_block *sb,
 		if (*uniname == '\0')
 			break;
 
-		len = exfat_convert_uni_to_ch(nls, *uniname, buf, NULL);
+		if ((*uniname & SURROGATE_MASK) != SURROGATE_PAIR) {
+			len = exfat_convert_uni_to_ch(nls, *uniname, buf, NULL);
+		} else {
+			/* Process UTF-16 surrogate pair as one character */
+			if (!(*uniname & SURROGATE_LOW) && i+1 < MAX_NAME_LENGTH &&
+			    (*(uniname+1) & SURROGATE_MASK) == SURROGATE_PAIR &&
+			    (*(uniname+1) & SURROGATE_LOW)) {
+				uniname++;
+				i++;
+			}
+			/* A surrogate pair encodes a code point above U+FFFF, which
+			 * the kernel NLS framework cannot represent, so emit a
+			 * replacement character instead. */
+			len = 1;
+			buf[0] = '_';
+		}
+
 		if (out_len + len >= buflen)
 			len = buflen - 1 - out_len;
 		out_len += len;

I have not tested this code; it is just an idea for a quick & dirty
solution to the problem that the NLS framework works only with UCS-2
and not with UCS-4/UTF-32 or UTF-16.
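
Since the patch is untested, here is a minimal userspace model of the
same surrogate-filtering loop that can be compiled and exercised on its
own. It is a sketch under stated assumptions: toy_uni2char() is a
hypothetical stand-in for nls->uni2char() that knows only ASCII, every
character therefore produces exactly one output byte, and maxlen plays
the role of MAX_NAME_LENGTH.

```c
#include <stddef.h>
#include <stdint.h>

#define SURROGATE_MASK	0xfffff800
#define SURROGATE_PAIR	0x0000d800
#define SURROGATE_LOW	0x00000400

/* Hypothetical stand-in for nls->uni2char(): maps ASCII only and fails
 * (negative return) for anything else, like a charset with no mapping. */
static int toy_uni2char(uint16_t uni, unsigned char *out)
{
	if (uni < 0x80) {
		out[0] = (unsigned char)uni;
		return 1;
	}
	return -1;
}

/* Model of the patched loop: BMP code units go through the charset,
 * unmappable ones become '_'; a well-formed surrogate pair collapses into
 * a single '_', and a lone surrogate also becomes '_'. The output is
 * always NUL-terminated; returns the number of bytes written. */
static size_t model_uni16s_to_vfsname(const uint16_t *uniname, size_t maxlen,
				      unsigned char *out, size_t buflen)
{
	size_t i = 0, out_len = 0;

	while (i < maxlen && uniname[i] != 0 && out_len + 1 < buflen) {
		uint16_t c = uniname[i];

		if ((c & SURROGATE_MASK) != SURROGATE_PAIR) {
			if (toy_uni2char(c, &out[out_len]) < 0)
				out[out_len] = '_';	/* lossy conversion */
		} else {
			/* High surrogate followed by low surrogate: skip both. */
			if (!(c & SURROGATE_LOW) && i + 1 < maxlen &&
			    (uniname[i + 1] & SURROGATE_MASK) == SURROGATE_PAIR &&
			    (uniname[i + 1] & SURROGATE_LOW))
				i++;
			out[out_len] = '_';
		}
		out_len++;
		i++;
	}
	out[out_len] = '\0';
	return out_len;
}
```

With this model, "a", U+1F600 (D83D DE00), "b", a lone DC00 and U+00E9
come out as "a_b__": the pair collapses into one '_', while the lone
surrogate and the unmappable character are each replaced individually.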

> > +}
> 
> Btw, have you tested this exfat implementation on a big-endian system?
> I think it cannot work there because of the missing conversion from
> UTF-16LE to UTF-16 in host endianness (i.e., UTF-16BE).

Now I see that the conversion from UTF-16LE to host endianness is
already done in the exfat_extract_uni_name() function, which is called
from exfat_get_uniname_from_ext_entry(). exfat_nls_uni16s_to_vfsname()
is then called on the result of exfat_get_uniname_from_ext_entry(), so
UTF-16LE processing on big-endian systems should work. Sorry for that.
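
The byte-order step itself is small. A minimal userspace sketch,
assuming raw points at on-disk UTF-16LE name bytes;
utf16le_bytes_to_host() is a hypothetical helper, and in the kernel the
equivalent per-unit operation would typically be le16_to_cpu():

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert nunits UTF-16LE code units stored as raw
 * bytes into host-order uint16_t values. Each unit is built from its low
 * byte and high byte explicitly, so the result is the same on little- and
 * big-endian hosts. */
static void utf16le_bytes_to_host(const unsigned char *raw, size_t nunits,
				  uint16_t *out)
{
	size_t i;

	for (i = 0; i < nunits; i++)
		out[i] = (uint16_t)(raw[2 * i] | (raw[2 * i + 1] << 8));
}
```

Only after this step does it make sense to look for surrogate pairs,
since the SURROGATE_* masks assume host-order values.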

-- 
Pali Rohár
pali.rohar@xxxxxxxxx


