Re: RFC: Case-insensitive support for XFS

On Sat, 06 Oct 2007 05:10:23 +1000, Anton Altaparmakov <aia21@xxxxxxxxx> wrote:

> Hi,

> On 5 Oct 2007, at 16:44, Christoph Hellwig wrote:
>> [Adding -fsdevel because some of the things touched here might be of
>> broader interest and Urban because his name is on nls_utf8.c]
>>
>> On Fri, Oct 05, 2007 at 11:57:54AM +1000, Barry Naujok wrote:
>>> On its own, Linux only provides case conversion for old-style
>>> character sets - 8-bit sequences only. A lot of distros are
>>> now defaulting to UTF-8 and the Linux NLS stuff does not support
>>> case conversion for any Unicode sets.

>> The lack of case tables in nls_utf8.c definitely seems odd to me.
>> Urban, is there a reason for that?  The only thing that comes to
>> mind is that these tables might be quite large.
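
As background for anyone not staring at include/linux/nls.h right now:
the case tables in struct nls_table are plain 256-entry byte arrays,
one slot per 8-bit character, so they structurally cannot describe
Unicode case folding at all. Roughly (from memory, so the exact field
order may be off):

        struct nls_table {
                const char *charset;
                const char *alias;
                int (*uni2char)(wchar_t uni, unsigned char *out,
                                int boundlen);
                int (*char2uni)(const unsigned char *rawstring,
                                int boundlen, wchar_t *uni);
                const unsigned char *charset2lower; /* 256-entry byte   */
                const unsigned char *charset2upper; /* tables: one slot */
                struct module *owner;               /* per 8-bit char   */
                struct nls_table *next;
        };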

>>> NTFS in Linux also implements its own dcache and NTFS also
>>
>>                                      ^^^^^^^ dentry operations?
>
> Where did that come from? NTFS does not have its own dcache! It
> doesn't have its own dentry operations either... NTFS uses the
> default ones...
>
> All the case insensitivity handling "cleverness" is done inside
> ntfs_lookup(), i.e. the NTFS directory inode operation ->lookup.

Sorry if I got this wrong. I derived my comment from fs/ntfs/namei.c:

 * In order to handle the case insensitivity issues of NTFS with regards
 * to the dcache and the dcache requiring only one dentry per directory,
 * we deal with dentry aliases that only differ in case in ->ntfs_lookup()
 * while maintaining a case sensitive dcache.

Reading it again, I misinterpreted it :)
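
For what it's worth, the alternative that Christoph's "dentry
operations?" note points at (overriding ->d_hash and ->d_compare so
that case aliases hash and compare equal, roughly what JFS does for
its OS/2-compatible volumes) would look something like the sketch
below. Purely illustrative: the xfs_ci_* names are made up, and it
still leans on the per-byte NLS tables, so it has the same Unicode
problem as above.

        /* Hypothetical: an NLS table loaded at mount via load_nls(). */
        static struct nls_table *ci_nls;

        static int xfs_ci_hash(struct dentry *dentry, struct qstr *q)
        {
                unsigned long hash = init_name_hash();
                unsigned int i;

                for (i = 0; i < q->len; i++)
                        hash = partial_name_hash(
                                nls_tolower(ci_nls, q->name[i]), hash);
                q->hash = end_name_hash(hash);
                return 0;
        }

        static int xfs_ci_compare(struct dentry *dentry, struct qstr *a,
                                  struct qstr *b)
        {
                unsigned int i;

                if (a->len != b->len)
                        return 1;               /* not a match */
                for (i = 0; i < a->len; i++)
                        if (nls_tolower(ci_nls, a->name[i]) !=
                            nls_tolower(ci_nls, b->name[i]))
                                return 1;
                return 0;                       /* match */
        }

        static struct dentry_operations xfs_ci_dentry_ops = {
                .d_hash         = xfs_ci_hash,
                .d_compare      = xfs_ci_compare,
        };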

>>> Internally, the names will probably be converted to "u16"s for
>>> efficient processing. Conversion between UTF-8 and UTF-16/UCS-2
>>> is very straightforward.
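
(To show how straightforward: here is a rough userspace sketch of the
UTF-8 to UTF-16 direction. It rejects malformed sequences but doesn't
bother with overlong-form or surrogate-range checks, and it is not the
code I intend to submit:)

        #include <stdint.h>
        #include <stddef.h>

        /*
         * Decode one UTF-8 sequence from 's' (at most 'len' bytes)
         * into *cp.  Returns bytes consumed, or -1 on malformed input.
         */
        static int utf8_decode(const uint8_t *s, size_t len, uint32_t *cp)
        {
                uint32_t c;
                int n, i;

                if (len == 0)
                        return -1;
                c = s[0];
                if (c < 0x80) {
                        *cp = c;
                        return 1;
                } else if ((c & 0xe0) == 0xc0) {
                        c &= 0x1f; n = 2;
                } else if ((c & 0xf0) == 0xe0) {
                        c &= 0x0f; n = 3;
                } else if ((c & 0xf8) == 0xf0) {
                        c &= 0x07; n = 4;
                } else {
                        return -1;
                }
                if (len < (size_t)n)
                        return -1;
                for (i = 1; i < n; i++) {
                        if ((s[i] & 0xc0) != 0x80)
                                return -1;
                        c = (c << 6) | (s[i] & 0x3f);
                }
                *cp = c;
                return n;
        }

        /*
         * Convert UTF-8 to UTF-16 code units.  Returns the number of
         * u16s written, or -1 on malformed input or output overflow.
         */
        static int utf8_to_utf16(const uint8_t *s, size_t len,
                                 uint16_t *out, size_t outlen)
        {
                size_t w = 0;
                uint32_t cp;
                int n;

                while (len) {
                        n = utf8_decode(s, len, &cp);
                        if (n < 0)
                                return -1;
                        s += n;
                        len -= n;
                        if (cp < 0x10000) {
                                if (w >= outlen)
                                        return -1;
                                out[w++] = (uint16_t)cp;
                        } else {        /* needs a surrogate pair */
                                if (w + 2 > outlen)
                                        return -1;
                                cp -= 0x10000;
                                out[w++] = 0xd800 | (uint16_t)(cp >> 10);
                                out[w++] = 0xdc00 | (uint16_t)(cp & 0x3ff);
                        }
                }
                return (int)w;
        }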

>> Do we really need that?  And if so please make sure this only happens
>> for filesystems created with the case insensitivity option so normal
>> filesystems don't have to pay for these bloated strings.

> There is nothing efficient about using u16 in memory AFAIK. In fact
> for the majority of the time it just means you use twice the memory
> per string...
>
> FWIW Mac OS X uses utf8 in the kernel and so does HFS(+) and I can't
> see anything wrong with that. And Windows uses u16 (little endian)
> and so does NTFS. So there is precedent for doing both internally...
>
> What are the reasons for suggesting that it would be more efficient
> to use u16 internally?

As I said to Christoph before, the only reason is that the NLS
conversions use wchar_t. As I don't have any case tables yet (one of
the primary points for discussion), I haven't settled on which method
to use.

If I do use u16, it will only be used temporarily for case comparison.
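
Something along these lines, i.e. fold and compare in a temporary
buffer and throw the u16s away immediately afterwards (pure sketch;
xfs_towupper() is a placeholder for whatever case table we end up
with):

        /*
         * Pure sketch: case-insensitive match of two UTF-16 names.
         * xfs_towupper() stands in for the (yet to be chosen)
         * Unicode upper-case table lookup.
         */
        static int ci_name_match(const u16 *a, const u16 *b,
                                 unsigned int len)
        {
                while (len--)
                        if (xfs_towupper(*a++) != xfs_towupper(*b++))
                                return 0;       /* mismatch */
                return 1;                       /* match */
        }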

Regards,
barry.
