To: Pali Rohár <pali@xxxxxxxxxx> Cc: linux-fsdevel@xxxxxxxxxxxxxxx, linux-ntfs-dev@xxxxxxxxxxxxxxxxxxxxx, linux-cifs@xxxxxxxxxxxxxxx, jfs-discussion@xxxxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, "Theodore Y . Ts'o" <tytso@xxxxxxx>, Luis de Bethencourt <luisbg@xxxxxxxxxx>, Salah Triki <salah.triki@xxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Dave Kleikamp <shaggy@xxxxxxxxxx>, Anton Altaparmakov <anton@xxxxxxxxxx>, Pavel Machek <pavel@xxxxxx>, Marek Behún <marek.behun@xxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx> Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option From: OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx> Gcc: nnimap+ibmpc.myhome.or.jp:Sent --text follows this line-- To: Pali Rohár <pali@xxxxxxxxxx> Cc: linux-fsdevel@xxxxxxxxxxxxxxx, linux-ntfs-dev@xxxxxxxxxxxxxxxxxxxxx, linux-cifs@xxxxxxxxxxxxxxx, jfs-discussion@xxxxxxxxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Jan Kara <jack@xxxxxxx>, "Theodore Y . Ts'o" <tytso@xxxxxxx>, Luis de Bethencourt <luisbg@xxxxxxxxxx>, Salah Triki <salah.triki@xxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Dave Kleikamp <shaggy@xxxxxxxxxx>, Anton Altaparmakov <anton@xxxxxxxxxx>, Pavel Machek <pavel@xxxxxx>, Marek Behún <marek.behun@xxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx> Subject: Re: [RFC PATCH 01/20] fat: Fix iocharset=utf8 mount option From: OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx> Gcc: nnimap+ibmpc.myhome.or.jp:Sent --text follows this line-- Pali Rohár <pali@xxxxxxxxxx> writes: >> This change is not equivalent to utf8=1. In the case of utf8=1, vfat >> uses iocharset's conversion table and it can handle more than ascii. >> >> So this patch is incompatible changes, and handles less chars than >> utf8=1. So I think this is clean though, but this would be regression >> for user of utf8=1. > > I do not think so... But please correct me, as this code around is mess. > > Without this change when utf8=1 is set then iocharset= encoding is used > for case-insensitivity implementation (toupper / tolower conversion). > For all other parts are use correct utf8* conversion functions. > > But you use touppper / tolower functions from iocharset= encoding on > stream of utf8 bytes then you either get identity or some unpredictable > garbage in utf8. So when comparing two (different) non-ASCII filenames > via this method you in most cases get that filenames are different. > Because converting their utf8 bytes via toupper / tolower functions from > iocharset= encoding results in two different byte sequences in most > cases. Even for two utf8 case-insensitive same strings. > > But you can play with it and I guess it is possible to find two > different utf8 strings which after toupper / tolower conversion from > some iocharset= encoding would lead to same byte sequence. > > This patch uses for utf8 tolower / touppser function simple 7-bit > tolower / toupper ascii function. And so for 7-bit ascii file names > there is no change. > > So this patch changes behavior when comparing non 7-bit ascii file > names, but only in cases when previously two different file names were > marked as same. As now they are marked correctly as different. So this > is changed behavior, but I guess it is bug fix which is needed. > If you want I can put this change into separate patch. > > Issue that two case-insensitive same files are marked as different is > not changed by this patch and therefore this issue stay here. OK, sure. utf8 looks like broken than I was thinking (although user can use iocharset=ascii and utf8=1 for this). The code might be better to clean up a bit more though, looks like good basically. One thing, please update FAT_DEFAULT_IOCHARSET help in Kconfig and Documentation/filesystems/vfat.rst (with new warning about iocharset=utf8). Thanks. -- OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>