On Tuesday 07 January 2020 14:32:33 Jan Kara wrote:
> On Thu 02-01-20 22:18:55, Pali Rohár wrote:
> > 1) Unify mount options for specifying charset.
> >
> > Currently all filesystems except msdos and hfsplus have mount option
> > iocharset=<charset>. hfsplus has nls=<charset> and msdos does not
> > implement re-encoding support. Plus vfat, udf and isofs have broken
> > iocharset=utf8 option (but working utf8 option). And ntfs has deprecated
> > iocharset=<charset> option.
> >
> > I would suggest following changes for unification:
> >
> > * Add a new alias iocharset= for hfsplus which would do same as nls=
> > * Make iocharset=utf8 option for vfat, udf and isofs to do same as utf8
> > * Un-deprecate iocharset=<charset> option for ntfs
> >
> > This would cause that all filesystems would have iocharset=<charset>
> > option which would work for any charset, including iocharset=utf8.
> > And it would fix also broken iocharset=utf8 for vfat, udf and isofs.
>
> Makes sense to me.

Ok!

> > 2) Add support for Unicode code points above U+FFFF for filesystems
> > befs, hfs, hfsplus, jfs and ntfs, so iocharset=utf8 option would work
> > also with filenames in userspace which would be 4 bytes long UTF-8.
>
> Also looks good but when doing this, I'd suggest we extend NLS to support
> full UTF-8 rather than implementing it by hand like e.g. we did for UDF.

The current kernel NLS framework API supports upper-case/lower-case
conversion only for single-byte encodings, so there is no
case-insensitivity support for the UTF-8 encoding. And for Unicode
conversion it supports only UCS-2, i.e. code points up to U+FFFF, which
in UTF-8 means sequences of at most 3 bytes. This really cannot be fixed
without rewriting the existing filesystems which use the NLS API.

One hacky option would be to extend the NLS API from UCS-2 to UTF-16 and
fix all users of the NLS API to expect UTF-16 surrogate pairs. But I
dislike UTF-16 and would rather use unicode_t (UTF-32), which is already
present in the kernel.
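A minimal user-space C sketch of the encoding limits discussed above (not kernel code; the type names `u16_unit` and `code_point` and every helper below are hypothetical, chosen only for illustration). It assumes the NLS callbacks exchange a single 16-bit `wchar_t` as declared in `include/linux/nls.h`; such a unit can only hold a Plane-0 code point, while anything above U+FFFF needs 4 bytes in UTF-8 and a surrogate pair in UTF-16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 16-bit code unit (like the NLS wchar_t)
 * and a 32-bit code point (like the kernel's unicode_t). */
typedef uint16_t u16_unit;
typedef uint32_t code_point;

/* One UCS-2 unit holds only a Plane-0 code point that is not a surrogate. */
static int fits_in_ucs2(code_point cp)
{
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Encode cp as UTF-8; returns the byte count, or 0 if cp is invalid. */
static int utf8_encode(code_point cp, unsigned char out[4])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* longest form reachable via UCS-2 */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 4 bytes: beyond UCS-2 */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode cp as UTF-16; returns 1 unit for Plane 0, 2 units (a surrogate
 * pair) above U+FFFF, or 0 if cp is invalid. */
static int utf16_encode(code_point cp, u16_unit out[2])
{
    if (fits_in_ucs2(cp)) {
        out[0] = (u16_unit)cp;
        return 1;
    }
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;                       /* lone surrogate or out of range */
    cp -= 0x10000;
    assert(cp <= 0xFFFFF);              /* 20 bits remain to split */
    out[0] = (u16_unit)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (u16_unit)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 yields the 4 UTF-8 bytes F0 9F 98 80 and the UTF-16 surrogate pair D83D DE00, while U+20AC fits in one 16-bit unit and takes 3 bytes in UTF-8.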
However, because existing filesystem drivers pass their UCS-2/UTF-16
buffers from the FS to the NLS API, it is not easy to change the whole
NLS API from UCS-2 to UTF-32. And this change still would not add support
for case-insensitivity, so it would be useless for all MS filesystems
(msdos, vfat, ntfs), which are the majority.

The kernel already provides functions for converting between UTF-8 and
UTF-16, so this seems to be the easiest way to provide full UTF-8 support
for filesystems which internally use UTF-16, similar to how it is
implemented in UDF.

Moreover, all NLS encodings except UTF-8 are single-byte encodings and
map into Plane 0, so they can be represented by the currently used UCS-2
encoding. Therefore conversion to Unicode works correctly for them, and
so do their case-insensitivity functions (or rather tables).

Adding support for case-insensitivity to the UTF-8 NLS encoding would
mean creating a completely new kernel NLS API (one supporting
variable-length encodings) and rewriting all NLS filesystems to use it.
All existing NLS encodings would also need to be ported to this new API.
Does that really have any value? Just because of UTF-8?

To me it looks like the better option would be to remove the UTF-8 NLS
encoding, as it is broken. Some filesystems already do not use the NLS
API for their UTF-8 support (e.g. vfat, udf or the newly prepared exfat),
and the others could be modified/extended/fixed in a similar way.

> > 3) Add support for iocharset= and codepage= options for msdos
> > filesystem. It shares a lot of parts of code with vfat driver.
>
> I guess this is for msdos filesystem maintainers to decide.

Yes!

-- 
Pali Rohár
pali.rohar@xxxxxxxxx