Introduce the encoding-awareness feature for ext4 and explain some of the
design decisions and the mount options used to enable it.

Signed-off-by: Gabriel Krisman Bertazi <krisman@xxxxxxxxxxxxxxx>
---
 Documentation/filesystems/ext4.txt | 37 ++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 7f628b9f7c4b..57ce78c18b26 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -99,6 +99,8 @@ Note: More extensive information for getting started with ext4 can be
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
   the ordering)
+* Encoding-aware file names
+* Case-insensitive file name lookups
 
 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
@@ -122,6 +124,32 @@ grouping of bitmaps and inode tables. Some test results available here:
  - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-write-2.6.27-rc1.html
  - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-readwrite-2.6.27-rc1.html
 
+2.3 Encoding-aware file names and case-insensitive lookups
+==========================================================
+
+Ext4 optionally supports filesystem-wide charset knowledge when handling
+file names, which allows the user to perform file system lookups using
+charset-equivalent versions of the same file name, and optionally ensure
+that no invalid names are held by the filesystem. Charset encoding
+awareness is also essential for performing case-insensitive lookups,
+because it is what defines the casefold operation.
+
+The case-insensitive file name lookup feature is supported at a finer
+granularity, on a per-directory basis, allowing the user to mix
+case-insensitive and case-sensitive directories in the same filesystem.
+It is enabled by flipping a file attribute on an empty directory. For
+the reason stated above, the filesystem must have encoding enabled to
+use this feature.
+
+When we change from treating filenames as opaque byte sequences to seeing
+them as encoded strings, we need to address what happens when a program
+tries to create a file with an invalid name. The Native Language Support
+(NLS) subsystem within the kernel leaves the decision of what to do to
+the filesystem, via configuring the NLS strict mode. When Ext4 encounters
+one of those strings, it falls back to considering the entire string as
+one opaque byte sequence, which still allows the user to operate on that
+file, but the case-insensitive and equivalent sequence lookups won't work.
+
 3. Options
 ==========
 
@@ -388,6 +416,15 @@ dax			Use direct access (no page cache). See
 			Documentation/filesystems/dax.txt. Note that
 			this option is incompatible with data=journal.
 
+encoding		Enable a specific encoding for file name lookups.
+			This cannot be used with per-directory encryption and
+			will fail on filesystems that have that flag enabled.
+
+encoding_flags		A bitmask to configure how the encoding-aware mechanism
+			should function. It specifies whether to refuse invalid
+			sequences and the specific normalization and casefold
+			operations to use.
+
 Data Mode
 =========
 There are 3 different data modes:
-- 
2.18.0