On Feb 13, 2025, at 1:10 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
>
> This patch addresses an issue where some files in case-insensitive
> directories become inaccessible due to changes in how the kernel
> function, utf8_casefold(), generates case-folded strings from the
> commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code
> points").
>
> There are good reasons why this change should be made; it's actually
> quite stupid that Unicode seems to think that the characters ❤ and ❤️
> should be casefolded. Unfortunately because of the backwards
> compatibility issue, this commit was reverted in 231825b2e1ff.
>
> This problem is addressed by instituting a brute-force linear fallback
> if a lookup fails on case-folded directory, which does result in a
> performance hit when looking up files affected by the changing how
> the kernel treats ignorable Unicode characters, or when attempting to
> look up non-existent file names. So this fallback can be disabled by
> setting an encoding flag if in the future, the system administrator or
> the manufacturer of a mobile handset or tablet can be sure that there
> was no opportunity for a kernel to insert file names with incompatible
> encodings.

I don't have the full context here, but falling back to a full directory
scan for every failed lookup in a casefolded directory would be *very*
expensive for a large directory.

This could be made conditional upon a much narrower set of conditions:
- if the filename has non-ASCII characters (already uncommon)
- if the filename contains characters that may be case folded (normalized?)

This avoids a huge performance hit for every name lookup in very common
workloads that do not need it (i.e. most computer-generated filenames
are still only using ASCII characters).

Also, depending on the size of the directory vs. the number of
case-folded (normalized?) characters in the filename, it might be faster
to do 2^(ambiguous_chars) htree lookups instead of a linear scan of the
whole dir. It would be easy to check whether 2^(ambiguous_chars) < dir
blocks, since the htree leaf blocks will always be fully scanned anyway
once found. That could be a big win if there are only a few remapped
characters in a filename.

Cheers, Andreas

>
> Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points")
> Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
> Reviewed-by: Gabriel Krisman Bertazi <krisman@xxxxxxx>
> ---
> v2:
>   * Fix compile failure when CONFIG_UNICODE is not enabled
>   * Added reviewed-by from Gabriel Krisman
>
>  fs/ext4/namei.c    | 14 ++++++++++----
>  include/linux/fs.h | 10 +++++++++-
>  2 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 536d56d15072..820e7ab7f3a3 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1462,7 +1462,8 @@ static bool ext4_match(struct inode *parent,
>  		 * sure cf_name was properly initialized before
>  		 * considering the calculated hash.
>  		 */
> -		if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
> +		if (sb_no_casefold_compat_fallback(parent->i_sb) &&
> +		    IS_ENCRYPTED(parent) && fname->cf_name.name &&
>  		    (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
>  		     fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
>  			return false;
> @@ -1595,10 +1596,15 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir,
>  		 * return. Otherwise, fall back to doing a search the
>  		 * old fashioned way.
>  		 */
> -		if (!IS_ERR(ret) || PTR_ERR(ret) != ERR_BAD_DX_DIR)
> +		if (IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR)
> +			dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
> +				       "falling back\n"));
> +		else if (!sb_no_casefold_compat_fallback(dir->i_sb) &&
> +			 *res_dir == NULL && IS_CASEFOLDED(dir))
> +			dxtrace(printk(KERN_DEBUG "ext4_find_entry: casefold "
> +				       "failed, falling back\n"));
> +		else
>  			goto cleanup_and_exit;
> -		dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
> -			       "falling back\n"));
>  		ret = NULL;
>  	}
>  	nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2c3b2f8a621f..aa4ec39202c3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1258,11 +1258,19 @@ extern int send_sigurg(struct file *file);
>  #define SB_NOUSER	BIT(31)
>
>  /* These flags relate to encoding and casefolding */
> -#define SB_ENC_STRICT_MODE_FL	(1 << 0)
> +#define SB_ENC_STRICT_MODE_FL		(1 << 0)
> +#define SB_ENC_NO_COMPAT_FALLBACK_FL	(1 << 1)
>
>  #define sb_has_strict_encoding(sb) \
>  	(sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
>
> +#if IS_ENABLED(CONFIG_UNICODE)
> +#define sb_no_casefold_compat_fallback(sb) \
> +	(sb->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL)
> +#else
> +#define sb_no_casefold_compat_fallback(sb) (1)
> +#endif
> +
>  /*
>   * Umount options
>   */
> --
> 2.45.2
>
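
To make the conditions proposed in the reply concrete, here is a minimal userspace sketch of the decision logic. It is not part of the patch and not ext4 code: name_has_non_ascii(), choose_fallback(), and the fallback_strategy enum are hypothetical names, and how ambiguous_chars would be counted for a given name is left as an assumption (it is simply passed in).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: what to do after a failed lookup in a casefolded dir. */
enum fallback_strategy {
	FALLBACK_NONE,		/* skip the fallback entirely */
	FALLBACK_MULTI_HASH,	/* do 2^ambiguous_chars htree lookups */
	FALLBACK_LINEAR_SCAN,	/* brute-force scan of the whole directory */
};

/* Hypothetical helper: true if any byte is outside 7-bit ASCII. */
static bool name_has_non_ascii(const unsigned char *name, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (name[i] & 0x80)
			return true;
	return false;
}

/*
 * Hypothetical decision function. ambiguous_chars is assumed to be the
 * number of characters in the name whose folding may differ between
 * kernels; dir_blocks is the directory size in blocks.
 */
static enum fallback_strategy
choose_fallback(const unsigned char *name, size_t len,
		unsigned int ambiguous_chars, uint64_t dir_blocks)
{
	/* Pure-ASCII names, or names with nothing remapped, need no fallback. */
	if (!name_has_non_ascii(name, len) || ambiguous_chars == 0)
		return FALLBACK_NONE;

	/* Avoid shifting past the width of uint64_t. */
	if (ambiguous_chars >= 64)
		return FALLBACK_LINEAR_SCAN;

	/* Prefer repeated hashed lookups when 2^ambiguous_chars < dir blocks. */
	if ((UINT64_C(1) << ambiguous_chars) < dir_blocks)
		return FALLBACK_MULTI_HASH;

	return FALLBACK_LINEAR_SCAN;
}
```

With these assumptions, an ASCII-only name such as "readme.txt" never triggers the fallback, while a name with one remapped character in a 1000-block directory would be resolved with two hashed lookups rather than a full scan.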