Re: [PATCH -v2] ext4: introduce linear search for dentries

On Feb 13, 2025, at 1:10 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> 
> This patch addresses an issue where some files in case-insensitive
> directories become inaccessible due to changes in how the kernel
> function, utf8_casefold(), generates case-folded strings from the
> commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code
> points").
> 
> There are good reasons why this change should be made; it's actually
> quite stupid that Unicode seems to think that the characters ❤ and ❤️
> should be casefolded.  Unfortunately, because of the backwards
> compatibility issue, this commit was reverted in 231825b2e1ff.
> 
> This problem is addressed by instituting a brute-force linear fallback
> if a lookup fails on a case-folded directory, which does result in a
> performance hit when looking up files affected by the change in how
> the kernel treats ignorable Unicode characters, or when attempting to
> look up non-existent file names.  So this fallback can be disabled by
> setting an encoding flag if in the future, the system administrator or
> the manufacturer of a mobile handset or tablet can be sure that there
> was no opportunity for a kernel to insert file names with incompatible
> encodings.

I don't have the full context here, but falling back to a full directory
scan for every failed lookup in a casefolded directory would be *very*
expensive for a large directory.

This could be made conditional upon a much narrower set of conditions:
- if the filename has non-ASCII characters (already uncommon)
- if the filename contains characters that may be case folded (normalized?)

This avoids a huge performance hit for every name lookup in very common
workloads that do not need it (i.e. most computer-generated filenames are
still only using ASCII characters).
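
As a rough illustration of the first condition, here is a userspace sketch
of an ASCII pre-check.  The helper name name_may_need_linear_fallback() is
hypothetical and not part of the patch, and the sketch assumes that the
ignorable code points affected by the casefolding change all lie outside
the 7-bit ASCII range:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: return true only if the
 * name contains at least one byte outside 7-bit ASCII.  In UTF-8 every
 * non-ASCII code point is encoded using bytes with the high bit set,
 * so a name without such bytes cannot contain an ignorable code point
 * (under the assumption stated above) and could skip the linear scan.
 */
static bool name_may_need_linear_fallback(const char *name, size_t len)
{
	const unsigned char *p = (const unsigned char *)name;

	for (size_t i = 0; i < len; i++) {
		if (p[i] & 0x80)
			return true;
	}
	return false;
}
```

A failed lookup would then only take the slow path when this returns true,
so purely ASCII names keep the existing htree-only cost.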

Also, depending on the size of the directory vs. the number of case-folded
(normalized?) characters in the filename, it might be faster to do
2^(ambiguous_chars) htree lookups instead of a linear scan of the whole dir.

This is easy to check by testing whether 2^(ambiguous_chars) < dir blocks,
since the htree leaf blocks will always be fully scanned anyway once found.
That could be a big win if there are only a few remapped characters in a
filename.
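
A minimal sketch of that comparison, again in userspace C.  The function
name prefer_htree_probes() and its parameters are hypothetical; counting
the ambiguous characters and the directory blocks is assumed to be done by
the caller:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: decide whether probing
 * the htree once per candidate spelling (2^ambiguous_chars probes) is
 * expected to touch fewer blocks than a linear scan of the directory.
 */
static bool prefer_htree_probes(unsigned int ambiguous_chars,
				uint64_t dir_blocks)
{
	/* 2^n does not fit in 64 bits for n >= 64; a scan is cheaper. */
	if (ambiguous_chars >= 64)
		return false;

	return (UINT64_C(1) << ambiguous_chars) < dir_blocks;
}
```

For example, a name with 3 ambiguous characters in a 1000-block directory
would need at most 8 probes, so the htree path wins; with 10 ambiguous
characters in a 1024-block directory the linear scan is no worse.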

Cheers, Andreas

> 
> Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points")
> Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
> Reviewed-by: Gabriel Krisman Bertazi <krisman@xxxxxxx>
> ---
> v2:
>   * Fix compile failure when CONFIG_UNICODE is not enabled
>   * Added reviewed-by from Gabriel Krisman
> 
> fs/ext4/namei.c    | 14 ++++++++++----
> include/linux/fs.h | 10 +++++++++-
> 2 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 536d56d15072..820e7ab7f3a3 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -1462,7 +1462,8 @@ static bool ext4_match(struct inode *parent,
> 		 * sure cf_name was properly initialized before
> 		 * considering the calculated hash.
> 		 */
> -		if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
> +		if (sb_no_casefold_compat_fallback(parent->i_sb) &&
> +		    IS_ENCRYPTED(parent) && fname->cf_name.name &&
> 		    (fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
> 		     fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
> 			return false;
> @@ -1595,10 +1596,15 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir,
> 		 * return.  Otherwise, fall back to doing a search the
> 		 * old fashioned way.
> 		 */
> -		if (!IS_ERR(ret) || PTR_ERR(ret) != ERR_BAD_DX_DIR)
> +		if (IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR)
> +			dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
> +				       "falling back\n"));
> +		else if (!sb_no_casefold_compat_fallback(dir->i_sb) &&
> +			 *res_dir == NULL && IS_CASEFOLDED(dir))
> +			dxtrace(printk(KERN_DEBUG "ext4_find_entry: casefold "
> +				       "failed, falling back\n"));
> +		else
> 			goto cleanup_and_exit;
> -		dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
> -			       "falling back\n"));
> 		ret = NULL;
> 	}
> 	nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 2c3b2f8a621f..aa4ec39202c3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1258,11 +1258,19 @@ extern int send_sigurg(struct file *file);
> #define SB_NOUSER       BIT(31)
> 
> /* These flags relate to encoding and casefolding */
> -#define SB_ENC_STRICT_MODE_FL	(1 << 0)
> +#define SB_ENC_STRICT_MODE_FL		(1 << 0)
> +#define SB_ENC_NO_COMPAT_FALLBACK_FL	(1 << 1)
> 
> #define sb_has_strict_encoding(sb) \
> 	(sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
> 
> +#if IS_ENABLED(CONFIG_UNICODE)
> +#define sb_no_casefold_compat_fallback(sb) \
> +	(sb->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL)
> +#else
> +#define sb_no_casefold_compat_fallback(sb) (1)
> +#endif
> +
> /*
>  *	Umount options
>  */
> --
> 2.45.2
> 
> 






