Re: [PATCH v3 2/2] f2fs: Support case-insensitive file name lookups

Chao Yu <yuchao0@xxxxxxxxxx> · Fri, 19 Jul 2019 10:11:01 +0800

On 2019/7/19 8:03, Daniel Rosenberg wrote:
> Modeled after commit b886ee3e778e ("ext4: Support case-insensitive file
> name lookups")
> 
> """
> This patch implements the actual support for case-insensitive file name
> lookups in f2fs, based on the feature bit and the encoding stored in the
> superblock.
> 
> A filesystem that has the casefold feature set is able to configure
> directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
> to succeed in that directory in a case-insensitive fashion, i.e. match
> a directory entry even if the name used by userspace is not a byte per
> byte match with the disk name, but is an equivalent case-insensitive
> version of the Unicode string.  This operation is called a
> case-insensitive file name lookup.
> 
> The feature is configured as an inode attribute applied to directories
> and inherited by its children.  This attribute can only be enabled on
> empty directories for filesystems that support the encoding feature,
> thus preventing collision of file names that only differ by case.
> 
> * dcache handling:
> 
> For a +F directory, F2FS only stores the first equivalent name dentry
> used in the dcache. This is done to prevent unintentional duplication of
> dentries in the dcache, while also allowing the VFS code to quickly find
> the right entry in the cache regardless of which equivalent string was
> used in a previous lookup, without having to resort to ->lookup().
> 
> d_hash() of casefolded directories is implemented as the hash of the
> casefolded string, such that we always have a well-known bucket for all
> the equivalencies of the same string. d_compare() uses the
> utf8_strncasecmp() infrastructure, which handles the comparison of
> equivalent, same case, names as well.
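
For readers following along, the bucket/compare scheme described above can
be sketched in userspace C. This is only an illustration under stated
assumptions: sketch_casefold() folds ASCII only and stands in for
utf8_casefold(), FNV-1a stands in for full_name_hash(), and every sketch_*
name is made up for this example rather than taken from the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for utf8_casefold(): folds ASCII only.
 * Returns the folded length, or -1 if dst is too small. */
static int sketch_casefold(const char *src, size_t len, char *dst, size_t dlen)
{
	size_t i;

	if (len > dlen)
		return -1;
	for (i = 0; i < len; i++)
		dst[i] = (char)tolower((unsigned char)src[i]);
	return (int)len;
}

/* FNV-1a, used here only as a placeholder for full_name_hash(). */
static uint32_t sketch_hash(const char *s, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash the casefolded name so that equivalent names share one bucket. */
static int sketch_d_hash(const char *name, size_t len, uint32_t *hash)
{
	char folded[256];
	int flen = sketch_casefold(name, len, folded, sizeof(folded));

	if (flen < 0)
		return -1;
	*hash = sketch_hash(folded, (size_t)flen);
	return 0;
}

/* Returns 0 on a case-insensitive match, 1 on mismatch, -1 on error. */
static int sketch_d_compare(const char *a, size_t alen,
			    const char *b, size_t blen)
{
	char fa[256], fb[256];
	int la = sketch_casefold(a, alen, fa, sizeof(fa));
	int lb = sketch_casefold(b, blen, fb, sizeof(fb));

	if (la < 0 || lb < 0)
		return -1;
	if (la != lb)
		return 1;
	return !!memcmp(fa, fb, (size_t)la);
}
```

With this, "README" and "ReadMe" hash to the same value and compare as
equal, which is the property the dcache lookup relies on.
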
> 
> For now, negative lookups are not inserted in the dcache, since they
> would need to be invalidated anyway, because we can't trust missing file
> dentries.  This is bad for performance but requires some leveraging of
> the vfs layer to fix.  We can live without that for now, and so does
> everyone else.
> 
> * on-disk data:
> 
> Despite using a specific version of the name as the internal
> representation within the dcache, the name stored and fetched from the
> disk is a byte-per-byte match with what the user requested, making this
> implementation 'name-preserving'. i.e. no actual information is lost
> when writing to storage.
> 
> DX is supported by modifying the hashes used in +F directories to make
> them case/encoding-aware.  The new disk hashes are calculated as the
> hash of the full casefolded string, instead of the string directly.
> This allows us to efficiently search for file names in the htree without
> requiring the user to provide an exact name.
> 
> * Dealing with invalid sequences:
> 
> By default, when an invalid UTF-8 sequence is identified, f2fs will treat
> it as an opaque byte sequence, ignoring the encoding and reverting to
> the old behavior for that unique file.  This means that case-insensitive
> file name lookup will not work for that one file.  An optional bit can
> be set in the superblock telling the filesystem code and userspace tools
> to enforce the encoding.  When that optional bit is set, any attempt to
> create a file name using an invalid UTF-8 sequence will fail and return
> an error to userspace.
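
The strict versus opaque handling described above can be sketched in
userspace C as follows. The assumptions are loud ones: sketch_utf8_valid()
is a simplified structural check and not a full validator, the
case-insensitive branch folds ASCII only, and the sketch_* names and the
"strict" parameter are hypothetical stand-ins for the superblock bit.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified structural check; not a full UTF-8 validator (for example,
 * it does not reject overlong encodings or surrogates). */
static bool sketch_utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		size_t need;

		if (s[i] < 0x80)
			need = 0;
		else if ((s[i] & 0xE0) == 0xC0)
			need = 1;
		else if ((s[i] & 0xF0) == 0xE0)
			need = 2;
		else if ((s[i] & 0xF8) == 0xF0)
			need = 3;
		else
			return false;	/* stray continuation or bad lead byte */
		i++;
		while (need--) {
			if (i >= len || (s[i] & 0xC0) != 0x80)
				return false;
			i++;
		}
	}
	return true;
}

/* Returns 0 on match, 1 on mismatch, -EINVAL for an invalid sequence
 * when strict mode is requested. */
static int sketch_ci_compare(const char *name, size_t nlen,
			     const char *entry, size_t elen, bool strict)
{
	size_t i;

	if (!sketch_utf8_valid((const unsigned char *)name, nlen) ||
	    !sketch_utf8_valid((const unsigned char *)entry, elen)) {
		if (strict)
			return -EINVAL;
		/* Opaque fallback: exact byte-for-byte comparison. */
		if (nlen != elen)
			return 1;
		return !!memcmp(name, entry, nlen);
	}

	if (nlen != elen)
		return 1;
	for (i = 0; i < nlen; i++)
		if (tolower((unsigned char)name[i]) !=
		    tolower((unsigned char)entry[i]))
			return 1;
	return 0;
}
```

In the non-strict case a name containing an invalid byte only matches
itself exactly, so the rest of the directory keeps its case-insensitive
behavior.
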
> 
> * Normalization algorithm:
> 
> The UTF-8 algorithm used to compare strings in f2fs is implemented
> in fs/unicode, and is based on a previous version developed by
> SGI.  It implements the Canonical decomposition (NFD) algorithm
> described by the Unicode specification 12.1, or higher, combined with
> the elimination of ignorable code points (NFDi) and full
> case-folding (CF) as documented in fs/unicode/utf8_norm.c.
> 
> NFD seems to be the best normalization method for F2FS because:
> 
>   - It has a lower cost than NFC/NFKC (which requires
>     decomposing to NFD as an intermediary step)
>   - It doesn't eliminate important semantic meaning like
>     compatibility decompositions.
> 
> Although:
> 
> - This implementation is not completely linguistically accurate, because
> different languages have conflicting rules, which would require the
> specialization of the filesystem to a given locale, which brings all
> sorts of problems for removable media and for users who use more than
> one language.
> """
> 
> Signed-off-by: Daniel Rosenberg <drosen@xxxxxxxxxx>
> ---
>  fs/f2fs/dir.c    | 126 +++++++++++++++++++++++++++++++++++++++++++----
>  fs/f2fs/f2fs.h   |  15 ++++--
>  fs/f2fs/file.c   |   9 ++++
>  fs/f2fs/hash.c   |  35 ++++++++++++-
>  fs/f2fs/inline.c |   4 +-
>  fs/f2fs/inode.c  |   4 +-
>  fs/f2fs/namei.c  |  21 ++++++++
>  fs/f2fs/super.c  |   6 +++
>  8 files changed, 203 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 85a1528f319f2..2913483473f30 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -8,6 +8,7 @@
>  #include <linux/fs.h>
>  #include <linux/f2fs_fs.h>
>  #include <linux/sched/signal.h>
> +#include <linux/unicode.h>
>  #include "f2fs.h"
>  #include "node.h"
>  #include "acl.h"
> @@ -81,7 +82,8 @@ static unsigned long dir_block_index(unsigned int level,
>  	return bidx;
>  }
>  
> -static struct f2fs_dir_entry *find_in_block(struct page *dentry_page,
> +static struct f2fs_dir_entry *find_in_block(struct inode *dir,
> +				struct page *dentry_page,
>  				struct fscrypt_name *fname,
>  				f2fs_hash_t namehash,
>  				int *max_slots,
> @@ -93,7 +95,7 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page,
>  
>  	dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page);
>  
> -	make_dentry_ptr_block(NULL, &d, dentry_blk);
> +	make_dentry_ptr_block(dir, &d, dentry_blk);
>  	de = f2fs_find_target_dentry(fname, namehash, max_slots, &d);
>  	if (de)
>  		*res_page = dentry_page;
> @@ -101,6 +103,39 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page,
>  	return de;
>  }
>  
> +#ifdef CONFIG_UNICODE
> +/*
> + * Test whether a case-insensitive directory entry matches the filename
> + * being searched for.
> + *
> + * Returns: 0 if the directory entry matches, more than 0 if it
> + * doesn't match or less than zero on error.
> + */
> +int f2fs_ci_compare(const struct inode *parent, const struct qstr *name,
> +		    const struct qstr *entry)
> +{
> +	const struct f2fs_sb_info *sbi = F2FS_SB(parent->i_sb);
> +	const struct unicode_map *um = sbi->s_encoding;
> +	int ret;
> +
> +	ret = utf8_strncasecmp(um, name, entry);
> +	if (ret < 0) {
> +		/* Handle invalid character sequence as either an error
> +		 * or as an opaque byte sequence.
> +		 */
> +		if (f2fs_has_strict_mode(sbi))
> +			return -EINVAL;
> +
> +		if (name->len != entry->len)
> +			return 1;
> +
> +		return !!memcmp(name->name, entry->name, name->len);
> +	}
> +
> +	return ret;
> +}
> +#endif
> +
>  struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname,
>  			f2fs_hash_t namehash, int *max_slots,
>  			struct f2fs_dentry_ptr *d)
> @@ -108,6 +143,9 @@ struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname,
>  	struct f2fs_dir_entry *de;
>  	unsigned long bit_pos = 0;
>  	int max_len = 0;
> +#ifdef CONFIG_UNICODE
> +	struct qstr entry;
> +#endif
>  
>  	if (max_slots)
>  		*max_slots = 0;
> @@ -119,16 +157,28 @@ struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname,
>  		}
>  
>  		de = &d->dentry[bit_pos];
> +#ifdef CONFIG_UNICODE
> +		entry.name = d->filename[bit_pos];
> +		entry.len = de->name_len;
> +#endif
>  
>  		if (unlikely(!de->name_len)) {
>  			bit_pos++;
>  			continue;
>  		}
> +		if (de->hash_code == namehash) {
> +#ifdef CONFIG_UNICODE
> +			if (F2FS_SB(d->inode->i_sb)->s_encoding &&
> +					IS_CASEFOLDED(d->inode) &&
> +					!f2fs_ci_compare(d->inode,
> +						fname->usr_fname, &entry))
> +				goto found;
>  
> -		if (de->hash_code == namehash &&
> -		    fscrypt_match_name(fname, d->filename[bit_pos],
> -				       le16_to_cpu(de->name_len)))
> -			goto found;
> +#endif
> +			if (fscrypt_match_name(fname, d->filename[bit_pos],
> +						le16_to_cpu(de->name_len)))
> +				goto found;
> +		}
>  
>  		if (max_slots && max_len > *max_slots)
>  			*max_slots = max_len;
> @@ -157,7 +207,7 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
>  	struct f2fs_dir_entry *de = NULL;
>  	bool room = false;
>  	int max_slots;
> -	f2fs_hash_t namehash = f2fs_dentry_hash(&name, fname);
> +	f2fs_hash_t namehash = f2fs_dentry_hash(dir, &name, fname);
>  
>  	nbucket = dir_buckets(level, F2FS_I(dir)->i_dir_level);
>  	nblock = bucket_blocks(level);
> @@ -179,8 +229,8 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
>  			}
>  		}
>  
> -		de = find_in_block(dentry_page, fname, namehash, &max_slots,
> -								res_page);
> +		de = find_in_block(dir, dentry_page, fname, namehash,
> +							&max_slots, res_page);
>  		if (de)
>  			break;
>  
> @@ -250,6 +300,14 @@ struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
>  	struct fscrypt_name fname;
>  	int err;
>  
> +#ifdef CONFIG_UNICODE
> +	if (f2fs_has_strict_mode(F2FS_I_SB(dir)) && IS_CASEFOLDED(dir) &&
> +			utf8_validate(F2FS_I_SB(dir)->s_encoding, child)) {
> +		*res_page = ERR_PTR(-EINVAL);
> +		return NULL;
> +	}
> +#endif
> +
>  	err = fscrypt_setup_filename(dir, child, 1, &fname);
>  	if (err) {
>  		if (err == -ENOENT)
> @@ -504,7 +562,7 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,
>  
>  	level = 0;
>  	slots = GET_DENTRY_SLOTS(new_name->len);
> -	dentry_hash = f2fs_dentry_hash(new_name, NULL);
> +	dentry_hash = f2fs_dentry_hash(dir, new_name, NULL);
>  
>  	current_depth = F2FS_I(dir)->i_current_depth;
>  	if (F2FS_I(dir)->chash == dentry_hash) {
> @@ -943,3 +1001,51 @@ const struct file_operations f2fs_dir_operations = {
>  	.compat_ioctl   = f2fs_compat_ioctl,
>  #endif
>  };
> +
> +#ifdef CONFIG_UNICODE
> +static int f2fs_d_compare(const struct dentry *dentry, unsigned int len,
> +			  const char *str, const struct qstr *name)
> +{
> +	struct qstr qstr = {.name = str, .len = len };
> +
> +	if (!IS_CASEFOLDED(dentry->d_parent->d_inode)) {
> +		if (len != name->len)
> +			return -1;
> +		return memcmp(str, name, len);
> +	}
> +
> +	return f2fs_ci_compare(dentry->d_parent->d_inode, name, &qstr);
> +}
> +
> +static int f2fs_d_hash(const struct dentry *dentry, struct qstr *str)
> +{
> +	struct f2fs_sb_info *sbi = F2FS_SB(dentry->d_sb);
> +	const struct unicode_map *um = sbi->s_encoding;
> +	unsigned char *norm;
> +	int len, ret = 0;
> +
> +	if (!IS_CASEFOLDED(dentry->d_inode))
> +		return 0;
> +
> +	norm = f2fs_kmalloc(sbi, PATH_MAX, GFP_ATOMIC);
> +	if (!norm)
> +		return -ENOMEM;
> +
> +	len = utf8_casefold(um, str, norm, PATH_MAX);
> +	if (len < 0) {
> +		if (f2fs_has_strict_mode(sbi))
> +			ret = -EINVAL;
> +		goto out;
> +	}
> +	str->hash = full_name_hash(dentry, norm, len);
> +out:
> +	kvfree(norm);
> +	return ret;
> +}
> +
> +const struct dentry_operations f2fs_dentry_ops = {
> +	.d_hash = f2fs_d_hash,
> +	.d_compare = f2fs_d_compare,
> +};
> +#endif
> +
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index c6c7904572d0d..31fd2a268ba14 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2364,10 +2364,12 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
>  #define F2FS_INDEX_FL			0x00001000 /* hash-indexed directory */
>  #define F2FS_DIRSYNC_FL			0x00010000 /* dirsync behaviour (directories only) */
>  #define F2FS_PROJINHERIT_FL		0x20000000 /* Create with parents projid */
> +#define F2FS_CASEFOLD_FL		0x40000000 /* Casefolded file */
>  
>  /* Flags that should be inherited by new inodes from their parent. */
>  #define F2FS_FL_INHERITED (F2FS_SYNC_FL | F2FS_NODUMP_FL | F2FS_NOATIME_FL | \
> -			   F2FS_DIRSYNC_FL | F2FS_PROJINHERIT_FL)
> +			   F2FS_DIRSYNC_FL | F2FS_PROJINHERIT_FL | \
> +			   F2FS_CASEFOLD_FL)
>  
>  /* Flags that are appropriate for regular files (all but dir-specific ones). */
>  #define F2FS_REG_FLMASK		(~(F2FS_DIRSYNC_FL | F2FS_PROJINHERIT_FL))
> @@ -2930,6 +2932,10 @@ int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
>  							bool hot, bool set);
>  struct dentry *f2fs_get_parent(struct dentry *child);
>  
> +extern int f2fs_ci_compare(const struct inode *parent,
> +			   const struct qstr *name,
> +			   const struct qstr *entry);
> +
>  /*
>   * dir.c
>   */
> @@ -2993,8 +2999,8 @@ int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi);
>  /*
>   * hash.c
>   */
> -f2fs_hash_t f2fs_dentry_hash(const struct qstr *name_info,
> -				struct fscrypt_name *fname);
> +f2fs_hash_t f2fs_dentry_hash(const struct inode *dir,
> +		const struct qstr *name_info, struct fscrypt_name *fname);
>  
>  /*
>   * node.c
> @@ -3437,6 +3443,9 @@ static inline void f2fs_destroy_root_stats(void) { }
>  #endif
>  
>  extern const struct file_operations f2fs_dir_operations;
> +#ifdef CONFIG_UNICODE
> +extern const struct dentry_operations f2fs_dentry_ops;
> +#endif
>  extern const struct file_operations f2fs_file_operations;
>  extern const struct inode_operations f2fs_file_inode_operations;
>  extern const struct address_space_operations f2fs_dblock_aops;
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f8d46df8fa9ee..7adef2d8dbc47 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1660,7 +1660,16 @@ static int f2fs_setflags_common(struct inode *inode, u32 iflags, u32 mask)
>  		return -EPERM;
>  
>  	oldflags = fi->i_flags;
> +	if ((iflags ^ oldflags) & F2FS_CASEFOLD_FL) {
> +		if (!f2fs_sb_has_casefold(F2FS_I_SB(inode)))
> +			return -EOPNOTSUPP;
> +
> +		if (!S_ISDIR(inode->i_mode))
> +			return -ENOTDIR;
>  
> +		if (!f2fs_empty_dir(inode))
> +			return -ENOTEMPTY;
> +	}
>  	if ((iflags ^ oldflags) & (F2FS_APPEND_FL | F2FS_IMMUTABLE_FL))
>  		if (!capable(CAP_LINUX_IMMUTABLE))
>  			return -EPERM;
> diff --git a/fs/f2fs/hash.c b/fs/f2fs/hash.c
> index cc82f142f811f..b7bd0ddbbdf01 100644
> --- a/fs/f2fs/hash.c
> +++ b/fs/f2fs/hash.c
> @@ -14,6 +14,7 @@
>  #include <linux/f2fs_fs.h>
>  #include <linux/cryptohash.h>
>  #include <linux/pagemap.h>
> +#include <linux/unicode.h>
>  
>  #include "f2fs.h"
>  
> @@ -67,7 +68,7 @@ static void str2hashbuf(const unsigned char *msg, size_t len,
>  		*buf++ = pad;
>  }
>  
> -f2fs_hash_t f2fs_dentry_hash(const struct qstr *name_info,
> +static f2fs_hash_t __f2fs_dentry_hash(const struct qstr *name_info,
>  				struct fscrypt_name *fname)
>  {
>  	__u32 hash;
> @@ -103,3 +104,35 @@ f2fs_hash_t f2fs_dentry_hash(const struct qstr *name_info,
>  	f2fs_hash = cpu_to_le32(hash & ~F2FS_HASH_COL_BIT);
>  	return f2fs_hash;
>  }
> +
> +f2fs_hash_t f2fs_dentry_hash(const struct inode *dir,
> +		const struct qstr *name_info, struct fscrypt_name *fname)
> +{
> +#ifdef CONFIG_UNICODE
> +	struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
> +	const struct unicode_map *um = sbi->s_encoding;
> +	int r, dlen;
> +	unsigned char *buff;
> +	struct qstr *folded;
> +
> +	if (name_info->len && IS_CASEFOLDED(dir)) {
> +		buff = f2fs_kzalloc(sbi, sizeof(char) * PATH_MAX, GFP_KERNEL);
> +		if (!buff)
> +			return -ENOMEM;
> +
> +		dlen = utf8_casefold(um, name_info, buff, PATH_MAX);
> +		if (dlen < 0) {
> +			kfree(buff);

This should be kvfree(), matching the kvfree(buff) on the success path below.

> +			goto opaque_seq;
> +		}
> +		folded->name = buff;
> +		folded->len = dlen;
> +		r = __f2fs_dentry_hash(folded, fname);
> +
> +		kvfree(buff);
> +		return r;
> +	}
> +opaque_seq:
> +#endif
> +	return __f2fs_dentry_hash(name_info, fname);
> +}
> diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
> index 3613efca8c00c..354f71cf9e6ba 100644
> --- a/fs/f2fs/inline.c
> +++ b/fs/f2fs/inline.c
> @@ -320,7 +320,7 @@ struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir,
>  		return NULL;
>  	}
>  
> -	namehash = f2fs_dentry_hash(&name, fname);
> +	namehash = f2fs_dentry_hash(dir, &name, fname);
>  
>  	inline_dentry = inline_data_addr(dir, ipage);
>  
> @@ -580,7 +580,7 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name,
>  
>  	f2fs_wait_on_page_writeback(ipage, NODE, true, true);
>  
> -	name_hash = f2fs_dentry_hash(new_name, NULL);
> +	name_hash = f2fs_dentry_hash(dir, new_name, NULL);
>  	f2fs_update_dentry(ino, mode, &d, new_name, name_hash, bit_pos);
>  
>  	set_page_dirty(ipage);
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index a33d7a849b2df..9a1f0d6616577 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -46,9 +46,11 @@ void f2fs_set_inode_flags(struct inode *inode)
>  		new_fl |= S_DIRSYNC;
>  	if (file_is_encrypt(inode))
>  		new_fl |= S_ENCRYPTED;
> +	if (flags & F2FS_CASEFOLD_FL)
> +		new_fl |= S_CASEFOLD;
>  	inode_set_flags(inode, new_fl,
>  			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
> -			S_ENCRYPTED);
> +			S_ENCRYPTED|S_CASEFOLD);
>  }
>  
>  static void __get_inode_rdev(struct inode *inode, struct f2fs_inode *ri)
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> index c5b99042e6f2b..727de2f8620f2 100644
> --- a/fs/f2fs/namei.c
> +++ b/fs/f2fs/namei.c
> @@ -489,6 +489,17 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
>  		goto out_iput;
>  	}
>  out_splice:
> +#ifdef CONFIG_UNICODE
> +	if (!inode && IS_CASEFOLDED(dir)) {
> +		/* Eventually we want to call d_add_ci(dentry, NULL)
> +		 * for negative dentries in the encoding case as
> +		 * well.  For now, prevent the negative dentry
> +		 * from being cached.
> +		 */
> +		trace_f2fs_lookup_end(dir, dentry, ino, err);
> +		return NULL;
> +	}
> +#endif
>  	new = d_splice_alias(inode, dentry);
>  	err = PTR_ERR_OR_ZERO(new);
>  	trace_f2fs_lookup_end(dir, dentry, ino, err);
> @@ -537,6 +548,16 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
>  		goto fail;
>  	}
>  	f2fs_delete_entry(de, page, dir, inode);
> +#ifdef CONFIG_UNICODE
> +	/* VFS negative dentries are incompatible with Encoding and
> +	 * Case-insensitiveness. Eventually we'll want to avoid
> +	 * invalidating the dentries here, along with returning the
> +	 * negative dentries at f2fs_lookup(), when it is better
> +	 * supported by the VFS for the CI case.
> +	 */
> +	if (IS_CASEFOLDED(dir))
> +		d_invalidate(dentry);
> +#endif
>  	f2fs_unlock_op(sbi);
>  
>  	if (IS_DIRSYNC(dir))
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 82f7da93c3ed1..9c522d1abcb6d 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -3115,6 +3115,7 @@ static int f2fs_setup_casefold(struct f2fs_sb_info *sbi)
>  		return -EINVAL;
>  	}
>  #endif
> +	return 0;

This line needs to be moved to PATCH 1/2.

>  }
>  
>  static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
> @@ -3410,6 +3411,11 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  		goto free_node_inode;
>  	}
>  
> +#ifdef CONFIG_UNICODE
> +	if (sbi->s_encoding)
> +		sb->s_d_op = &f2fs_dentry_ops;
> +#endif

How about moving this to f2fs_setup_casefold()?

Thanks,

> +
>  	sb->s_root = d_make_root(root); /* allocate root dentry */
>  	if (!sb->s_root) {
>  		err = -ENOMEM;
>