Re: [PATCH v3] fs: try an opportunistic lookup for O_CREAT opens too

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed 07-08-24 08:10:27, Jeff Layton wrote:
> Today, when opening a file we'll typically do a fast lookup, but if
> O_CREAT is set, the kernel always takes the exclusive inode lock. I
> assume this was done with the expectation that O_CREAT means that we
> always expect to do the create, but that's often not the case. Many
> programs set O_CREAT even in scenarios where the file already exists.
> 
> This patch rearranges the pathwalk-for-open code to also attempt a
> fast_lookup in certain O_CREAT cases. If a positive dentry is found, the
> inode_lock can be avoided altogether, and if auditing isn't enabled, it
> can stay in rcuwalk mode for the last step_into.
> 
> One notable exception that is hopefully temporary: if we're doing an
> rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the
> dentry in that case is more expensive than taking the i_rwsem for now.
> 
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>

I'm not very familiar with the path lookup code but the patch looks correct
to me and the win is nice. Feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

								Honza

> ---
> Here's a revised patch that does a fast_lookup in the O_CREAT codepath
> too. The main difference here is that if a positive dentry is found and
> audit_dummy_context is true, then we keep the walk lazy for the last
> component, which avoids having to take any locks on the parent (just
> like with non-O_CREAT opens).
> 
> Mateusz wrote a will-it-scale test that does an O_CREAT open and close in
> the same directory repeatedly. Running that in 70 different processes:
> 
>     v6.10:		  754565
>     v6.10+patch:	25747851
> 
> ...which is roughly a 34x speedup. I also ran the unlink1 test in single
> process mode to try and gauge how bad the performance impact would be in
> the case where we always have to search, not find anything and do the
> create:
> 
>     v6.10:		200106
>     v6.10+patch:	199188
> 
> ~0.4% performance hit in that test. I'm not sure that's statistically
> significant, but we should keep an eye out for slowdowns in these sorts
> of workloads if we decide to take this.
> ---
> Changes in v3:
> - Check for IS_ERR in lookup_fast result
> - Future-proof open_last_lookups to handle case where lookup_fast_for_open
>   returns a positive dentry while auditing is enabled
> - Link to v2: https://lore.kernel.org/r/20240806-openfast-v2-1-42da45981811@xxxxxxxxxx
> 
> Changes in v2:
> - drop the lockref patch since Mateusz is working on a better approach
> - add trailing_slashes helper function
> - add a lookup_fast_for_open helper function
> - make lookup_fast_for_open skip the lookup if auditing is enabled
> - if we find a positive dentry and auditing is disabled, don't unlazy
> - Link to v1: https://lore.kernel.org/r/20240802-openfast-v1-0-a1cff2a33063@xxxxxxxxxx
> ---
>  fs/namei.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 64 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 1e05a0f3f04d..7894fafa8e71 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3518,6 +3518,49 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
>  	return ERR_PTR(error);
>  }
>  
> +static inline bool trailing_slashes(struct nameidata *nd)
> +{
> +	return (bool)nd->last.name[nd->last.len];
> +}
> +
> +static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
> +{
> +	struct dentry *dentry;
> +
> +	if (open_flag & O_CREAT) {
> +		/* Don't bother on an O_EXCL create */
> +		if (open_flag & O_EXCL)
> +			return NULL;
> +
> +		/*
> +		 * FIXME: If auditing is enabled, then we'll have to unlazy to
> +		 * use the dentry. For now, don't do this, since it shifts
> +		 * contention from parent's i_rwsem to its d_lockref spinlock.
> +		 * Reconsider this once dentry refcounting handles heavy
> +		 * contention better.
> +		 */
> +		if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
> +			return NULL;
> +	}
> +
> +	if (trailing_slashes(nd))
> +		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
> +
> +	dentry = lookup_fast(nd);
> +	if (IS_ERR_OR_NULL(dentry))
> +		return dentry;
> +
> +	if (open_flag & O_CREAT) {
> +		/* Discard negative dentries. Need inode_lock to do the create */
> +		if (!dentry->d_inode) {
> +			if (!(nd->flags & LOOKUP_RCU))
> +				dput(dentry);
> +			dentry = NULL;
> +		}
> +	}
> +	return dentry;
> +}
> +
>  static const char *open_last_lookups(struct nameidata *nd,
>  		   struct file *file, const struct open_flags *op)
>  {
> @@ -3535,28 +3578,39 @@ static const char *open_last_lookups(struct nameidata *nd,
>  		return handle_dots(nd, nd->last_type);
>  	}
>  
> +	/* We _can_ be in RCU mode here */
> +	dentry = lookup_fast_for_open(nd, open_flag);
> +	if (IS_ERR(dentry))
> +		return ERR_CAST(dentry);
> +
>  	if (!(open_flag & O_CREAT)) {
> -		if (nd->last.name[nd->last.len])
> -			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
> -		/* we _can_ be in RCU mode here */
> -		dentry = lookup_fast(nd);
> -		if (IS_ERR(dentry))
> -			return ERR_CAST(dentry);
>  		if (likely(dentry))
>  			goto finish_lookup;
>  
>  		if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU))
>  			return ERR_PTR(-ECHILD);
>  	} else {
> -		/* create side of things */
>  		if (nd->flags & LOOKUP_RCU) {
> -			if (!try_to_unlazy(nd))
> +			bool unlazied;
> +
> +			/* can stay in rcuwalk if not auditing */
> +			if (dentry && audit_dummy_context()) {
> +				if (trailing_slashes(nd))
> +					return ERR_PTR(-EISDIR);
> +				goto finish_lookup;
> +			}
> +			unlazied = dentry ? try_to_unlazy_next(nd, dentry) :
> +					    try_to_unlazy(nd);
> +			if (!unlazied)
>  				return ERR_PTR(-ECHILD);
>  		}
>  		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> -		/* trailing slashes? */
> -		if (unlikely(nd->last.name[nd->last.len]))
> +		if (trailing_slashes(nd)) {
> +			dput(dentry);
>  			return ERR_PTR(-EISDIR);
> +		}
> +		if (dentry)
> +			goto finish_lookup;
>  	}
>  
>  	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
> 
> ---
> base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
> change-id: 20240723-openfast-ac49a7b6ade2
> 
> Best regards,
> -- 
> Jeff Layton <jlayton@xxxxxxxxxx>
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux