Re: [PATCH 1/2] fs: add a DCACHE_NEED_LOOKUP flag for d_flags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On May 19, 2011, at 11:58, Josef Bacik wrote:
> Btrfs (and I'd venture most other fs's) stores its indexes in nice disk order
> for readdir, but unfortunately in the case of anything that stats the files in
> order that readdir spits back (like oh say ls) that means we still have to do
> the normal lookup of the file, which means looking up our other index and then
> looking up the inode.  What I want is a way to create dummy dentries when we
> find them in readdir so that when ls or anything else subsequently does a
> stat(), we already have the location information in the dentry and can go
> straight to the inode itself.  The lookup stuff just assumes that if it finds a
> dentry it is done, it doesn't perform a lookup.  So add a DCACHE_NEED_LOOKUP
> flag so that the lookup code knows it still needs to run i_op->lookup() on the
> parent to get the inode for the dentry.  I have tested this with btrfs and I
> went from something that looks like this
> 
> http://people.redhat.com/jwhiter/ls-noreada.png
> 
> To this
> 
> http://people.redhat.com/jwhiter/ls-good.png
> 
> Thats a savings of 1300 seconds, or 22 minutes.  That is a significant savings.
> Thanks,

This comment should probably mention the number of files being tested, in order
to make a 1300s savings meaningful.  Similarly, it would be better to provide
the absolute times of tests in case these URLs disappear in the future.

"That reduces the time to do "ls -l" on a 1M file directory from 2181s to 855s."

> Signed-off-by: Josef Bacik <josef@xxxxxxxxxx>
> ---
> fs/namei.c             |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/dcache.h |    1 +
> 2 files changed, 49 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index e3c4f11..a1bff4f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1198,6 +1198,29 @@ static struct dentry *d_alloc_and_lookup(struct dentry *parent,
> }
> 
> /*
> + * We already have a dentry, but require a lookup to be performed on the parent
> + * directory to fill in d_inode. Returns the new dentry, or ERR_PTR on error.
> + * parent->d_inode->i_mutex must be held. d_lookup must have verified that no
> + * child exists while under i_mutex.
> + */
> +static struct dentry *d_inode_lookup(struct dentry *parent, struct dentry *dentry,
> +				     struct nameidata *nd)
> +{
> +	struct inode *inode = parent->d_inode;
> +	struct dentry *old;
> +
> +	/* Don't create child dentry for a dead directory. */
> +	if (unlikely(IS_DEADDIR(inode)))
> +		return ERR_PTR(-ENOENT);
> +
> +	old = inode->i_op->lookup(inode, dentry, nd);
> +	if (unlikely(old)) {
> +		dput(dentry);
> +		dentry = old;
> +	}
> +	return dentry;
> +}
> +/*
> *  It's more convoluted than I'd like it to be, but... it's still fairly
> *  small and for now I'd prefer to have fast path as straight as possible.
> *  It _is_ time-critical.
> @@ -1236,6 +1259,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> 				goto unlazy;
> 			}
> 		}
> +		if (unlikely(dentry->d_flags & DCACHE_NEED_LOOKUP)) {
> +			if (nameidata_dentry_drop_rcu(nd, dentry))
> +				return -ECHILD;
> +			dput(dentry);
> +			dentry = NULL;
> +			goto retry;
> +		}
> 		path->mnt = mnt;
> 		path->dentry = dentry;
> 		if (likely(__follow_mount_rcu(nd, path, inode, false)))
> @@ -1250,6 +1280,12 @@ unlazy:
> 		}
> 	} else {
> 		dentry = __d_lookup(parent, name);
> +		if (unlikely(!dentry))
> +			goto retry;
> +		if (unlikely(dentry->d_flags & DCACHE_NEED_LOOKUP)) {
> +			dput(dentry);
> +			dentry = NULL;
> +		}
> 	}
> 
> retry:
> @@ -1268,6 +1304,18 @@ retry:
> 			/* known good */
> 			need_reval = 0;
> 			status = 1;
> +		} else if (unlikely(dentry->d_flags & DCACHE_NEED_LOOKUP)) {
> +			struct dentry *old;
> +
> +			dentry->d_flags &= ~DCACHE_NEED_LOOKUP;
> +			dentry = d_inode_lookup(parent, dentry, nd);

Would it make sense to keep DCACHE_NEED_LOOKUP set in d_flags until _after_
the call to d_inode_lookup()?  That way the filesystem can positively know
it is doing the inode lookup from d_fsdata, instead of just inferring it
from the presence of d_fsdata?  It is already the filesystem that is setting
DCACHE_NEED_LOOKUP, so it should really be the one clearing this flag also.

I'm concerned that there may be filesystems that need d_fsdata for something
already, so the presence/absence of d_fsdata is not a clear indication to
the underlying filesystem of whether to do an inode lookup based on d_fsdata,
which might mean that it needs to keep yet another flag for this purpose.

> +			if (IS_ERR(dentry)) {
> +				mutex_unlock(&dir->i_mutex);
> +				return PTR_ERR(dentry);
> +			}
> +			/* known good */
> +			need_reval = 0;
> +			status = 1;
> 		}
> 		mutex_unlock(&dir->i_mutex);
> 	}
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 19d90a5..a8b2457 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -216,6 +216,7 @@ struct dentry_operations {
> #define DCACHE_MOUNTED		0x10000	/* is a mountpoint */
> #define DCACHE_NEED_AUTOMOUNT	0x20000	/* handle automount on this dir */
> #define DCACHE_MANAGE_TRANSIT	0x40000	/* manage transit from this dirent */
> +#define DCACHE_NEED_LOOKUP	0x80000 /* dentry requires i_op->lookup */
> #define DCACHE_MANAGED_DENTRY \
> 	(DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT)
> 
> -- 
> 1.7.2.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cheers, Andreas





--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux