Re: [PATCH v1] vfs: kill FS_REVAL_DOT by adding a d_reval_jumped dentry op

On Thu, 21 Feb 2013 09:32:25 +1100
NeilBrown <neilb@xxxxxxx> wrote:

> On Wed, 20 Feb 2013 11:19:05 -0500 Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> > The following set of operations on an NFS client and server will cause
> > a spurious ESTALE error:
> > 
> >     server# mkdir a
> >     client# cd a
> >     server# mv a a.bak
> >     client# sleep 30  # (or whatever the dir attrcache timeout is)
> >     client# stat .
> >     stat: cannot stat ‘.’: Stale NFS file handle
> > 
> > Obviously, we should not be getting an ESTALE error back there since the
> > inode still exists on the server. The problem is that the lookup code
> > will call d_revalidate on the dentry that "." refers to, because NFS has
> > FS_REVAL_DOT set.
> > 
> > nfs_lookup_revalidate will see that the parent directory has changed and
> > will try to reverify the dentry by redoing a LOOKUP. That of course
> > fails, so the lookup code returns ESTALE.
> > 
> > The problem here is that d_revalidate is really a bad fit for this case.
> > What we really want to know at this point is whether the inode is still
> > good or not, but we don't really care what name it goes by or whether
> > the dcache is still valid.
> > 
> > Add a new d_op->d_reval_jumped operation and have complete_walk call
> > that instead of d_revalidate. The intent there is to allow for a
> > "weaker" d_revalidate that just checks to see whether the inode is still
> > good. This also gives us an opportunity to kill off the FS_REVAL_DOT
> > special casing.
> > 
> > In a perfect world, this would be a new inode operation instead, but
> > I don't see a way to cleanly handle that for 9p, which needs a
> > dentry in order to get a fid.
> 
> The earlier i_op->revalidate inode operation took a 'dentry', not an inode.
> If you look at struct inode_operations, you will see that 8 of them take a
> dentry as their first argument.
> 
> Nevertheless, I would leave it in dentry_operations.  It makes it easier to
> use the DCACHE_OP_ optimisation.
> 

Good point. I guess my thinking was that we aren't really interested in
the dentry, per se. But for some filesystems, having the dentry may
make this easier to deal with.
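
For anyone following along, the DCACHE_OP_ trick is roughly this: when
the ops table is attached to a dentry, a flag bit is set for each
operation that is actually present, so the hot path can test one bit in
d_flags instead of chasing d_op and checking for NULL. A userspace
sketch of the idea follows. All of the names in it (model_dentry,
MODEL_OP_REVAL_JUMPED and so on) are made up for illustration and are
not taken from fs/dcache.c:

```c
#include <stddef.h>

#define MODEL_OP_REVALIDATE   0x1u  /* hypothetical flag bits */
#define MODEL_OP_REVAL_JUMPED 0x2u

struct model_dentry;

struct model_dentry_ops {
	int (*d_revalidate)(struct model_dentry *, unsigned int);
	int (*d_reval_jumped)(struct model_dentry *, unsigned int);
};

struct model_dentry {
	unsigned int d_flags;
	const struct model_dentry_ops *d_op;
};

/* Attach ops once and cache which of them exist as flag bits. */
static void model_set_d_op(struct model_dentry *d,
			   const struct model_dentry_ops *ops)
{
	d->d_op = ops;
	d->d_flags &= ~(MODEL_OP_REVALIDATE | MODEL_OP_REVAL_JUMPED);
	if (!ops)
		return;
	if (ops->d_revalidate)
		d->d_flags |= MODEL_OP_REVALIDATE;
	if (ops->d_reval_jumped)
		d->d_flags |= MODEL_OP_REVAL_JUMPED;
}

/* Hot path: one flag test; dentries without the op skip the call. */
static int model_complete_walk(struct model_dentry *d, unsigned int flags)
{
	if (!(d->d_flags & MODEL_OP_REVAL_JUMPED))
		return 1;
	return d->d_op->d_reval_jumped(d, flags);
}

/* Example op for trying this out: always reports the inode as invalid. */
static int model_always_invalid(struct model_dentry *d, unsigned int flags)
{
	(void)d;
	(void)flags;
	return 0;
}
```

With the op in dentry_operations, a filesystem that doesn't define it
never gets past the flag test.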

> 
> > 
> > Cc: NeilBrown <neilb@xxxxxxx>
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  Documentation/filesystems/Locking |  2 ++
> >  Documentation/filesystems/vfs.txt | 32 ++++++++++++++++++++++++++--
> >  fs/9p/vfs_dentry.c                |  1 +
> >  fs/9p/vfs_super.c                 |  2 +-
> >  fs/dcache.c                       |  3 +++
> >  fs/namei.c                        |  8 ++-----
> >  fs/nfs/dir.c                      | 45 +++++++++++++++++++++++++++++++++++++++
> >  fs/nfs/nfs4super.c                |  6 +++---
> >  fs/nfs/super.c                    |  6 +++---
> >  include/linux/dcache.h            |  3 +++
> >  include/linux/fs.h                |  1 -
> >  11 files changed, 93 insertions(+), 16 deletions(-)
> > 
> > diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
> > index f48e0c6..9718b667 100644
> > --- a/Documentation/filesystems/Locking
> > +++ b/Documentation/filesystems/Locking
> > @@ -10,6 +10,7 @@ be able to use diff(1).
> >  --------------------------- dentry_operations --------------------------
> >  prototypes:
> >  	int (*d_revalidate)(struct dentry *, unsigned int);
> > +	int (*d_reval_jumped)(struct dentry *, unsigned int);
> >  	int (*d_hash)(const struct dentry *, const struct inode *,
> >  			struct qstr *);
> >  	int (*d_compare)(const struct dentry *, const struct inode *,
> 
> I cannot get excited about the name "d_reval_jumped" .... though once you
> read the explanation in the doco (thanks for that) it makes sense.  I guess
> I'll get used to it.
> 

Me neither. I think Al mentioned that he's renamed this to
"d_weak_revalidate" in his tree. Neither name really does it for me,
so I'm open to suggestions.

> >  /*
> > + * A weaker form of d_revalidate for revalidating just the dentry->d_inode
> > + * when we don't really care about the dentry name. This is called when a
> > + * pathwalk ends on a dentry that was not found via a normal lookup in the
> > + * parent dir (e.g.: ".", "..", procfs symlinks or mountpoint traversals).
> > + *
> > + * In this situation, we just want to verify that the inode itself is OK
> > + * since the dentry might have changed on the server.
> > + */
> > +static int nfs_reval_jumped(struct dentry *dentry, unsigned int flags)
> > +{
> > +	int error;
> > +	struct inode *inode = dentry->d_inode;
> > +
> > +	if (flags & LOOKUP_RCU)
> > +		return -ECHILD;
> > +
> > +	/*
> > +	 * I believe we can only get a negative dentry here in the case of a
> > +	 * procfs-style symlink. Just assume it's correct for now, but we may
> > +	 * eventually need to do something more here.
> > +	 */
> > +	if (!inode) {
> > +		dfprintk(LOOKUPCACHE, "%s: %s/%s has negative inode\n",
> > +				__func__, dentry->d_parent->d_name.name,
> > +				dentry->d_name.name);
> > +		return 1;
> > +	}
> > +
> > +	if (is_bad_inode(inode)) {
> > +		dfprintk(LOOKUPCACHE, "%s: %s/%s has dud inode\n",
> > +				__func__, dentry->d_parent->d_name.name,
> > +				dentry->d_name.name);
> > +		return 0;
> > +	}
> > +
> > +	error = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> > +	dfprintk(LOOKUPCACHE, "NFS: %s: inode %lu is %s\n",
> > +			__func__, inode->i_ino, error ? "invalid" : "valid");
> > +	if (error)
> > +		return 0;
> > +	return 1;
> > +}
> 
> I wonder if we can delay the "-ECHILD" return a bit.
> Leaving it to after the first two tests should be safe, but doesn't gain us
> anything.
> 
> Open-coding the nfs_revalidate_inode as:
> 	if (!(NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATTR)
> 			&& !nfs_attribute_cache_expired(inode))
> 		return NFS_STALE(inode) ? 0 : 1;
> 	error = __nfs_revalidate_inode(server, inode);
> 
> and then inserting the -ECHILD code in before the __nfs_revalidate_inode
> should be safe, and means we still benefit from the RCU path in the common
> case.
> Of course, for that to be really useful, nfs_lookup_revalidate would need to
> be changed to only return -ECHILD if it really needed to block, and  maybe
> that is too hard, or at least is a job for another day.
> 
> Otherwise, looks good - thanks.
> 
> Reviewed-by: NeilBrown <neilb@xxxxxxx>
> 
> 

I don't know that much about rcuwalk mode, but the vfs.txt doc says
this:

        If in rcu-walk mode, the filesystem must revalidate the dentry
        without blocking or storing to the dentry, d_parent and d_inode
        should not be used without care (because they can change and,
        in d_inode case, even become NULL under us).

If we assume that d_inode does become NULL after we set the "inode"
pointer, do we still hold a reference to it? Or do we need to ensure
that we take one when we set that pointer?

Also, since this is the last component of the path, I suspect that
we're almost never going to be in rcu-walk mode here, right?

In any case, I think we ought to do that sort of optimization
separately on top of this patch. We probably ought to consider similar
optimization in the d_revalidate routines too. I think we might get
even more gain there anyway.
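
To make the suggested ordering concrete for that follow-up work, here
is a rough userspace model (not kernel code, and untested against a
real tree). The struct and helper names (model_inode,
model_reval_jumped, MODEL_LOOKUP_RCU, etc.) are stand-ins made up for
illustration. Only the shape of the control flow comes from the snippet
Neil posted: answer from cached attributes when possible, and return
-ECHILD only when we would otherwise have to block on the server.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LOOKUP_RCU 0x1u  /* stand-in for LOOKUP_RCU */

/* Hypothetical userspace stand-in for the NFS inode state we care about. */
struct model_inode {
	bool bad;            /* models is_bad_inode() */
	bool attr_invalid;   /* models NFS_INO_INVALID_ATTR being set */
	bool attr_expired;   /* models nfs_attribute_cache_expired() */
	bool stale;          /* models NFS_STALE() */
	int  server_error;   /* what a blocking revalidation would return */
	int  rpc_calls;      /* counts simulated blocking round trips */
};

/* Models __nfs_revalidate_inode(): the only step that may block. */
static int model_blocking_revalidate(struct model_inode *inode)
{
	inode->rpc_calls++;
	return inode->server_error;
}

/*
 * Returns 1 if the inode looks valid, 0 if not, and -ECHILD if the
 * caller must drop out of rcu-walk mode and retry.
 */
static int model_reval_jumped(struct model_inode *inode, unsigned int flags)
{
	if (!inode)
		return 1;   /* negative dentry: assume OK, as in the patch */
	if (inode->bad)
		return 0;

	/* Fast path: cached attributes are still trusted, no blocking. */
	if (!inode->attr_invalid && !inode->attr_expired)
		return inode->stale ? 0 : 1;

	/* Only now would we block, so only now bail out of rcu-walk. */
	if (flags & MODEL_LOOKUP_RCU)
		return -ECHILD;

	return model_blocking_revalidate(inode) ? 0 : 1;
}
```

Note that this model sidesteps the d_inode lifetime question above
entirely, so it says nothing about whether the real thing is safe
under rcu-walk.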

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
