Re: [PATCH RFC] NFSD: fix cannot umounting mount points under pseudo root

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Fri, 8 May 2015 12:15:58 -0400

On Wed, May 06, 2015 at 08:26:28AM +1000, NeilBrown wrote:
> On Tue, 5 May 2015 11:52:03 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 4a8d998b7274..0f554bf41dea 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2139,6 +2139,8 @@ static struct dentry *lookup_hash(struct nameidata *nd)
> >   *
> >   * Note that this routine is purely a helper for filesystem usage and should
> >   * not be called by generic code.
> > + *
> > + * Must be called with the parented i_mutex held.
> 
> "parented"?
> 
>     * base->i_mutex must be held by caller.
> 
> ??

Thanks.

> > +struct dentry *lookup_one_len_unlocked(const char *name,
...
> > +	ret = __d_lookup(base, &this);
> > +	if (ret)
> > +		return ret;
> 
> __d_lookup() is a little too low level, as it doesn't call d_revalidate() and
> it doesn't retry if the rename_lock seqlock says it should.
> 
> The latter doesn't really matter as we would fall through to the safe
> mutex-protected version.
> The former isn't much of an issue for most filesystems under nfsd, but still
> needs to be handled to some extent.
> Also, the comment for __d_lookup says
> 
>  * __d_lookup callers must be commented.
> 
> The simplest resolution would be:
> 
> 	/* __d_lookup() is used to try to get a quick answer and avoid the
> 	 * mutex.  A false-negative does no harm.
> 	 */
> 	ret = __d_lookup(base, &this);
> 	if (ret && ret->d_flags & DCACHE_OP_REVALIDATE) {
> 		dput(ret);
> 		ret = NULL;
> 	}
> 	if (ret)
> 		return ret;

That looks right to me....  I'll probably take the simple option.

> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index a30e79900086..4accdb00976b 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -217,6 +217,13 @@ nfsd_lookup_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		host_err = PTR_ERR(dentry);
> >  		if (IS_ERR(dentry))
> >  			goto out_nfserr;
> > +		if (!(d_inode(dentry) && S_ISREG(d_inode(dentry)->i_mode))) {
> > +			/*
> > +			 * Never mind, we're not going to open this
> > +			 * anyway.  And we don't want to hold it over
> > +			 * the userspace upcalls in nfsd_mountpoint. */
> > +			fh_unlock(fhp);
> > +		}
> 
> You could avoid the test by just putting the fh_unlock(fhp) inside the 
> 
> 		if (nfsd_mountpoint(dentry, exp)) {
> 
> block.  It might also be appropriate for nfsd_mountpoint to fail on
> non-directories.

If check_export()'s to be believed, our support for regular-file
exports is intentional. So even if nfsd_mountpoint() is true, it's
possible we might still be about to open the result.

But I think we're OK:

The only reason for keeping the i_mutex here in the open case is to
avoid the situation where a client does OPEN(dirfh, "foo") and gets a
reply that grants a delegation even though the file has actually been
renamed to "bar".  (Thus violating the protocol, since the delegation's
supposed to guarantee you'll be notified before the file's renamed.)

(So the i_mutex is actually overkill, it would be enough to detect a
rename and then refuse the delegation (which we're always allowed to
do).)

But actually, if this is a mountpoint then I suppose we're safe from a
rename.  So, right, we can just move that unlock into the mountpoint
case.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html