Re: [PATCH RFC] NFSD: fix cannot umounting mount points under pseudo root

NeilBrown <neilb@xxxxxxx> · Fri, 1 May 2015 11:53:26 +1000

On Thu, 30 Apr 2015 17:36:02 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
wrote:

> On Thu, Apr 30, 2015 at 07:52:25AM +1000, NeilBrown wrote:
> > On Wed, 29 Apr 2015 15:19:34 -0400 "J. Bruce Fields" <bfields@xxxxxxxxxxxx>
> > wrote:
> > > Maybe drop the locking from nfsd_buffered_readdir and *just* take the
> > > i_mutex around lookup_one_len(), if that's the only place we need it?
> > 
> > I think it is the only place needed.  Certainly normal path lookup
> > only takes i_mutex very briefly around __lookup_hash().
> > 
> > One cost would be that we would take the mutex for every name instead
> > of once for a whole set of names.  That is generally frowned upon for
> > performance reasons.
> > 
> > An alternative might be to stop using lookup_one_len, and use
> > kern_path() instead.  Or kern_path_mountpoint if we don't want to
> > cross a mountpoint.
> > 
> > They would only take the i_mutex if absolutely necessary.
> > 
> > The draw back is that they don't take a 'len' argument, so we would
> > need to make sure the name  is nul terminated (and maybe double-check
> > that there is no '/'??).  It would be fairly easy to nul-terminate
> > names from readdir - just use a bit more space  in the buffer in
> > nfsd_buffered_filldir().
> 
> They're also a lot more complicated than lookup_one_len.  Is any of this
> extra stuff they do (audit?) going to bite us?
> 
> For me understanding that would be harder than writing a variant of
> lookup_one_len that took the i_mutex when required.  But I guess that's
> my problem, I should try to understand the lookup code better....

Tempting though ... see below (untested).

While writing that I began to wonder if lookup_one_len is really the right
interface to be used, even though it was introduced (in 2.3.99pre2-4)
specifically for nfsd.
The problem is that it assumes things about the filesystem.  So it makes
perfect sense for various filesystems to use it on themselves, but I'm not
sure how *right* it is for nfsd (or cachefiles etc) to use it on some
*other* filesystem.
The particular issue is that it avoids the d_revalidate call.
Both vfat and reiserfs have that call ... I wonder if that could ever be a
problem.

So I'm really leaning towards creating a variant of kern_path_mountpoint and
using a variant of that which takes a length.

I think adding the d_revalidate is correct  and even adding auditing is
probably a good idea.

Maybe I'll try that (unless someone beats me to it...)

NeilBrown


> 
> > I'm less sure about the locking needed in nfsd_lookup_dentry().  The
> > comments suggests that there is good reason to keep the lock for an
> > extended period.  But that reasoning might only apply to files, and
> > nfsd_mountpoint should  only be true for directories... I hope.
> 
> I thought d_mountpoint() could be true for files, but certainly we won't
> be doing nfs4 opens on directories.
> 
> > A different approach would be to pass NULL for the rqstp to
> > nfsd_cross_mnt().  This will ensure it doesn't block, but it might
> > fail incorrectly.  If it does fail, we drop the lock, retry with the
> > real rqstp, retake the lock and .... no, I think that gets a bit too
> > messy.
> 
> Yes.
> 
> > I think I'm in favour of:
> >  - not taking the lock in readdir
> >  - maybe not taking it in lookup
> >  - using kern_path_mountpoint or kern_path
> >  - nul terminating then names, copying the nfsd_lookup name to
> >    __getname() if necessary.
> > 
> > Sound reasonable?
> 
> I guess so, though I wish I understood kern_path_mountpoint better.
> 
> And nfsd_lookup_dentry looks like work for another day.  No, wait, I
> forgot the goal here: you're right, nfsd_lookup_dentry has the same
> upcall-under-i_mutex problem, so we need to fix it too.
> 
> OK, agreed.
> 
> --b.

diff --git a/fs/namei.c b/fs/namei.c
index 4a8d998b7274..d6b2dc36029e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2182,6 +2182,56 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
 }
 EXPORT_SYMBOL(lookup_one_len);
 
+struct dentry *lookup_one_len_unlocked(const char *name,
+				       struct dentry *base, int len)
+{
+	struct qstr this;
+	unsigned int c;
+	int err;
+	struct dentry *ret;
+
+	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
+
+	this.name = name;
+	this.len = len;
+	this.hash = full_name_hash(name, len);
+	if (!len)
+		return ERR_PTR(-EACCES);
+
+	if (unlikely(name[0] == '.')) {
+		if (len < 2 || (len == 2 && name[1] == '.'))
+			return ERR_PTR(-EACCES);
+	}
+
+	while (len--) {
+		c = *(const unsigned char *)name++;
+		if (c == '/' || c == '\0')
+			return ERR_PTR(-EACCES);
+	}
+	/*
+	 * See if the low-level filesystem might want
+	 * to use its own hash..
+	 */
+	if (base->d_flags & DCACHE_OP_HASH) {
+		int err = base->d_op->d_hash(base, &this);
+		if (err < 0)
+			return ERR_PTR(err);
+	}
+
+	err = inode_permission(base->d_inode, MAY_EXEC);
+	if (err)
+		return ERR_PTR(err);
+
+	ret = __d_lookup(base, &this);
+	if (ret)
+		retrun ret;
+	mutex_lock(&base->d_inode->i_mutex);
+	ret =  __lookup_hash(&this, base, 0);
+	mutex_unlock(&base->d_inode->i_mutex);
+	return ret;
+}
+EXPORT_SYMBOL(lookup_one_len_unlocked);
+
 int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
 		 struct path *path, int *empty)
 {
Attachment:
pgp7UPg7c2kaX.pgp

Description: OpenPGP digital signature