[PATCH/RFC] VFS: LOOKUP_MOUNTPOINT should used cached info whenever possible.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When performing a LOOKUP_MOUNTPOINT lookup we don't really want to
engage with underlying systems at all.  Any mount point MUST be in the
dcache with a complete direct path from the root to the mountpoint.
That should be sufficient to find the mountpoint given a path name.

This becomes an issue when the filesystem changes unexpected, such as
when a NFS server is changed to reject all access.  It then becomes
impossible to unmount anything mounted on the filesystem which has
changed.  We could simply lazy-unmount the changed filesystem and that
will often be sufficient.  However if the target filesystem needs
"umount -f" to complete the unmount properly, then the lazy unmount will
leave it incompletely unmounted.  When "-f" is needed, we really need to
be able to name the target filesystem.

We NEVER want to revalidate anything.  We already avoid the revalidation
of the mountpoint itself, be we won't need to revalidate anything on the
path either as thay might affect the cache, and the cache is what we are
really looking in.

Permission checks are a little less clear.  We currently allow any user
to make the umount syscall and perform the path lookup and only reject
unprivileged users once the target mount point has been found.  If we
completely relax permission checks then an unprivileged user could probe
inaccessible parts of the name space by examining the error returned
from umount().

So we only relax permission check when the user has CAP_SYS_ADMIN
(may_mount() succeeds).

Note that if the path given is not direct and for example uses symlinks
or "..", then dentries or symlink content may not be cached and a remote
server could cause problem.  We can only be certain of a safe unmount if
a direct path is used.

Signed-off-by: NeilBrown <neilb@xxxxxxx>
---
 fs/namei.c | 46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index edfedfbccaef..f2df1adae7c5 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -498,8 +498,8 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
  *
  * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
  */
-int inode_permission(struct mnt_idmap *idmap,
-		     struct inode *inode, int mask)
+int inode_permission_mp(struct mnt_idmap *idmap,
+			struct inode *inode, int mask, bool mp)
 {
 	int retval;
 
@@ -523,7 +523,14 @@ int inode_permission(struct mnt_idmap *idmap,
 			return -EACCES;
 	}
 
-	retval = do_inode_permission(idmap, inode, mask);
+	if (mp)
+		/* We are looking for a mountpoint and so don't
+		 * want to interact with the filesystem at all, just
+		 * the dcache and icache.
+		 */
+		retval = generic_permission(idmap, inode, mask);
+	else
+		retval = do_inode_permission(idmap, inode, mask);
 	if (retval)
 		return retval;
 
@@ -533,6 +540,24 @@ int inode_permission(struct mnt_idmap *idmap,
 
 	return security_inode_permission(inode, mask);
 }
+
+/**
+ * inode_permission - Check for access rights to a given inode
+ * @idmap:	idmap of the mount the inode was found from
+ * @inode:	Inode to check permission on
+ * @mask:	Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
+ *
+ * Check for read/write/execute permissions on an inode.  We use fs[ug]id for
+ * this, letting us set arbitrary permissions for filesystem access without
+ * changing the "normal" UIDs which are used for other things.
+ *
+ * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
+ */
+int inode_permission(struct mnt_idmap *idmap,
+		     struct inode *inode, int mask)
+{
+	return inode_permission_mp(idmap, inode, mask, false);
+}
 EXPORT_SYMBOL(inode_permission);
 
 /**
@@ -589,6 +614,7 @@ struct nameidata {
 #define ND_ROOT_PRESET 1
 #define ND_ROOT_GRABBED 2
 #define ND_JUMPED 4
+#define ND_SYS_ADMIN 8
 
 static void __set_nameidata(struct nameidata *p, int dfd, struct filename *name)
 {
@@ -853,7 +879,8 @@ static bool try_to_unlazy_next(struct nameidata *nd, struct dentry *dentry)
 
 static inline int d_revalidate(struct dentry *dentry, unsigned int flags)
 {
-	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE))
+	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE) &&
+	    likely(!(flags & LOOKUP_MOUNTPOINT)))
 		return dentry->d_op->d_revalidate(dentry, flags);
 	else
 		return 1;
@@ -1708,12 +1735,17 @@ static struct dentry *lookup_slow(const struct qstr *name,
 static inline int may_lookup(struct mnt_idmap *idmap,
 			     struct nameidata *nd)
 {
+	/* If we are looking for a mountpoint and we have the SYS_ADMIN
+	 * capability, then we will by-pass the filesys for permission checks
+	 * and just use generic_permission().
+	 */
+	bool mp = (nd->flags & LOOKUP_MOUNTPOINT) && (nd->state & ND_SYS_ADMIN);
 	if (nd->flags & LOOKUP_RCU) {
-		int err = inode_permission(idmap, nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
+		int err = inode_permission_mp(idmap, nd->inode, MAY_EXEC|MAY_NOT_BLOCK, mp);
 		if (err != -ECHILD || !try_to_unlazy(nd))
 			return err;
 	}
-	return inode_permission(idmap, nd->inode, MAY_EXEC);
+	return inode_permission_mp(idmap, nd->inode, MAY_EXEC, mp);
 }
 
 static int reserve_stack(struct nameidata *nd, struct path *link)
@@ -2501,6 +2533,8 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 	set_nameidata(&nd, dfd, name, root);
+	if ((flags & LOOKUP_MOUNTPOINT) && may_mount())
+		nd.state = ND_SYS_ADMIN;
 	retval = path_lookupat(&nd, flags | LOOKUP_RCU, path);
 	if (unlikely(retval == -ECHILD))
 		retval = path_lookupat(&nd, flags, path);
-- 
2.40.0





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux