On Sat, 15 Aug 2015 20:27:13 -0500 ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote: > > In rare cases a directory can be renamed out from under a bind mount. > In those cases without special handling it becomes possible to walk up > the directory tree to the root dentry of the filesystem and down > from the root dentry to every other file or directory on the filesystem. > > Like division by zero .. from an unconnected path can not be given > a useful semantic as there is no predicting at which path component > the code will realize it is unconnected. We certainly can not match > the current behavior as the current behavior is a security hole. > > Therefore when encounting .. when following an unconnected path > return -ENOENT. > > - Add a function path_connected to verify path->dentry is reachable > from path->mnt.mnt_root. AKA to validate that rename did not do > something nasty to the bind mount. > > To avoid races path_connected must be called after following a path > component to it's next path component. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> > --- > > This is the simple version that needs no extra vfs support. > > My availability is likely to be a bit spotty for the next while > as I am travelling to and then attending Linux Plumbers Conference. > > fs/namei.c | 27 +++++++++++++++++++++++++-- > 1 file changed, 25 insertions(+), 2 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index ae4e4c18b2ac..5303e994f8d6 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -560,6 +560,24 @@ static int __nd_alloc_stack(struct nameidata *nd) > return 0; > } > > +/** > + * path_connected - Verify that a path->dentry is below path->mnt.mnt_root > + * @path: nameidate to verify > + * > + * Rename can sometimes move a file or directory outside of a bind > + * mount, path_connected allows those cases to be detected. While it is obviously true that a rename can move a file outside of a bind mount, it doesn't seem relevant and so could be confusing. This is only ever used for directories, and a file could already be outside a bind mount even while it is inside. I would stick with "Rename can sometimes move a directory outside..." > + */ > +static bool path_connected(const struct path *path) > +{ > + struct vfsmount *mnt = path->mnt; > + > + /* Only bind mounts can have disconnected paths */ > + if (mnt->mnt_root == mnt->mnt_sb->s_root) > + return true; > + > + return is_subdir(path->dentry, mnt->mnt_root); > +} > + > static inline int nd_alloc_stack(struct nameidata *nd) > { > if (likely(nd->depth != EMBEDDED_LEVELS)) > @@ -1296,6 +1314,8 @@ static int follow_dotdot_rcu(struct nameidata *nd) > return -ECHILD; > nd->path.dentry = parent; > nd->seq = seq; > + if (unlikely(!path_connected(&nd->path))) > + return -ENOENT; > break; > } else { > struct mount *mnt = real_mount(nd->path.mnt); > @@ -1396,7 +1416,7 @@ static void follow_mount(struct path *path) > } > } > > -static void follow_dotdot(struct nameidata *nd) > +static int follow_dotdot(struct nameidata *nd) > { > if (!nd->root.mnt) > set_root(nd); > @@ -1412,6 +1432,8 @@ static void follow_dotdot(struct nameidata *nd) > /* rare case of legitimate dget_parent()... */ > nd->path.dentry = dget_parent(nd->path.dentry); > dput(old); > + if (unlikely(!path_connected(&nd->path))) > + return -ENOENT; > break; > } > if (!follow_up(&nd->path)) > @@ -1419,6 +1441,7 @@ static void follow_dotdot(struct nameidata *nd) > } > follow_mount(&nd->path); > nd->inode = nd->path.dentry->d_inode; > + return 0; > } > > /* > @@ -1634,7 +1657,7 @@ static inline int handle_dots(struct nameidata *nd, int type) > if (nd->flags & LOOKUP_RCU) { > return follow_dotdot_rcu(nd); > } else > - follow_dotdot(nd); > + return follow_dotdot(nd); > } > return 0; > } I really like this patch, particularly from the standpoint of backporting to -stable and enterprise kernels. I suspect that all the tracking of which mounts might have been escaped from should be classified as premature optimisation, until measurements show otherwise. path_connected() adds no locks or even atomics. The path that it walks will be well-trodden and so very likely most of it will be in the CPU cache. And I particularly like that follow_dotdot() and follow_dotdot_rcu() now both that the same signature :-) What's not to like? Reviewed-by: NeilBrown <neilb@xxxxxxxx> in case it helps. NeilBrown _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers