On Fri, Oct 30, 2020 at 01:10:52PM +0200, Amir Goldstein wrote: > [reducing CC list for overlayfs specific comments] > > On Thu, Oct 29, 2020 at 2:41 AM Christian Brauner > <christian.brauner@xxxxxxxxxx> wrote: > > > > As an overlay filesystem overlayfs can be mounted on top of other filesystems > > and bind-mounts. This means it can also be bind-mounted on top of one or > > multiple idmapped lower directories and/or an idmapped upper directory. > > In previous patches we've enabled the vfs to handle idmapped mounts and so we > > should have all of the helpers available to let overlayfs handle idmapped > > mounts. To handle such scenarios correctly overlayfs needs to be switched from > > non-idmapped mount aware vfs helpers to idmapped mount aware vfs helpers. > > In order to have overlayfs correctly support idmapped mounts as lower and upper > > directories we need to pass down the mount's user namespace associated with the > > lower and upper directories whenver we perform idmapped mount aware operations. > > > > Luckily, when overlayfs is mounted it creates private mounts of the lower and > > upper directories via clone_private_mount() which calls clone_mnt() internally. > > If any of the lower or upper directories are on an idmapped mount then > > clone_mnt() called in clone_private_mount() will also pin the user namespace > > the vfsmount has been marked with. Overlayfs stashes the information about the > > lower and upper directories and the mounts that they are on so that this > > information can be retrieved when needed. This makes it possible to support > > idmapped mounts as lower and upper directories. Support for idmapped merged > > mounts will be added in a follow-up patch. > > > > Whenever we perform idmap mount aware operations we need to pass down the > > mount's user namespace to the vfs helpers we've introduced in earlier patches. > > Permission checks on the lower and upper directories are performed by switching > > from the inode_permission() and inode_owner_or_capable() helpers to the new > > mapped_inode_permission() and mapped_inode_owner_or_capable() helpers. > > Similarly we switch from non-idmapped mount aware lookup helpers to > > idmapped-mount aware lookup helpers. In all cases where we need to check > > permissions in the lower or upper directories we pass down the mount associated > > with the lower and upper directory at the time of creating the overlayfs mount. > > This nicely lines up with the permission model outlined in the overlayfs > > documentation (Special thanks to Amir for pointing me to this document!). > > > > Thank to Amir for pointing me to the overlayfs permission model documentation! > > > > A very special thank you to my friend Seth Forshee who has given invaluable > > advice when coming up with these patches! > > > > As an example let's create overlayfs mount in the initial user namespace with > > an idmapped lower and upper mount: > > > > # This is a directory where all file ownership starts with uid and gid 10000. > > root@f2-vm:/# ls -al /var/lib/lxc/f1/rootfs > > total 108 > > drwxr-xr-x 20 10000 10000 4096 Oct 28 11:13 . > > drwxrwx--- 4 10000 10000 4096 Oct 28 11:17 .. > > -rw-r--r-- 1 10000 10000 7197 Oct 24 09:45 asdf > > drwxr-xr-x 2 10000 10000 4096 Oct 16 19:07 ASDF > > lrwxrwxrwx 1 10000 10000 7 Sep 24 07:43 bin -> usr/bin > > drwxr-xr-x 2 10000 10000 4096 Apr 15 2020 boot > > -rw-r--r-- 1 10000 10000 13059 Oct 8 12:38 ccc > > drwxr-xr-x 2 11000 11000 4096 Oct 23 17:10 ddd > > drwxr-xr-x 3 10000 10000 4096 Sep 25 08:04 dev > > drwxr-xr-x 61 10000 10000 4096 Sep 25 08:04 etc > > > > # Create an idmapped mount on the host such that all files owned by uid and > > # gid 10000 show up as being owned by uid 0 and gid 0. > > /mount2 --idmap both:10000:0:10000 /var/lib/lxc/f1/rootfs/ /lower1/ > > > > # Verify that the files show up as uid and gid 0 under the idmapped mount at /lower1 > > root@f2-vm:/# ls -al /lower1/ > > total 108 > > drwxr-xr-x 20 root root 4096 Oct 28 11:13 . > > drwxr-xr-x 29 root root 4096 Oct 28 11:57 .. > > -rw-r--r-- 1 root root 7197 Oct 24 09:45 asdf > > drwxr-xr-x 2 root root 4096 Oct 16 19:07 ASDF > > lrwxrwxrwx 1 root root 7 Sep 24 07:43 bin -> usr/bin > > drwxr-xr-x 2 root root 4096 Apr 15 2020 boot > > -rw-r--r-- 1 root root 13059 Oct 8 12:38 ccc > > drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 23 17:10 ddd > > drwxr-xr-x 3 root root 4096 Sep 25 08:04 dev > > drwxr-xr-x 61 root root 4096 Sep 25 08:04 etc > > > > # Create an idmapped upper mount at /upper. Now, files created as id 0 will > > # show up as id 10000 in /upper and files created as id 1000 will show up as > > # id 11000 under /upper. > > /mount2 --idmap both:10000:0:10000 /upper /upper > > mkdir /upper/upper > > mkdir /upper/work > > > > # Create an overlayfs mount. > > mount -t overlay overlay -o lowerdir=/lower1/,upperdir=/upper/upper/,workdir=/upper/work/ /merged/ > > > > root@f2-vm:/# ls -al /merged/ > > total 124 > > drwxr-xr-x 1 root root 4096 Oct 25 23:04 . > > drwxr-xr-x 29 root root 4096 Oct 28 12:07 .. > > -rw-r--r-- 1 root root 7197 Oct 24 09:45 asdf > > drwxr-xr-x 2 root root 4096 Oct 16 19:07 ASDF > > lrwxrwxrwx 1 root root 7 Sep 24 07:43 bin -> usr/bin > > drwxr-xr-x 2 root root 4096 Apr 15 2020 boot > > -rw-r--r-- 1 root root 13059 Oct 8 12:38 ccc > > drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 23 17:10 ddd > > drwxr-xr-x 3 root root 4096 Sep 25 08:04 dev > > drwxr-xr-x 61 root root 4096 Sep 25 08:04 etc > > > > # Create a file as as root > > root@f2-vm:/merged# touch /merged/A-FILE > > > > root@f2-vm:/merged# ls -al /merged/A-FILE > > -rw-r--r-- 1 root root 0 Oct 28 12:16 /merged/A-FILE > > > > # Chown the file to a simple user > > root@f2-vm:/merged# chown 1000:1000 /merged/A-FILE > > > > root@f2-vm:/merged# ls -al /merged/A-FILE > > -rw-r--r-- 1 ubuntu ubuntu 0 Oct 28 12:16 /merged/A-FILE > > > > # Create a directory and delegate to simple user > > root@f2-vm:/merged# mkdir /merged/A-DIR > > > > root@f2-vm:/merged# chown 1000:1000 /merged/A-DIR/ > > > > # Login as user > > root@f2-vm:/merged# sudo -u ubuntu -- bash -i > > > > # Create a file as simpel user > > ubuntu@f2-vm:/merged$ touch /merged/A-DIR/A-USER-FILE > > > > ubuntu@f2-vm:/merged$ ls -al /merged/A-DIR/A-USER-FILE > > -rw-rw-r-- 1 ubuntu ubuntu 0 Oct 28 12:18 /merged/A-DIR/A-USER-FILE > > > > # Let's look at these files in our idmapped upper directory > > ubuntu@f2-vm:/$ ls -alR /upper/upper/ > > /upper/upper/: > > total 12 > > drwxr-xr-x 3 root root 4096 Oct 28 12:23 . > > drwxr-xr-x 4 root root 4096 Oct 21 13:48 .. > > drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 12:18 A-DIR > > -rw-r--r-- 1 ubuntu ubuntu 0 Oct 28 12:16 A-FILE > > > > /upper/upper/A-DIR: > > total 8 > > drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 12:18 . > > drwxr-xr-x 3 root root 4096 Oct 28 12:23 .. > > -rw-rw-r-- 1 ubuntu ubuntu 0 Oct 28 12:18 A-USER-FILE > > > > # Let's remove the idmapped /upper mount (overlayfs will have it's own private mount anyway) > > umount /upper > > > > # Let's look at these files in our upper directory with the idmapped mount removed > > ubuntu@f2-vm:/$ ls -alR /upper/upper/ > > /upper/upper/: > > total 12 > > drwxr-xr-x 3 10000 10000 4096 Oct 28 12:23 . > > drwxr-xr-x 4 10000 10000 4096 Oct 21 13:48 .. > > drwxr-xr-x 2 11000 11000 4096 Oct 28 12:18 A-DIR > > -rw-r--r-- 1 11000 11000 0 Oct 28 12:16 A-FILE > > > > /upper/upper/A-DIR: > > total 8 > > drwxr-xr-x 2 11000 11000 4096 Oct 28 12:18 . > > drwxr-xr-x 3 10000 10000 4096 Oct 28 12:23 .. > > -rw-rw-r-- 1 11000 11000 0 Oct 28 12:18 A-USER-FILE > > > > # Let's create a few acls from the /merged directory on an already existing file > > # triggering a copy-up operation > > root@f2-vm:/merged# setfacl -m u:1000:rwx /merged/asdf > > root@f2-vm:/merged# getfacl /merged/asdf > > getfacl: Removing leading '/' from absolute path names > > # file: merged/asdf > > # owner: root > > # group: root > > user::rw- > > user:ubuntu:rwx > > group::r-- > > mask::rwx > > other::r-- > > > > # Let's look at this file from our upper directory > > root@f2-vm:/merged# getfacl /upper/upper/asdf > > getfacl: Removing leading '/' from absolute path names > > # file: upper/upper/asdf > > # owner: 10000 > > # group: 10000 > > user::rw- > > user:11000:rwx > > group::r-- > > mask::rwx > > other::r-- > > > > Cc: Seth Forshee <seth.forshee@xxxxxxxxxxxxx> > > Cc: Amir Goldstein <amir73il@xxxxxxxxx> > > Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx> > > --- > > fs/overlayfs/copy_up.c | 100 +++++++++++++++----------- > > fs/overlayfs/dir.c | 151 ++++++++++++++++++++++----------------- > > fs/overlayfs/export.c | 3 +- > > fs/overlayfs/file.c | 23 +++--- > > fs/overlayfs/inode.c | 89 ++++++++++++++++++----- > > fs/overlayfs/namei.c | 64 ++++++++++------- > > fs/overlayfs/overlayfs.h | 149 ++++++++++++++++++++++++++------------ > > fs/overlayfs/ovl_entry.h | 1 + > > fs/overlayfs/readdir.c | 34 +++++---- > > fs/overlayfs/super.c | 106 ++++++++++++++++++++------- > > fs/overlayfs/util.c | 38 +++++----- > > 11 files changed, 494 insertions(+), 264 deletions(-) > > > > Christian, > > I think the vfs patches raised some eyebrows, but I can understand why > you started > with the explicit opt-in approach. > If for nothing else, this posting is very educating about all the > places that idmapping > affects. > > WRT this overlayfs patch. It takes the same approach as the vfs patched, passing > user_ns all over the place, but there is usually no need for that, > because in most > cases, there is a much less intrusive way to get to the mnt. > > Some examples below. > > > diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c > > index 955ecd4030f0..1b8721796fd4 100644 > > --- a/fs/overlayfs/copy_up.c > > +++ b/fs/overlayfs/copy_up.c > > @@ -43,7 +43,8 @@ static bool ovl_must_copy_xattr(const char *name) > > !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN); > > } > > > > -int ovl_copy_xattr(struct super_block *sb, struct dentry *old, > > +int ovl_copy_xattr(struct super_block *sb, struct user_namespace *old_user_ns, > > + struct dentry *old, struct user_namespace *new_user_ns, > > struct dentry *new) > > > In this helper both callers already have the old path, so can pass it > into the helper. > new is always upper (because we never modify lower), so if you pass > ovl_fs to the helper (we are slowly passing it down to all helpers) you > will have the upper mnt. > > > { > > ssize_t list_size, size, value_size = 0; > > @@ -85,9 +86,9 @@ int ovl_copy_xattr(struct super_block *sb, struct dentry *old, > > if (ovl_is_private_xattr(sb, name)) > > continue; > > retry: > > - size = vfs_getxattr(old, name, value, value_size); > > + size = vfs_mapped_getxattr(old_user_ns, old, name, value, value_size); > > if (size == -ERANGE) > > - size = vfs_getxattr(old, name, NULL, 0); > > + size = vfs_mapped_getxattr(old_user_ns, old, name, NULL, 0); > > > > if (size < 0) { > > error = size; > > @@ -114,7 +115,7 @@ int ovl_copy_xattr(struct super_block *sb, struct dentry *old, > > error = 0; > > continue; /* Discard */ > > } > > - error = vfs_setxattr(new, name, value, size, 0); > > + error = vfs_mapped_setxattr(new_user_ns, new, name, value, size, 0); > > if (error) { > > if (error != -EOPNOTSUPP || ovl_must_copy_xattr(name)) > > break; > > @@ -228,17 +229,19 @@ static int ovl_copy_up_data(struct ovl_fs *ofs, struct path *old, > > return error; > > } > > > > -static int ovl_set_size(struct dentry *upperdentry, struct kstat *stat) > > +static int ovl_set_size(struct user_namespace *user_ns, > > + struct dentry *upperdentry, struct kstat *stat) > > { > > struct iattr attr = { > > .ia_valid = ATTR_SIZE, > > .ia_size = stat->size, > > }; > > > > - return notify_change(upperdentry, &attr, NULL); > > + return notify_mapped_change(user_ns, upperdentry, &attr, NULL); > > } > > > > -static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat) > > +static int ovl_set_timestamps(struct user_namespace *user_ns, > > + struct dentry *upperdentry, struct kstat *stat) > > { > > struct iattr attr = { > > .ia_valid = > > @@ -247,10 +250,11 @@ static int ovl_set_timestamps(struct dentry *upperdentry, struct kstat *stat) > > .ia_mtime = stat->mtime, > > }; > > > > - return notify_change(upperdentry, &attr, NULL); > > + return notify_mapped_change(user_ns, upperdentry, &attr, NULL); > > } > > > > -int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat) > > +int ovl_set_attr(struct user_namespace *user_ns, struct dentry *upperdentry, > > + struct kstat *stat) > > { > > int err = 0; > > > > @@ -259,7 +263,7 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat) > > .ia_valid = ATTR_MODE, > > .ia_mode = stat->mode, > > }; > > - err = notify_change(upperdentry, &attr, NULL); > > + err = notify_mapped_change(user_ns, upperdentry, &attr, NULL); > > } > > if (!err) { > > struct iattr attr = { > > @@ -267,10 +271,10 @@ int ovl_set_attr(struct dentry *upperdentry, struct kstat *stat) > > .ia_uid = stat->uid, > > .ia_gid = stat->gid, > > }; > > - err = notify_change(upperdentry, &attr, NULL); > > + err = notify_mapped_change(user_ns, upperdentry, &attr, NULL); > > } > > if (!err) > > - ovl_set_timestamps(upperdentry, stat); > > + ovl_set_timestamps(user_ns, upperdentry, stat); > > > > return err; > > } > > Same to all those helpers that get upperdentry, you can pass ovl_fs. > > > @@ -356,8 +360,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower, > > } > > > > /* Store file handle of @upper dir in @index dir entry */ > > -static int ovl_set_upper_fh(struct ovl_fs *ofs, struct dentry *upper, > > - struct dentry *index) > > +static int ovl_set_upper_fh(struct ovl_fs *ofs, struct user_namespace *user_ns, > > + struct dentry *upper, struct dentry *index) > > { > > And here you already have ovl_fs. > > > const struct ovl_fh *fh; > > int err; > > @@ -377,7 +381,8 @@ static int ovl_set_upper_fh(struct ovl_fs *ofs, struct dentry *upper, > > * > > * Caller must hold i_mutex on indexdir. > > */ > > -static int ovl_create_index(struct dentry *dentry, struct dentry *origin, > > +static int ovl_create_index(struct user_namespace *user_ns, > > + struct dentry *dentry, struct dentry *origin, > > struct dentry *upper) > > { > > struct dentry *indexdir = ovl_indexdir(dentry->d_sb); > > @@ -406,25 +411,25 @@ static int ovl_create_index(struct dentry *dentry, struct dentry *origin, > > if (err) > > return err; > > > > - temp = ovl_create_temp(indexdir, OVL_CATTR(S_IFDIR | 0)); > > + temp = ovl_create_temp(user_ns, indexdir, OVL_CATTR(S_IFDIR | 0)); > > err = PTR_ERR(temp); > > if (IS_ERR(temp)) > > goto free_name; > > > > - err = ovl_set_upper_fh(OVL_FS(dentry->d_sb), upper, temp); > > + err = ovl_set_upper_fh(OVL_FS(dentry->d_sb), user_ns, upper, temp); > > if (err) > > goto out; > > > > - index = lookup_one_len(name.name, indexdir, name.len); > > + index = lookup_one_len_mapped(name.name, indexdir, name.len, user_ns); > > if (IS_ERR(index)) { > > err = PTR_ERR(index); > > } else { > > - err = ovl_do_rename(dir, temp, dir, index, 0); > > + err = ovl_do_rename(dir, user_ns, temp, dir, user_ns, index, 0); > > dput(index); > > } > > out: > > if (err) > > - ovl_cleanup(dir, temp); > > + ovl_cleanup(user_ns, dir, temp); > > dput(temp); > > free_name: > > kfree(name.name); > > @@ -441,6 +446,7 @@ struct ovl_copy_up_ctx { > > struct dentry *destdir; > > struct qstr destname; > > struct dentry *workdir; > > + struct user_namespace *user_ns; > > bool origin; > > bool indexed; > > bool metacopy; > > @@ -463,16 +469,17 @@ static int ovl_link_up(struct ovl_copy_up_ctx *c) > > return err; > > > > inode_lock_nested(udir, I_MUTEX_PARENT); > > - upper = lookup_one_len(c->dentry->d_name.name, upperdir, > > - c->dentry->d_name.len); > > + upper = lookup_one_len_mapped(c->dentry->d_name.name, upperdir, > > + c->dentry->d_name.len, > > + c->user_ns); > > err = PTR_ERR(upper); > > if (!IS_ERR(upper)) { > > - err = ovl_do_link(ovl_dentry_upper(c->dentry), udir, upper); > > + err = ovl_do_link(c->user_ns, ovl_dentry_upper(c->dentry), udir, upper); > > dput(upper); > > > > if (!err) { > > /* Restore timestamps on parent (best effort) */ > > - ovl_set_timestamps(upperdir, &c->pstat); > > + ovl_set_timestamps(c->user_ns, upperdir, &c->pstat); > > ovl_dentry_set_upper_alias(c->dentry); > > } > > } > > @@ -509,7 +516,8 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp) > > return err; > > } > > > > - err = ovl_copy_xattr(c->dentry->d_sb, c->lowerpath.dentry, temp); > > + err = ovl_copy_xattr(c->dentry->d_sb, mnt_user_ns(c->lowerpath.mnt), > > + c->lowerpath.dentry, c->user_ns, temp); > > if (err) > > return err; > > > > @@ -535,9 +543,9 @@ static int ovl_copy_up_inode(struct ovl_copy_up_ctx *c, struct dentry *temp) > > > > inode_lock(temp->d_inode); > > if (S_ISREG(c->stat.mode)) > > - err = ovl_set_size(temp, &c->stat); > > + err = ovl_set_size(c->user_ns, temp, &c->stat); > > if (!err) > > - err = ovl_set_attr(temp, &c->stat); > > + err = ovl_set_attr(c->user_ns, temp, &c->stat); > > inode_unlock(temp->d_inode); > > > > return err; > > @@ -598,7 +606,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) > > if (err) > > goto unlock; > > > > - temp = ovl_create_temp(c->workdir, &cattr); > > + temp = ovl_create_temp(c->user_ns, c->workdir, &cattr); > > ovl_revert_cu_creds(&cc); > > > > err = PTR_ERR(temp); > > @@ -610,17 +618,18 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) > > goto cleanup; > > > > if (S_ISDIR(c->stat.mode) && c->indexed) { > > - err = ovl_create_index(c->dentry, c->lowerpath.dentry, temp); > > + err = ovl_create_index(c->user_ns, c->dentry, c->lowerpath.dentry, temp); > > if (err) > > goto cleanup; > > } > > > > - upper = lookup_one_len(c->destname.name, c->destdir, c->destname.len); > > + upper = lookup_one_len_mapped(c->destname.name, c->destdir, c->destname.len, > > + c->user_ns); > > err = PTR_ERR(upper); > > if (IS_ERR(upper)) > > goto cleanup; > > > > - err = ovl_do_rename(wdir, temp, udir, upper, 0); > > + err = ovl_do_rename(wdir, c->user_ns, temp, udir, c->user_ns, upper, 0); > > dput(upper); > > if (err) > > goto cleanup; > > @@ -637,7 +646,7 @@ static int ovl_copy_up_workdir(struct ovl_copy_up_ctx *c) > > return err; > > > > cleanup: > > - ovl_cleanup(wdir, temp); > > + ovl_cleanup(c->user_ns, wdir, temp); > > dput(temp); > > goto unlock; > > } > > @@ -654,7 +663,7 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c) > > if (err) > > return err; > > > > - temp = ovl_do_tmpfile(c->workdir, c->stat.mode); > > + temp = ovl_do_tmpfile(c->user_ns, c->workdir, c->stat.mode); > > ovl_revert_cu_creds(&cc); > > > > if (IS_ERR(temp)) > > @@ -666,10 +675,11 @@ static int ovl_copy_up_tmpfile(struct ovl_copy_up_ctx *c) > > > > inode_lock_nested(udir, I_MUTEX_PARENT); > > > > - upper = lookup_one_len(c->destname.name, c->destdir, c->destname.len); > > + upper = lookup_one_len_mapped(c->destname.name, c->destdir, c->destname.len, > > + c->user_ns); > > err = PTR_ERR(upper); > > if (!IS_ERR(upper)) { > > - err = ovl_do_link(temp, udir, upper); > > + err = ovl_do_link(c->user_ns, temp, udir, upper); > > dput(upper); > > } > > inode_unlock(udir); > > @@ -757,7 +767,7 @@ static int ovl_do_copy_up(struct ovl_copy_up_ctx *c) > > > > /* Restore timestamps on parent (best effort) */ > > inode_lock(udir); > > - ovl_set_timestamps(c->destdir, &c->pstat); > > + ovl_set_timestamps(c->user_ns, c->destdir, &c->pstat); > > inode_unlock(udir); > > > > ovl_dentry_set_upper_alias(c->dentry); > > @@ -786,12 +796,13 @@ static bool ovl_need_meta_copy_up(struct dentry *dentry, umode_t mode, > > return true; > > } > > > > -static ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value) > > +static ssize_t ovl_getxattr(struct user_namespace *user_ns, > > + struct dentry *dentry, char *name, char **value) > > { > > ssize_t res; > > char *buf; > > > > - res = vfs_getxattr(dentry, name, NULL, 0); > > + res = vfs_mapped_getxattr(user_ns, dentry, name, NULL, 0); > > if (res == -ENODATA || res == -EOPNOTSUPP) > > res = 0; > > > > @@ -800,7 +811,7 @@ static ssize_t ovl_getxattr(struct dentry *dentry, char *name, char **value) > > if (!buf) > > return -ENOMEM; > > > > - res = vfs_getxattr(dentry, name, buf, res); > > + res = vfs_mapped_getxattr(user_ns, dentry, name, buf, res); > > if (res < 0) > > kfree(buf); > > else > > @@ -814,6 +825,7 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c) > > { > > struct ovl_fs *ofs = OVL_FS(c->dentry->d_sb); > > struct path upperpath, datapath; > > + struct user_namespace *user_ns; > > int err; > > char *capability = NULL; > > ssize_t cap_size; > > @@ -827,8 +839,8 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c) > > return -EIO; > > > > if (c->stat.size) { > > - err = cap_size = ovl_getxattr(upperpath.dentry, XATTR_NAME_CAPS, > > - &capability); > > + err = cap_size = ovl_getxattr(c->user_ns, upperpath.dentry, > > + XATTR_NAME_CAPS, &capability); > > if (cap_size < 0) > > goto out; > > } > > @@ -841,9 +853,10 @@ static int ovl_copy_up_meta_inode_data(struct ovl_copy_up_ctx *c) > > * Writing to upper file will clear security.capability xattr. We > > * don't want that to happen for normal copy-up operation. > > */ > > + user_ns = mnt_user_ns(upperpath.mnt); > > if (capability) { > > - err = vfs_setxattr(upperpath.dentry, XATTR_NAME_CAPS, > > - capability, cap_size, 0); > > + err = vfs_mapped_setxattr(user_ns, upperpath.dentry, > > + XATTR_NAME_CAPS, capability, cap_size, 0); > > if (err) > > goto out_free; > > } > > @@ -887,6 +900,7 @@ static int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry, > > ovl_path_upper(parent, &parentpath); > > ctx.destdir = parentpath.dentry; > > ctx.destname = dentry->d_name; > > + ctx.user_ns = mnt_user_ns(parentpath.mnt); > > > > err = vfs_getattr(&parentpath, &ctx.pstat, > > STATX_ATIME | STATX_MTIME, > > diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c > > index 28a075b5f5b2..23d09de00957 100644 > > --- a/fs/overlayfs/dir.c > > +++ b/fs/overlayfs/dir.c > > @@ -23,15 +23,16 @@ MODULE_PARM_DESC(redirect_max, > > > > static int ovl_set_redirect(struct dentry *dentry, bool samedir); > > > > -int ovl_cleanup(struct inode *wdir, struct dentry *wdentry) > > +int ovl_cleanup(struct user_namespace *user_ns, struct inode *wdir, > > + struct dentry *wdentry) > > I think this one and lookup_temp should also pass ovl_fs, but > need to be careful when they are called from ovl_fill_super() > that upper_mnt is already assigned (should be). > > > { > > int err; > > > > dget(wdentry); > > if (d_is_dir(wdentry)) > > - err = ovl_do_rmdir(wdir, wdentry); > > + err = ovl_do_rmdir(user_ns, wdir, wdentry); > > else > > - err = ovl_do_unlink(wdir, wdentry); > > + err = ovl_do_unlink(user_ns, wdir, wdentry); > > dput(wdentry); > > > > if (err) { > > @@ -42,7 +43,8 @@ int ovl_cleanup(struct inode *wdir, struct dentry *wdentry) > > return err; > > } > > > > -struct dentry *ovl_lookup_temp(struct dentry *workdir) > > +struct dentry *ovl_lookup_temp(struct user_namespace *user_ns, > > + struct dentry *workdir) > > { > > struct dentry *temp; > > char name[20]; > > @@ -51,7 +53,7 @@ struct dentry *ovl_lookup_temp(struct dentry *workdir) > > /* counter is allowed to wrap, since temp dentries are ephemeral */ > > snprintf(name, sizeof(name), "#%x", atomic_inc_return(&temp_id)); > > > > - temp = lookup_one_len(name, workdir, strlen(name)); > > + temp = lookup_one_len_mapped(name, workdir, strlen(name), user_ns); > > if (!IS_ERR(temp) && temp->d_inode) { > > pr_err("workdir/%s already exists\n", name); > > dput(temp); > > @@ -68,13 +70,14 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs) > > struct dentry *whiteout; > > struct dentry *workdir = ofs->workdir; > > struct inode *wdir = workdir->d_inode; > > + struct user_namespace *user_ns = ovl_upper_mnt_user_ns(ofs); > > > > if (!ofs->whiteout) { > > - whiteout = ovl_lookup_temp(workdir); > > + whiteout = ovl_lookup_temp(user_ns, workdir); > > if (IS_ERR(whiteout)) > > goto out; > > > > - err = ovl_do_whiteout(wdir, whiteout); > > + err = ovl_do_whiteout(user_ns, wdir, whiteout); > > if (err) { > > dput(whiteout); > > whiteout = ERR_PTR(err); > > @@ -84,11 +87,11 @@ static struct dentry *ovl_whiteout(struct ovl_fs *ofs) > > } > > > > if (ofs->share_whiteout) { > > - whiteout = ovl_lookup_temp(workdir); > > + whiteout = ovl_lookup_temp(user_ns, workdir); > > if (IS_ERR(whiteout)) > > goto out; > > > > - err = ovl_do_link(ofs->whiteout, wdir, whiteout); > > + err = ovl_do_link(user_ns, ofs->whiteout, wdir, whiteout); > > if (!err) > > goto out; > > > > @@ -110,6 +113,7 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct inode *dir, > > struct dentry *dentry) > > { > > struct inode *wdir = ofs->workdir->d_inode; > > + struct user_namespace *user_ns = ovl_upper_mnt_user_ns(ofs); > > struct dentry *whiteout; > > int err; > > int flags = 0; > > @@ -122,28 +126,28 @@ int ovl_cleanup_and_whiteout(struct ovl_fs *ofs, struct inode *dir, > > if (d_is_dir(dentry)) > > flags = RENAME_EXCHANGE; > > > > - err = ovl_do_rename(wdir, whiteout, dir, dentry, flags); > > + err = ovl_do_rename(wdir, user_ns, whiteout, dir, user_ns, dentry, flags); > > if (err) > > goto kill_whiteout; > > if (flags) > > - ovl_cleanup(wdir, dentry); > > + ovl_cleanup(user_ns, wdir, dentry); > > > > out: > > dput(whiteout); > > return err; > > > > kill_whiteout: > > - ovl_cleanup(wdir, whiteout); > > + ovl_cleanup(user_ns, wdir, whiteout); > > goto out; > > } > > > > -static int ovl_mkdir_real(struct inode *dir, struct dentry **newdentry, > > - umode_t mode) > > +static int ovl_mkdir_real(struct user_namespace *user_ns, struct inode *dir, > > + struct dentry **newdentry, umode_t mode) > > { > > int err; > > struct dentry *d, *dentry = *newdentry; > > > > - err = ovl_do_mkdir(dir, dentry, mode); > > + err = ovl_do_mkdir(user_ns, dir, dentry, mode); > > if (err) > > return err; > > > > @@ -155,8 +159,8 @@ static int ovl_mkdir_real(struct inode *dir, struct dentry **newdentry, > > * to it unhashed and negative. If that happens, try to > > * lookup a new hashed and positive dentry. > > */ > > - d = lookup_one_len(dentry->d_name.name, dentry->d_parent, > > - dentry->d_name.len); > > + d = lookup_one_len_mapped(dentry->d_name.name, dentry->d_parent, > > + dentry->d_name.len, user_ns); > > if (IS_ERR(d)) { > > pr_warn("failed lookup after mkdir (%pd2, err=%i).\n", > > dentry, err); > > @@ -168,7 +172,8 @@ static int ovl_mkdir_real(struct inode *dir, struct dentry **newdentry, > > return 0; > > } > > > > -struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry, > > +struct dentry *ovl_create_real(struct user_namespace *user_ns, > > + struct inode *dir, struct dentry *newdentry, > > struct ovl_cattr *attr) > > { > > int err; > > @@ -181,28 +186,28 @@ struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry, > > goto out; > > > > if (attr->hardlink) { > > - err = ovl_do_link(attr->hardlink, dir, newdentry); > > + err = ovl_do_link(user_ns, attr->hardlink, dir, newdentry); > > } else { > > switch (attr->mode & S_IFMT) { > > case S_IFREG: > > - err = ovl_do_create(dir, newdentry, attr->mode); > > + err = ovl_do_create(user_ns, dir, newdentry, attr->mode); > > break; > > > > case S_IFDIR: > > /* mkdir is special... */ > > - err = ovl_mkdir_real(dir, &newdentry, attr->mode); > > + err = ovl_mkdir_real(user_ns, dir, &newdentry, attr->mode); > > break; > > > > case S_IFCHR: > > case S_IFBLK: > > case S_IFIFO: > > case S_IFSOCK: > > - err = ovl_do_mknod(dir, newdentry, attr->mode, > > + err = ovl_do_mknod(user_ns, dir, newdentry, attr->mode, > > attr->rdev); > > break; > > > > case S_IFLNK: > > - err = ovl_do_symlink(dir, newdentry, attr->link); > > + err = ovl_do_symlink(user_ns, dir, newdentry, attr->link); > > break; > > > > default: > > @@ -224,10 +229,11 @@ struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry, > > return newdentry; > > } > > > > ovl_fs to all those create helpers > > > -struct dentry *ovl_create_temp(struct dentry *workdir, struct ovl_cattr *attr) > > +struct dentry *ovl_create_temp(struct user_namespace *user_ns, struct dentry *workdir, > > + struct ovl_cattr *attr) > > Not only should this get ovl_fs, but workdir could be later taken from > ofs->workdir. > It's subtle because there is one caller ovl_create_index() that needs special > care so leave that cleanup to me. > > Anyway, I think you get the idea, but I am jumping ahead of myself. > Let's wait and see how the vfs patches play out. > > One naive question: > > If we want to avoid all the vfs API churn we can store the mnt_user_ns > in current cred. > Overlayfs can prepare at mount time one cred copy per layer and in > ovl_override_creds() > we can pass the ovl_layer id. > In some functions, such as ovl_looup() that access several layers, we > will need to take > care of overriding different creds per layer. > > The concerns raised about storing userns in current wrt io_uring are > not relevant > for overlayfs access to underlying layers. > > I know you and others have listed some use cases that are not > involving overlayfs, > but perhaps as a first step, this approach will be a much easier sell > and can cover > some of the use cases? The vfs approach was always going to raise eye-brows. That doesn't mean it's not the correct way to do it. Any larger change to the vfs would have and did have the same effect. I think that we should go for a proper generic solution instead of just for one filesystem. Overlayfs is a use-case that we're keen to handle as part of the large approach to this. Christian