On Wed, Oct 25, 2023 at 11:42:21AM +0200, Michael Weiß wrote: > Wire up security_inode_mknod_capns() in fs/namei.c. If implemented > and access is granted by an lsm, check ns_capable() instead of the > global CAP_MKNOD. > > Wire up security_sb_alloc_userns() in fs/super.c. If implemented > and access is granted by an lsm, the created super block will allow > access to device nodes also if it was created in a non-inital userns. > > Signed-off-by: Michael Weiß <michael.weiss@xxxxxxxxxxxxxxxxxxx> > --- > fs/namei.c | 16 +++++++++++++++- > fs/super.c | 6 +++++- > 2 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index f601fcbdc4d2..1f68d160e2c0 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -3949,6 +3949,20 @@ inline struct dentry *user_path_create(int dfd, const char __user *pathname, > } > EXPORT_SYMBOL(user_path_create); > > +static bool mknod_capable(struct inode *dir, struct dentry *dentry, > + umode_t mode, dev_t dev) > +{ > + /* > + * In case of a security hook implementation check mknod in user > + * namespace. Otherwise just check global capability. > + */ > + int error = security_inode_mknod_nscap(dir, dentry, mode, dev); > + if (!error) > + return ns_capable(current_user_ns(), CAP_MKNOD); > + else > + return capable(CAP_MKNOD); > +} > + > /** > * vfs_mknod - create device node or file > * @idmap: idmap of the mount the inode was found from > @@ -3975,7 +3989,7 @@ int vfs_mknod(struct mnt_idmap *idmap, struct inode *dir, > return error; > > if ((S_ISCHR(mode) || S_ISBLK(mode)) && !is_whiteout && > - !capable(CAP_MKNOD)) > + !mknod_capable(dir, dentry, mode, dev)) > return -EPERM; > > if (!dir->i_op->mknod) > diff --git a/fs/super.c b/fs/super.c > index 2d762ce67f6e..bb01db6d9986 100644 > --- a/fs/super.c > +++ b/fs/super.c > @@ -362,7 +362,11 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags, > } > s->s_bdi = &noop_backing_dev_info; > s->s_flags = flags; > - if (s->s_user_ns != &init_user_ns) > + /* > + * We still have to think about this here. Several concerns exist > + * about the security model, especially about malicious fuse. > + */ > + if (s->s_user_ns != &init_user_ns && security_sb_alloc_userns(s)) > s->s_iflags |= SB_I_NODEV; Hm, no. We dont want to have security hooks called in alloc_super(). That's just the wrong layer for this. This is deeply internal stuff where we should avoid interfacing with other subsystems. Removing SB_I_NODEV here is also problematic or at least overly broad because you allow to circumvent this for _every_ filesystems including stuff like proc and so on where that doesn't make any sense.