On Sat, Jul 27, 2019 at 06:20:18AM -0500, Eric W. Biederman wrote: > > In principle I like killing FS_USERNS_MOUNT flag, but when a method > > is always either NULL or exact same function... > > Either you are being dramatic or you read the patch much too quickly. Or you've done the same to my reply. I'm not saying that in the posted form the instances are the same; I'm saying that they are all equivalent to exact same code - ns_capable(fc->user_ns, CAP_SYS_ADMIN), that is. > userns_mount_permission covers the common case of FS_USERNS_MOUNT. > Then there are the cases where you need to know how the filesystem is > going to map current into the filesystem that will be mounted. Those > are: proc_mount_permission, sysfs_mount_permission, > mqueue_mount_permission, cgroup_mount_permission, +static int proc_mount_permission(void) +{ + struct pid_namespace *ns = task_active_pid_ns(current); + return ns_capable(ns->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM; +} compare with ctx->pid_ns = get_pid_ns(task_active_pid_ns(current)); put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(ctx->pid_ns->user_ns); in proc_init_fs_context(). Notice anything? With those 'orrible API changes proc_mount_permissions() and userns_mount_permission() can actually become identical. Because the default case is to leave fc->user_ns set to current_user_ns(). cgroup case: +static int cgroup_mount_permission(void) +{ + struct cgroup_namespace *ns = current->nsproxy->cgroup_ns; + return ns_capable(ns->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM; +} with ctx->ns = current->nsproxy->cgroup_ns; ... put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(ctx->ns->user_ns); in cgroup_init_fs_context(). IOW, again your instances fold together. mqueue: ctx->ipc_ns = get_ipc_ns(current->nsproxy->ipc_ns); put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(ctx->ipc_ns->user_ns); in mqueue_init_fs_context() and +static int mqueue_mount_permission(void) +{ + struct ipc_namespace *ns = current->nsproxy->ipc_ns; + return ns_capable(ns->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM; +} Same situation. +static int sysfs_mount_permission(void) +{ + struct net *net = current->nsproxy->net_ns; + return ns_capable(net->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM; +} with kfc->ns_tag = netns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET); ... if (netns) { put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(netns->user_ns); } Now, _that_ is interesting. Here the variants diverge - in case USER_NS && !NET_NS fc->user_ns is left at current_user_ns(). That, BTW, is the only case where the old checks are left in: if (!(fc->sb_flags & SB_KERNMOUNT)) { if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET)) return -EPERM; } in sysfs_init_fs_context() papers over that case and I'd love to get rid of that irregularity. Which, AFAICS, can be done by unconditional put_user_ns(fc->user_ns); fc->user_ns = get_user_ns(current->nsproxy->net_ns->user_ns); whatever the .config we have. On NET_NS ones it'll be identical to ->user_ns of what kobj_ns_grab_current(KOBJ_NS_TYPE_NET) returns, on !NET_NS it'll get mount_capable() do the right thing without that wart with init_fs_context() doing that extra check. > So yes I agree the function of interest is always capable in some form, > we just need the filesystem specific logic to check to see if we will > have capable over the filesystem that will be mounted. > > I don't doubt that the new mount api has added a few new complexities. So far it looks like *in this particular case* complexities would be reduced - with one exception all your ->permission() instances become identical. Moreover, even in that case we still get the right overall behaviour with the same instance... So do you have any specific objections to behaviour in vfs.git #fixes and/or to adding diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c index db81cfbab9d6..c5b3c7d4d360 100644 --- a/fs/sysfs/mount.c +++ b/fs/sysfs/mount.c @@ -57,11 +57,6 @@ static int sysfs_init_fs_context(struct fs_context *fc) struct kernfs_fs_context *kfc; struct net *netns; - if (!(fc->sb_flags & SB_KERNMOUNT)) { - if (!kobj_ns_current_may_mount(KOBJ_NS_TYPE_NET)) - return -EPERM; - } - kfc = kzalloc(sizeof(struct kernfs_fs_context), GFP_KERNEL); if (!kfc) return -ENOMEM; @@ -71,10 +66,8 @@ static int sysfs_init_fs_context(struct fs_context *fc) kfc->magic = SYSFS_MAGIC; fc->fs_private = kfc; fc->ops = &sysfs_fs_context_ops; - if (netns) { - put_user_ns(fc->user_ns); - fc->user_ns = get_user_ns(netns->user_ns); - } + put_user_ns(fc->user_ns); + fc->user_ns = get_user_ns(current->nsproxy->net_ns->user_ns); fc->global = true; return 0; } on top of that? The last one shouldn't change the behaviour, but it certainly looks like a nice wartectomy; not #fixes material, though.