On Sat, Jul 27, 2019 at 12:22:20AM +0100, Al Viro wrote: > On Fri, Jul 26, 2019 at 03:47:02PM -0700, Linus Torvalds wrote: > > > Of course, then later on, commit 20284ab7427f ("switch mount_capable() > > to fs_context") drops that argument entirely, and hardcodes the > > decision to look at fc->global. > > > > But that fc->global decision wasn't there originally, and is incorrect > > since it breaks existing users. > > > > What gets much more confusing about this is that the two different > > users then moved around. The sget_userns() case got moved to > > legacy_get_tree(), and then joined together in vfs_get_tree(), and > > then split and moved out to do_new_mount() and vfs_fsconfig_locked(). > > > > And that "joined together into vfs_get_tree()" must be wrong, because > > the two cases used two different namespace rules. The sget_userns() > > case *did* have that "global" flag check, while the sget_fc() did not. > > > > Messy. Al? > > Digging through that mess... It's my fuckup, and we obviously need to > restore the old behaviour, but I really hope to manage that with > checks _not_ in superblock allocator ;-/ It shouldn't have looked at fc->global for those checks. In any cases. sget_fc() should indeed have been passing fc->user_ns, not userns. And as for sget_userns(), by the time of 20284ab7427f its checks had been moved to legacy_get_tree(). In form of if (!mount_capable(fc->fs_type, fc->user_ns)) as it bloody well ought to. So the first mistake (wrong argument passed to mount_capable() by sget_fc() in 0ce0cf12fc4c) has been completed by 20284ab7427f - that conversion was, actually, an equivalent transformation (callers of legacy_get_tree() never have fc->global set, so it's all the same). However, the bug introduced in the earlier commit was now spelled out in mount_capable() itself. IOW, the minimal fix should be as below. In principle, I'm not against Eric's "add a method instead of setting FS_USERNS_MOUNT", but note that in *all* cases the instances of his method end up being equivalent to return ns_capable(fc->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM; Anyway, AFAICS the regression fix should be simply this: Unbreak mount_capable() In "consolidate the capability checks in sget_{fc,userns}())" the wrong argument had been passed to mount_capable() by sget_fc(). That mistake had been further obscured later, when switching mount_capable() to fs_context has moved the calculation of bogus argument from sget_fc() to mount_capable() itself. It should've been fc->user_ns all along. Screwed-up-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> Reported-by: Christian Brauner <christian@xxxxxxxxxx> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> --- diff --git a/fs/super.c b/fs/super.c index 113c58f19425..5960578a4076 100644 --- a/fs/super.c +++ b/fs/super.c @@ -478,13 +478,10 @@ EXPORT_SYMBOL(generic_shutdown_super); bool mount_capable(struct fs_context *fc) { - struct user_namespace *user_ns = fc->global ? &init_user_ns - : fc->user_ns; - if (!(fc->fs_type->fs_flags & FS_USERNS_MOUNT)) return capable(CAP_SYS_ADMIN); else - return ns_capable(user_ns, CAP_SYS_ADMIN); + return ns_capable(fc->user_ns, CAP_SYS_ADMIN); } /**