Re: Regression in 5.3 for some FS_USERNS_MOUNT (aka user-namespace-mountable) filesystems

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Sat, 27 Jul 2019 03:23:45 +0100

On Sat, Jul 27, 2019 at 12:22:20AM +0100, Al Viro wrote:
> On Fri, Jul 26, 2019 at 03:47:02PM -0700, Linus Torvalds wrote:
> 
> > Of course, then later on, commit 20284ab7427f ("switch mount_capable()
> > to fs_context") drops that argument entirely, and hardcodes the
> > decision to look at fc->global.
> > 
> > But that fc->global decision wasn't there originally, and is incorrect
> > since it breaks existing users.
> > 
> > What gets much more confusing about this is that the two different
> > users then moved around. The sget_userns() case got moved to
> > legacy_get_tree(), and then joined together in vfs_get_tree(), and
> > then split and moved out to do_new_mount() and vfs_fsconfig_locked().
> > 
> > And that "joined together into vfs_get_tree()" must be wrong, because
> > the two cases used two different namespace rules. The sget_userns()
> > case *did* have that "global" flag check, while the sget_fc() did not.
> > 
> > Messy. Al?
> 
> Digging through that mess...  It's my fuckup, and we obviously need to
> restore the old behaviour, but I really hope to manage that with
> checks _not_ in superblock allocator ;-/

It shouldn't have looked at fc->global for those checks.  In any cases.
sget_fc() should indeed have been passing fc->user_ns, not userns.
And as for sget_userns(), by the time of 20284ab7427f
its checks had been moved to legacy_get_tree().  In form of
	if (!mount_capable(fc->fs_type, fc->user_ns))
as it bloody well ought to.

So the first mistake (wrong argument passed to mount_capable() by sget_fc()
in 0ce0cf12fc4c) has been completed by 20284ab7427f - that conversion was,
actually, an equivalent transformation (callers of legacy_get_tree() never
have fc->global set, so it's all the same).  However, the bug introduced in
the earlier commit was now spelled out in mount_capable() itself.

IOW, the minimal fix should be as below.  In principle, I'm not against
Eric's "add a method instead of setting FS_USERNS_MOUNT", but note that
in *all* cases the instances of his method end up being equivalent to
	return ns_capable(fc->user_ns, CAP_SYS_ADMIN) ? 0 : -EPERM;

Anyway, AFAICS the regression fix should be simply this:

Unbreak mount_capable()

In "consolidate the capability checks in sget_{fc,userns}())" the
wrong argument had been passed to mount_capable() by sget_fc().
That mistake had been further obscured later, when switching
mount_capable() to fs_context has moved the calculation of
bogus argument from sget_fc() to mount_capable() itself.  It
should've been fc->user_ns all along.

Screwed-up-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Reported-by: Christian Brauner <christian@xxxxxxxxxx>
Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
---

diff --git a/fs/super.c b/fs/super.c
index 113c58f19425..5960578a4076 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -478,13 +478,10 @@ EXPORT_SYMBOL(generic_shutdown_super);
 
 bool mount_capable(struct fs_context *fc)
 {
-	struct user_namespace *user_ns = fc->global ? &init_user_ns
-						    : fc->user_ns;
-
 	if (!(fc->fs_type->fs_flags & FS_USERNS_MOUNT))
 		return capable(CAP_SYS_ADMIN);
 	else
-		return ns_capable(user_ns, CAP_SYS_ADMIN);
+		return ns_capable(fc->user_ns, CAP_SYS_ADMIN);
 }
 
 /**