Re: [PATCH v1 2/3] vfs: strip file's S_ISGID mode on vfs instead of on filesystem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 30, 2022 at 12:44:19PM +0200, Christian Brauner wrote:
> On Wed, Mar 30, 2022 at 09:10:59AM +1100, Dave Chinner wrote:
> > On Tue, Mar 29, 2022 at 07:12:11AM -0400, Jeff Layton wrote:
> > > On Mon, 2022-03-28 at 17:56 +0800, Yang Xu wrote:
> > > > Currently, vfs only passes mode argument to filesystem, then use inode_init_owner()
> > > > to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner
> > > > firstly, then posxi acl setup, but xfs uses the contrary order. It will affect
> > > > S_ISGID clear especially umask with S_IXGRP.
> > > > 
> > > > Vfs has all the info it needs - it doesn't need the filesystems to do everything
> > > > correctly with the mode and ensuring that they order things like posix acl setup
> > > > functions correctly with inode_init_owner() to strip the SGID bit.
> > > > 
> > > > Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong.
> > > > 
> > > > Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because
> > > > this api may change mode by using umask but S_ISGID clear isn't related to
> > > > SB_POSIXACL flag.
> > > > 
> > > > Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx>
> > > > Signed-off-by: Yang Xu <xuyang2018.jy@xxxxxxxxxxx>
> > > > ---
> > > >  fs/inode.c | 4 ----
> > > >  fs/namei.c | 7 +++++--
> > > >  2 files changed, 5 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/fs/inode.c b/fs/inode.c
> > > > index 1f964e7f9698..a2dd71c2437e 100644
> > > > --- a/fs/inode.c
> > > > +++ b/fs/inode.c
> > > > @@ -2246,10 +2246,6 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode,
> > > >  		/* Directories are special, and always inherit S_ISGID */
> > > >  		if (S_ISDIR(mode))
> > > >  			mode |= S_ISGID;
> > > > -		else if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP) &&
> > > > -			 !in_group_p(i_gid_into_mnt(mnt_userns, dir)) &&
> > > > -			 !capable_wrt_inode_uidgid(mnt_userns, dir, CAP_FSETID))
> > > > -			mode &= ~S_ISGID;
> > > >  	} else
> > > >  		inode_fsgid_set(inode, mnt_userns);
> > > >  	inode->i_mode = mode;
> > > > diff --git a/fs/namei.c b/fs/namei.c
> > > > index 3f1829b3ab5b..e68a99e0ac96 100644
> > > > --- a/fs/namei.c
> > > > +++ b/fs/namei.c
> > > > @@ -3287,6 +3287,7 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
> > > >  	if (open_flag & O_CREAT) {
> > > >  		if (open_flag & O_EXCL)
> > > >  			open_flag &= ~O_TRUNC;
> > > > +		inode_sgid_strip(mnt_userns, dir->d_inode, &mode);
> > > >  		if (!IS_POSIXACL(dir->d_inode))
> > > >  			mode &= ~current_umask();
> > > >  		if (likely(got_write))
> > > > @@ -3521,6 +3522,8 @@ struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns,
> > > >  	child = d_alloc(dentry, &slash_name);
> > > >  	if (unlikely(!child))
> > > >  		goto out_err;
> > > > +	inode_sgid_strip(mnt_userns, dir, &mode);
> > > > +
> > > >  	error = dir->i_op->tmpfile(mnt_userns, dir, child, mode);
> > > >  	if (error)
> > > >  		goto out_err;
> > > > @@ -3849,14 +3852,14 @@ static int do_mknodat(int dfd, struct filename *name, umode_t mode,
> > > >  	error = PTR_ERR(dentry);
> > > >  	if (IS_ERR(dentry))
> > > >  		goto out1;
> > > > -
> > > > +	mnt_userns = mnt_user_ns(path.mnt);
> > > > +	inode_sgid_strip(mnt_userns, path.dentry->d_inode, &mode);
> > > >  	if (!IS_POSIXACL(path.dentry->d_inode))
> > > >  		mode &= ~current_umask();
> > > >  	error = security_path_mknod(&path, dentry, mode, dev);
> > > >  	if (error)
> > > >  		goto out2;
> > > >  
> > > > -	mnt_userns = mnt_user_ns(path.mnt);
> > > >  	switch (mode & S_IFMT) {
> > > >  		case 0: case S_IFREG:
> > > >  			error = vfs_create(mnt_userns, path.dentry->d_inode,
> > > 
> > > I haven't gone over this in detail, but have you tested this with NFS at
> > > all?
> > > 
> > > IIRC, NFS has to leave setuid/gid stripping to the server, so I wonder
> > > if this may end up running afoul of that by forcing the client to try
> > > and strip these bits.
> > 
> > All it means is that the mode passed to the NFS server for the
> > create already has the SGID bit stripped from it. It means the
> > client is no longer reliant on the server behaving correctly to
> > close this security hole.
> > 
> > That is, failing to strip the SGID bit appropriately in the local
> > context is a security issue. Hence local machine security requires
> > that the NFS client should try to strip the SGID to defend against
> > buggy/unfixed servers that fail to strip it appropriately and
> > thereby continute to expose the local machine to this SGID security
> > issue.
> > 
> > That's the problem here - the SGID stripping in inode_init_owner()
> > is not documented, wasn't reviewed, doesn't work correctly
> > across all filesystems and leaves nasty security landmines when the VFS
> > create mode and the stripped inode mode differ.
> > 
> > Various filesystems have workarounds, partial fixes or no fixes for
> > these issues and landmines. Hence we have a situation where we are
> > playing whack-a-mole to discover and slap band-aids over all the
> > places that inode_init_owner() based stripping does not work
> > correctly.
> > 
> > In XFS, this meant the problem was not orginally fixed by the
> > silent, unreviewed change to inode_init_owner() in 2018
> > because it didn't call inode_init_owner() at all. So 4 years after
> > the bug was "fixed" and the CVE released, we are still exposed to
> > the bug because *no filesystem people knew about it* and *nobody wrote a
> > regression test* to check that the probelm was fixed and stayed
> > fixed.
> > 
> > And now that XFS does call inode_init_owner(), we've subsequently
> > discovered that XFS still fail when default acls are enabled because
> > we create the ACL from the mode passed from the VFS, not the
> > stripped mode that results from inode_init_owner() being called.
> > 
> > See what I mean about landmines?
> > 
> > The fact is this: regardless of which filesystem is in use, failure
> > to strip the SGID correctly is considered a security failure that
> > needs to be fixed. The current VFS infrastructure requires the
> > filesystem to do everything right and not step on any landmines to
> > strip the SGID bit, when in fact it can easily be done at the VFS
> > and the filesystems then don't even need to be aware that the SGID
> > needs to be (or has been stripped) by the operation the user asked
> > to be done.
> > 
> > We need the architecture to be *secure by design*, not tacked onto
> > the side like it is now.  We need to stop trying to dance around
> > these landmines - it is *not working* and we are blowing our own
> > feet off repeatedly. This hurts a lot (especially in distro land)
> > so we need to take the responsibility for stripping SGID properly
> > away from the filesystems and put it where it belongs: in the VFS.
> 
> I agree. When I added tests for set*id stripping to xfstests for the
> sake of getting complete vfs coverage of idmapped mounts in generic/633
> I immediately found bugs. Once I made the testsuite useable by all
> filesystems we started seeing more.
> 
> I think we should add and use the new proposed stripping helper in the
> vfs - albeit with a slightly changed api and also use it in
> inode_init_owner(). While it is a delicate change in the worst case we
> end up removing additional privileges that's an acceptable regression
> risk to take.

And if it's not too much trouble, can we add an fstest to encode our
current expectations about how setgid inheritance works?  I would really
like to reduce the need for historic setgid behavior spelunking. ;)

--D



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux