On Fri, Apr 15, 2022 at 03:14:53AM +0000, xuyang2018.jy@xxxxxxxxxxx wrote: > on 2022/4/14 20:45, Christian Brauner wrote: > > On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote: > >> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner() > >> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner > >> firstly, then posxi acl setup, but xfs uses the contrary order. It will affect > >> S_ISGID clear especially we filter S_IXGRP by umask or acl. > >> > >> Regardless of which filesystem is in use, failure to strip the SGID correctly is > >> considered a security failure that needs to be fixed. The current VFS infrastructure > >> requires the filesystem to do everything right and not step on any landmines to > >> strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems > >> then don't even need to be aware that the SGID needs to be (or has been stripped) by > >> the operation the user asked to be done. > >> > >> Vfs has all the info it needs - it doesn't need the filesystems to do everything > >> correctly with the mode and ensuring that they order things like posix acl setup > >> functions correctly with inode_init_owner() to strip the SGID bit. > >> > >> Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong. > >> > >> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because > >> this api may change mode. > >> > >> Only the following places use inode_init_owner > >> "hugetlbfs/inode.c:846: inode_init_owner(&init_user_ns, inode, dir, mode); > >> nilfs2/inode.c:354: inode_init_owner(&init_user_ns, inode, dir, mode); > >> zonefs/super.c:1289: inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555); > >> reiserfs/namei.c:619: inode_init_owner(&init_user_ns, inode, dir, mode); > >> jfs/jfs_inode.c:67: inode_init_owner(&init_user_ns, inode, parent, mode); > >> f2fs/namei.c:50: inode_init_owner(mnt_userns, inode, dir, mode); > >> ext2/ialloc.c:549: inode_init_owner(&init_user_ns, inode, dir, mode); > >> overlayfs/dir.c:643: inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode); > >> ufs/ialloc.c:292: inode_init_owner(&init_user_ns, inode, dir, mode); > >> ntfs3/inode.c:1283: inode_init_owner(mnt_userns, inode, dir, mode); > >> ramfs/inode.c:64: inode_init_owner(&init_user_ns, inode, dir, mode); > >> 9p/vfs_inode.c:263: inode_init_owner(&init_user_ns, inode, NULL, mode); > >> btrfs/tests/btrfs-tests.c:65: inode_init_owner(&init_user_ns, inode, NULL, S_IFREG); > >> btrfs/inode.c:6215: inode_init_owner(mnt_userns, inode, dir, mode); > >> sysv/ialloc.c:166: inode_init_owner(&init_user_ns, inode, dir, mode); > >> omfs/inode.c:51: inode_init_owner(&init_user_ns, inode, NULL, mode); > >> ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode); > >> udf/ialloc.c:108: inode_init_owner(&init_user_ns, inode, dir, mode); > >> ext4/ialloc.c:979: inode_init_owner(mnt_userns, inode, dir, mode); > >> hfsplus/inode.c:393: inode_init_owner(&init_user_ns, inode, dir, mode); > >> xfs/xfs_inode.c:840: inode_init_owner(mnt_userns, inode, dir, mode); > >> ocfs2/dlmfs/dlmfs.c:331: inode_init_owner(&init_user_ns, inode, NULL, mode); > >> ocfs2/dlmfs/dlmfs.c:354: inode_init_owner(&init_user_ns, inode, parent, mode); > >> ocfs2/namei.c:200: inode_init_owner(&init_user_ns, inode, dir, mode); > >> minix/bitmap.c:255: inode_init_owner(&init_user_ns, inode, dir, mode); > >> bfs/dir.c:99: inode_init_owner(&init_user_ns, inode, dir, mode); > >> " > > > > For completeness sake, there's also spufs which doesn't really go > > through the regular VFS callpath because it has separate system calls > > like: > > > > SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags, > > umode_t, mode, int, neighbor_fd) > > > > but looking through the code it only allows the creation of directories and only > > allows bits in 0777. > IMO, this fs also doesn't use inode_init_owner, so it should be not > affected. We add indo_sgid_strip into vfs, IMO, it only happen new sgid > strip situation and doesn't happen to remove old sgid strip situation. > So I think it is "safe". It does. The callchain spu_create() with SP_CREATE_GANG ends up in spufs_mkgang() which calls inode_init_owner(). But as I said it's not a problem since this only creates directories anyway.