on 2022/4/15 22:02, Christian Brauner wrote: > On Fri, Apr 15, 2022 at 03:14:53AM +0000, xuyang2018.jy@xxxxxxxxxxx wrote: >> on 2022/4/14 20:45, Christian Brauner wrote: >>> On Thu, Apr 14, 2022 at 03:57:18PM +0800, Yang Xu wrote: >>>> Currently, vfs only passes mode argument to filesystem, then use inode_init_owner() >>>> to strip S_ISGID. Some filesystem(ie ext4/btrfs) will call inode_init_owner >>>> firstly, then posxi acl setup, but xfs uses the contrary order. It will affect >>>> S_ISGID clear especially we filter S_IXGRP by umask or acl. >>>> >>>> Regardless of which filesystem is in use, failure to strip the SGID correctly is >>>> considered a security failure that needs to be fixed. The current VFS infrastructure >>>> requires the filesystem to do everything right and not step on any landmines to >>>> strip the SGID bit, when in fact it can easily be done at the VFS and the filesystems >>>> then don't even need to be aware that the SGID needs to be (or has been stripped) by >>>> the operation the user asked to be done. >>>> >>>> Vfs has all the info it needs - it doesn't need the filesystems to do everything >>>> correctly with the mode and ensuring that they order things like posix acl setup >>>> functions correctly with inode_init_owner() to strip the SGID bit. >>>> >>>> Just strip the SGID bit at the VFS, and then the filesystems can't get it wrong. >>>> >>>> Also, the inode_sgid_strip() api should be used before IS_POSIXACL() because >>>> this api may change mode. >>>> >>>> Only the following places use inode_init_owner >>>> "hugetlbfs/inode.c:846: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> nilfs2/inode.c:354: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> zonefs/super.c:1289: inode_init_owner(&init_user_ns, inode, parent, S_IFDIR | 0555); >>>> reiserfs/namei.c:619: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> jfs/jfs_inode.c:67: inode_init_owner(&init_user_ns, inode, parent, mode); >>>> f2fs/namei.c:50: inode_init_owner(mnt_userns, inode, dir, mode); >>>> ext2/ialloc.c:549: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> overlayfs/dir.c:643: inode_init_owner(&init_user_ns, inode, dentry->d_parent->d_inode, mode); >>>> ufs/ialloc.c:292: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> ntfs3/inode.c:1283: inode_init_owner(mnt_userns, inode, dir, mode); >>>> ramfs/inode.c:64: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> 9p/vfs_inode.c:263: inode_init_owner(&init_user_ns, inode, NULL, mode); >>>> btrfs/tests/btrfs-tests.c:65: inode_init_owner(&init_user_ns, inode, NULL, S_IFREG); >>>> btrfs/inode.c:6215: inode_init_owner(mnt_userns, inode, dir, mode); >>>> sysv/ialloc.c:166: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> omfs/inode.c:51: inode_init_owner(&init_user_ns, inode, NULL, mode); >>>> ubifs/dir.c:97: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> udf/ialloc.c:108: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> ext4/ialloc.c:979: inode_init_owner(mnt_userns, inode, dir, mode); >>>> hfsplus/inode.c:393: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> xfs/xfs_inode.c:840: inode_init_owner(mnt_userns, inode, dir, mode); >>>> ocfs2/dlmfs/dlmfs.c:331: inode_init_owner(&init_user_ns, inode, NULL, mode); >>>> ocfs2/dlmfs/dlmfs.c:354: inode_init_owner(&init_user_ns, inode, parent, mode); >>>> ocfs2/namei.c:200: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> minix/bitmap.c:255: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> bfs/dir.c:99: inode_init_owner(&init_user_ns, inode, dir, mode); >>>> " >>> >>> For completeness sake, there's also spufs which doesn't really go >>> through the regular VFS callpath because it has separate system calls >>> like: >>> >>> SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags, >>> umode_t, mode, int, neighbor_fd) >>> >>> but looking through the code it only allows the creation of directories and only >>> allows bits in 0777. >> IMO, this fs also doesn't use inode_init_owner, so it should be not >> affected. We add indo_sgid_strip into vfs, IMO, it only happen new sgid >> strip situation and doesn't happen to remove old sgid strip situation. >> So I think it is "safe". > > It does. The callchain spu_create() with SP_CREATE_GANG ends up in > spufs_mkgang() which calls inode_init_owner(). But as I said it's not a > problem since this only creates directories anyway. Sorry, I miss this message before. Oh, Yes, I only search inode_init_owner in linux/fs directory instead of the whole linux source directory. It seems I also miss bpf and shmem. kernel/bpf/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode); mm/shmem.c: inode_init_owner(&init_user_ns, inode, dir, mode); arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR); arch/powerpc/platforms/cell/spufs/inode.c: inode_init_owner(&init_user_ns, inode, dir, mode | S_IFDIR) bpf use vfs_mkobj in bpf_obj_do_pin with "S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask()) mode and use bpf_mkobj_ops in bpf_iter_link_pin_kernel with S_IFREG | S_IRUSR; , so bpf is also not affected. Also shmem used standard vfs api, so it is not affected. I will add the three missing things(spufs, bpf, shmem) in my commit message. Best Regards Yang Xu >