2015-07-08 18:37 GMT+02:00 Stephen Smalley <sds@xxxxxxxxxxxxx>:
> On 07/08/2015 09:13 AM, Stephen Smalley wrote:
>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>>> It appears that, at some point last year, XFS made directory handling
>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>> but that has been so for many years.
>>>
>>> Since those few lockdep traces that I've seen all implicated selinux,
>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>
>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>> (and MAP_SHARED|MAP_ANONYMOUS). I thought there were also drivers
>>> which cloned an inode in mmap(), but if so, I cannot locate them now.
>>
>> This causes a regression for SELinux (please, in the future, cc the
>> selinux list and Paul Moore on SELinux-related changes). In particular,
>> this change disables SELinux checking of mprotect PROT_EXEC on shared
>> anonymous mappings, so we lose the ability to control executable
>> mappings. That said, we are only getting that check today as a side
>> effect of our file execute check on the tmpfs inode, whereas it would
>> be better (and more consistent with the mmap-time checks) to apply an
>> execmem check in that case, in which case we wouldn't care about the
>> inode-based check. However, I am unclear on how to correctly detect
>> that situation from selinux_file_mprotect() -> file_map_prot_check(),
>> because we do have a non-NULL vma->vm_file, so we treat it as a file
>> execute check. In contrast, if directly creating an anonymous shared
>> mapping with PROT_EXEC via mmap(...PROT_EXEC...), selinux_mmap_file()
>> is called with a NULL file and therefore we end up applying an execmem
>> check.
>
> Also, can you provide the lockdep traces that motivated this change?

Yes, here it is:

[   28.177939] ======================================================
[   28.177959] [ INFO: possible circular locking dependency detected ]
[   28.177980] 4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1 Tainted: G W
[   28.178002] -------------------------------------------------------
[   28.178022] sshd/1764 is trying to acquire lock:
[   28.178037]  (&isec->lock){+.+.+.}, at: [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0
[   28.178078] but task is already holding lock:
[   28.178097]  (&mm->mmap_sem){++++++}, at: [<ffffffff81216a0f>] vm_mmap_pgoff+0x8f/0xf0
[   28.178131] which lock already depends on the new lock.
[   28.178157] the existing dependency chain (in reverse order) is:
[   28.178180] -> #2 (&mm->mmap_sem){++++++}:
[   28.178201]        [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.178225]        [<ffffffff8122853c>] might_fault+0x8c/0xb0
[   28.178248]        [<ffffffff8129af3a>] filldir+0x9a/0x130
[   28.178269]        [<ffffffffa019cfd6>] xfs_dir2_block_getdents.isra.12+0x1a6/0x1d0 [xfs]
[   28.178330]        [<ffffffffa019dae4>] xfs_readdir+0x1c4/0x360 [xfs]
[   28.178368]        [<ffffffffa01a0a5b>] xfs_file_readdir+0x2b/0x30 [xfs]
[   28.178404]        [<ffffffff8129ad0a>] iterate_dir+0x9a/0x140
[   28.178425]        [<ffffffff8129b241>] SyS_getdents+0x91/0x120
[   28.178447]        [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   28.178471] -> #1 (&xfs_dir_ilock_class){++++.+}:
[   28.178494]        [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.178515]        [<ffffffff8110bee7>] down_read_nested+0x57/0xa0
[   28.178538]        [<ffffffffa01b2ed1>] xfs_ilock+0x171/0x390 [xfs]
[   28.178579]        [<ffffffffa01b3168>] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
[   28.178618]        [<ffffffffa0145d8d>] xfs_attr_get+0xbd/0x1b0 [xfs]
[   28.178651]        [<ffffffffa01c44ad>] xfs_xattr_get+0x3d/0x80 [xfs]
[   28.178688]        [<ffffffff812b022f>] generic_getxattr+0x4f/0x70
[   28.178711]        [<ffffffff813b5372>] inode_doinit_with_dentry+0x172/0x6a0
[   28.178737]        [<ffffffff813b68db>] sb_finish_set_opts+0xdb/0x260
[   28.178759]        [<ffffffff813b6ff1>] selinux_set_mnt_opts+0x331/0x670
[   28.178783]        [<ffffffff813b9b47>] superblock_doinit+0x77/0xf0
[   28.178804]        [<ffffffff813b9bd0>] delayed_superblock_init+0x10/0x20
[   28.178849]        [<ffffffff8128691a>] iterate_supers+0xba/0x120
[   28.178872]        [<ffffffff813bef23>] selinux_complete_init+0x33/0x40
[   28.178897]        [<ffffffff813cf313>] security_load_policy+0x103/0x640
[   28.178920]        [<ffffffff813c0a76>] sel_write_load+0xb6/0x790
[   28.179482]        [<ffffffff812821f7>] __vfs_write+0x37/0x110
[   28.180047]        [<ffffffff81282c89>] vfs_write+0xa9/0x1c0
[   28.180630]        [<ffffffff81283a1c>] SyS_write+0x5c/0xd0
[   28.181168]        [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   28.181740] -> #0 (&isec->lock){+.+.+.}:
[   28.182808]        [<ffffffff81113331>] __lock_acquire+0x1b31/0x1e40
[   28.183347]        [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.183897]        [<ffffffff8189c10d>] mutex_lock_nested+0x7d/0x460
[   28.184427]        [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0
[   28.184944]        [<ffffffff813b58bc>] selinux_d_instantiate+0x1c/0x20
[   28.185470]        [<ffffffff813b07ab>] security_d_instantiate+0x1b/0x30
[   28.185980]        [<ffffffff8129e8c4>] d_instantiate+0x54/0x80
[   28.186495]        [<ffffffff81211edc>] __shmem_file_setup+0xdc/0x250
[   28.186990]        [<ffffffff812164a8>] shmem_zero_setup+0x28/0x70
[   28.187500]        [<ffffffff8123471c>] mmap_region+0x66c/0x680
[   28.188006]        [<ffffffff81234a53>] do_mmap_pgoff+0x323/0x410
[   28.188500]        [<ffffffff81216a30>] vm_mmap_pgoff+0xb0/0xf0
[   28.189005]        [<ffffffff81232bf6>] SyS_mmap_pgoff+0x116/0x2b0
[   28.189490]        [<ffffffff810232bb>] SyS_mmap+0x1b/0x30
[   28.189975]        [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   28.190474] other info that might help us debug this:
[   28.191901] Chain exists of:
  &isec->lock --> &xfs_dir_ilock_class --> &mm->mmap_sem
[   28.193327]  Possible unsafe locking scenario:
[   28.194297]        CPU0                    CPU1
[   28.194774]        ----                    ----
[   28.195254]   lock(&mm->mmap_sem);
[   28.195709]                                lock(&xfs_dir_ilock_class);
[   28.196174]                                lock(&mm->mmap_sem);
[   28.196654]   lock(&isec->lock);
[   28.197108]  *** DEADLOCK ***
[   28.198451] 1 lock held by sshd/1764:
[   28.198900]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81216a0f>] vm_mmap_pgoff+0x8f/0xf0
[   28.199370] stack backtrace:
[   28.200276] CPU: 2 PID: 1764 Comm: sshd Tainted: G W 4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1
[   28.200753] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[   28.201246]  0000000000000000 00000000eda89a94 ffff8800a86a39c8 ffffffff81896375
[   28.201771]  0000000000000000 ffffffff82a910d0 ffff8800a86a3a18 ffffffff8110fbd6
[   28.202275]  0000000000000002 ffff8800a86a3a78 0000000000000001 ffff8800a897b008
[   28.203099] Call Trace:
[   28.204237]  [<ffffffff81896375>] dump_stack+0x4c/0x65
[   28.205362]  [<ffffffff8110fbd6>] print_circular_bug+0x206/0x280
[   28.206502]  [<ffffffff81113331>] __lock_acquire+0x1b31/0x1e40
[   28.207650]  [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.208758]  [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0
[   28.209902]  [<ffffffff8189c10d>] mutex_lock_nested+0x7d/0x460
[   28.211023]  [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0
[   28.212162]  [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0
[   28.213283]  [<ffffffff81027e7d>] ? native_sched_clock+0x2d/0xa0
[   28.214403]  [<ffffffff81027ef9>] ? sched_clock+0x9/0x10
[   28.215514]  [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0
[   28.216656]  [<ffffffff813b58bc>] selinux_d_instantiate+0x1c/0x20
[   28.217776]  [<ffffffff813b07ab>] security_d_instantiate+0x1b/0x30
[   28.218902]  [<ffffffff8129e8c4>] d_instantiate+0x54/0x80
[   28.219992]  [<ffffffff81211edc>] __shmem_file_setup+0xdc/0x250
[   28.221112]  [<ffffffff812164a8>] shmem_zero_setup+0x28/0x70
[   28.222234]  [<ffffffff8123471c>] mmap_region+0x66c/0x680
[   28.223362]  [<ffffffff81234a53>] do_mmap_pgoff+0x323/0x410
[   28.224493]  [<ffffffff81216a0f>] ? vm_mmap_pgoff+0x8f/0xf0
[   28.225643]  [<ffffffff81216a30>] vm_mmap_pgoff+0xb0/0xf0
[   28.226771]  [<ffffffff81232bf6>] SyS_mmap_pgoff+0x116/0x2b0
[   28.227900]  [<ffffffff812996ce>] ? SyS_fcntl+0x5de/0x760
[   28.229042]  [<ffffffff810232bb>] SyS_mmap+0x1b/0x30
[   28.230156]  [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   46.520367] Adjusting tsc more than 11% (5419175 vs 7179037)

Best regards,

Morten

>
>>
>>>
>>> Reported-and-tested-by: Prarit Bhargava <prarit@xxxxxxxxxx>
>>> Reported-by: Daniel Wagner <wagi@xxxxxxxxx>
>>> Reported-by: Morten Stevens <mstevens@xxxxxxxxxxxxxxxxx>
>>> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
>>> ---
>>>
>>>  mm/shmem.c | 8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> --- 4.1-rc7/mm/shmem.c	2015-04-26 19:16:31.352191298 -0700
>>> +++ linux/mm/shmem.c	2015-06-14 09:26:49.461120166 -0700
>>> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
>>>  	struct file *file;
>>>  	loff_t size = vma->vm_end - vma->vm_start;
>>>
>>> -	file = shmem_file_setup("dev/zero", size, vma->vm_flags);
>>> +	/*
>>> +	 * Cloning a new file under mmap_sem leads to a lock ordering conflict
>>> +	 * between XFS directory reading and selinux: since this file is only
>>> +	 * accessible to the user through its mapping, use S_PRIVATE flag to
>>> +	 * bypass file security, in the same way as shmem_kernel_file_setup().
>>> +	 */
>>> +	file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE);
>>>  	if (IS_ERR(file))
>>>  		return PTR_ERR(file);
>>>
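[Illustrative addendum] For the mprotect question Stephen raises above, one possible direction would be for selinux_file_mprotect() to treat a mapping backed by a kernel-private (S_PRIVATE) shmem inode as if it were anonymous memory, so that PROT_EXEC is gated by the execmem check rather than by an inode-based file execute check. The sketch below is a hypothetical simplification, not the upstream implementation: it omits the execheap/execstack/execmod handling of the real hook, uses a made-up function name, and only assumes the existing IS_PRIVATE() and file_inode() helpers plus the file_map_prot_check() behaviour of applying execmem when no file is involved.

```c
#include <linux/fs.h>	/* IS_PRIVATE(), file_inode() */
#include <linux/mm.h>	/* struct vm_area_struct, VM_SHARED */

/*
 * Hypothetical sketch only: if the vma is backed by a kernel-internal
 * S_PRIVATE shmem inode, such as the one created by shmem_zero_setup()
 * for a MAP_SHARED|MAP_ANONYMOUS mapping, treat it like an anonymous
 * mapping so that PROT_EXEC is controlled by process:execmem, matching
 * the mmap-time behaviour of selinux_mmap_file() with a NULL file.
 */
static int selinux_mprotect_private_shmem_sketch(struct vm_area_struct *vma,
						 unsigned long prot)
{
	struct file *file = vma->vm_file;

	/* Kernel-private inodes carry no meaningful security label. */
	if (file && IS_PRIVATE(file_inode(file)))
		file = NULL;

	/*
	 * With file == NULL, file_map_prot_check() falls back to the
	 * execmem check for PROT_EXEC instead of a file execute check.
	 */
	return file_map_prot_check(file, prot, vma->vm_flags & VM_SHARED);
}
```

Whether distinguishing on IS_PRIVATE() is the right criterion is exactly the open question in the thread; the sketch only shows where such a test could sit relative to file_map_prot_check().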