Re: mm: shmem_zero_setup skip security check and lockdep conflict with XFS

2015-07-08 18:37 GMT+02:00 Stephen Smalley <sds@xxxxxxxxxxxxx>:
> On 07/08/2015 09:13 AM, Stephen Smalley wrote:
>> On Sun, Jun 14, 2015 at 12:48 PM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>>> It appears that, at some point last year, XFS made directory handling
>>> changes which bring it into lockdep conflict with shmem_zero_setup():
>>> it is surprising that mmap() can clone an inode while holding mmap_sem,
>>> but that has been so for many years.
>>>
>>> Since those few lockdep traces that I've seen all implicated selinux,
>>> I'm hoping that we can use the __shmem_file_setup(,,,S_PRIVATE) which
>>> v3.13's commit c7277090927a ("security: shmem: implement kernel private
>>> shmem inodes") introduced to avoid LSM checks on kernel-internal inodes:
>>> the mmap("/dev/zero") cloned inode is indeed a kernel-internal detail.
>>>
>>> This also covers the !CONFIG_SHMEM use of ramfs to support /dev/zero
>>> (and MAP_SHARED|MAP_ANONYMOUS).  I thought there were also drivers
>>> which cloned inode in mmap(), but if so, I cannot locate them now.
>>
>> This causes a regression for SELinux (please, in the future, cc
>> selinux list and Paul Moore on SELinux-related changes).  In
>> particular, this change disables SELinux checking of mprotect
>> PROT_EXEC on shared anonymous mappings, so we lose the ability to
>> control executable mappings.  That said, we are only getting that
>> check today as a side effect of our file execute check on the tmpfs
>> inode, whereas it would be better (and more consistent with the
>> mmap-time checks) to apply an execmem check in that case, in which
>> case we wouldn't care about the inode-based check.  However, I am
>> unclear on how to correctly detect that situation from
>> selinux_file_mprotect() -> file_map_prot_check(), because we do have a
>> non-NULL vma->vm_file so we treat it as a file execute check.  In
>> contrast, if directly creating an anonymous shared mapping with
>> PROT_EXEC via mmap(...PROT_EXEC...),  selinux_mmap_file is called with
>> a NULL file and therefore we end up applying an execmem check.
>
> Also, can you provide the lockdep traces that motivated this change?

Yes, here it is:

[   28.177939] ======================================================
[   28.177959] [ INFO: possible circular locking dependency detected ]
[   28.177980] 4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1 Tainted: G        W
[   28.178002] -------------------------------------------------------
[   28.178022] sshd/1764 is trying to acquire lock:
[   28.178037]  (&isec->lock){+.+.+.}, at: [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0
[   28.178078]
               but task is already holding lock:
[   28.178097]  (&mm->mmap_sem){++++++}, at: [<ffffffff81216a0f>] vm_mmap_pgoff+0x8f/0xf0
[   28.178131]
               which lock already depends on the new lock.

[   28.178157]
               the existing dependency chain (in reverse order) is:
[   28.178180]
               -> #2 (&mm->mmap_sem){++++++}:
[   28.178201]        [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.178225]        [<ffffffff8122853c>] might_fault+0x8c/0xb0
[   28.178248]        [<ffffffff8129af3a>] filldir+0x9a/0x130
[   28.178269]        [<ffffffffa019cfd6>] xfs_dir2_block_getdents.isra.12+0x1a6/0x1d0 [xfs]
[   28.178330]        [<ffffffffa019dae4>] xfs_readdir+0x1c4/0x360 [xfs]
[   28.178368]        [<ffffffffa01a0a5b>] xfs_file_readdir+0x2b/0x30 [xfs]
[   28.178404]        [<ffffffff8129ad0a>] iterate_dir+0x9a/0x140
[   28.178425]        [<ffffffff8129b241>] SyS_getdents+0x91/0x120
[   28.178447]        [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   28.178471]
               -> #1 (&xfs_dir_ilock_class){++++.+}:
[   28.178494]        [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.178515]        [<ffffffff8110bee7>] down_read_nested+0x57/0xa0
[   28.178538]        [<ffffffffa01b2ed1>] xfs_ilock+0x171/0x390 [xfs]
[   28.178579]        [<ffffffffa01b3168>] xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
[   28.178618]        [<ffffffffa0145d8d>] xfs_attr_get+0xbd/0x1b0 [xfs]
[   28.178651]        [<ffffffffa01c44ad>] xfs_xattr_get+0x3d/0x80 [xfs]
[   28.178688]        [<ffffffff812b022f>] generic_getxattr+0x4f/0x70
[   28.178711]        [<ffffffff813b5372>] inode_doinit_with_dentry+0x172/0x6a0
[   28.178737]        [<ffffffff813b68db>] sb_finish_set_opts+0xdb/0x260
[   28.178759]        [<ffffffff813b6ff1>] selinux_set_mnt_opts+0x331/0x670
[   28.178783]        [<ffffffff813b9b47>] superblock_doinit+0x77/0xf0
[   28.178804]        [<ffffffff813b9bd0>] delayed_superblock_init+0x10/0x20
[   28.178849]        [<ffffffff8128691a>] iterate_supers+0xba/0x120
[   28.178872]        [<ffffffff813bef23>] selinux_complete_init+0x33/0x40
[   28.178897]        [<ffffffff813cf313>] security_load_policy+0x103/0x640
[   28.178920]        [<ffffffff813c0a76>] sel_write_load+0xb6/0x790
[   28.179482]        [<ffffffff812821f7>] __vfs_write+0x37/0x110
[   28.180047]        [<ffffffff81282c89>] vfs_write+0xa9/0x1c0
[   28.180630]        [<ffffffff81283a1c>] SyS_write+0x5c/0xd0
[   28.181168]        [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   28.181740]
               -> #0 (&isec->lock){+.+.+.}:
[   28.182808]        [<ffffffff81113331>] __lock_acquire+0x1b31/0x1e40
[   28.183347]        [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.183897]        [<ffffffff8189c10d>] mutex_lock_nested+0x7d/0x460
[   28.184427]        [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0
[   28.184944]        [<ffffffff813b58bc>] selinux_d_instantiate+0x1c/0x20
[   28.185470]        [<ffffffff813b07ab>] security_d_instantiate+0x1b/0x30
[   28.185980]        [<ffffffff8129e8c4>] d_instantiate+0x54/0x80
[   28.186495]        [<ffffffff81211edc>] __shmem_file_setup+0xdc/0x250
[   28.186990]        [<ffffffff812164a8>] shmem_zero_setup+0x28/0x70
[   28.187500]        [<ffffffff8123471c>] mmap_region+0x66c/0x680
[   28.188006]        [<ffffffff81234a53>] do_mmap_pgoff+0x323/0x410
[   28.188500]        [<ffffffff81216a30>] vm_mmap_pgoff+0xb0/0xf0
[   28.189005]        [<ffffffff81232bf6>] SyS_mmap_pgoff+0x116/0x2b0
[   28.189490]        [<ffffffff810232bb>] SyS_mmap+0x1b/0x30
[   28.189975]        [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   28.190474]
               other info that might help us debug this:

[   28.191901] Chain exists of:
                 &isec->lock --> &xfs_dir_ilock_class --> &mm->mmap_sem

[   28.193327]  Possible unsafe locking scenario:

[   28.194297]        CPU0                    CPU1
[   28.194774]        ----                    ----
[   28.195254]   lock(&mm->mmap_sem);
[   28.195709]                                lock(&xfs_dir_ilock_class);
[   28.196174]                                lock(&mm->mmap_sem);
[   28.196654]   lock(&isec->lock);
[   28.197108]
                *** DEADLOCK ***

[   28.198451] 1 lock held by sshd/1764:
[   28.198900]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81216a0f>] vm_mmap_pgoff+0x8f/0xf0
[   28.199370]
               stack backtrace:
[   28.200276] CPU: 2 PID: 1764 Comm: sshd Tainted: G        W 4.1.0-0.rc7.git0.1.fc23.x86_64+debug #1
[   28.200753] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[   28.201246]  0000000000000000 00000000eda89a94 ffff8800a86a39c8 ffffffff81896375
[   28.201771]  0000000000000000 ffffffff82a910d0 ffff8800a86a3a18 ffffffff8110fbd6
[   28.202275]  0000000000000002 ffff8800a86a3a78 0000000000000001 ffff8800a897b008
[   28.203099] Call Trace:
[   28.204237]  [<ffffffff81896375>] dump_stack+0x4c/0x65
[   28.205362]  [<ffffffff8110fbd6>] print_circular_bug+0x206/0x280
[   28.206502]  [<ffffffff81113331>] __lock_acquire+0x1b31/0x1e40
[   28.207650]  [<ffffffff81114017>] lock_acquire+0xc7/0x2a0
[   28.208758]  [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0
[   28.209902]  [<ffffffff8189c10d>] mutex_lock_nested+0x7d/0x460
[   28.211023]  [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0
[   28.212162]  [<ffffffff813b52c5>] ? inode_doinit_with_dentry+0xc5/0x6a0
[   28.213283]  [<ffffffff81027e7d>] ? native_sched_clock+0x2d/0xa0
[   28.214403]  [<ffffffff81027ef9>] ? sched_clock+0x9/0x10
[   28.215514]  [<ffffffff813b52c5>] inode_doinit_with_dentry+0xc5/0x6a0
[   28.216656]  [<ffffffff813b58bc>] selinux_d_instantiate+0x1c/0x20
[   28.217776]  [<ffffffff813b07ab>] security_d_instantiate+0x1b/0x30
[   28.218902]  [<ffffffff8129e8c4>] d_instantiate+0x54/0x80
[   28.219992]  [<ffffffff81211edc>] __shmem_file_setup+0xdc/0x250
[   28.221112]  [<ffffffff812164a8>] shmem_zero_setup+0x28/0x70
[   28.222234]  [<ffffffff8123471c>] mmap_region+0x66c/0x680
[   28.223362]  [<ffffffff81234a53>] do_mmap_pgoff+0x323/0x410
[   28.224493]  [<ffffffff81216a0f>] ? vm_mmap_pgoff+0x8f/0xf0
[   28.225643]  [<ffffffff81216a30>] vm_mmap_pgoff+0xb0/0xf0
[   28.226771]  [<ffffffff81232bf6>] SyS_mmap_pgoff+0x116/0x2b0
[   28.227900]  [<ffffffff812996ce>] ? SyS_fcntl+0x5de/0x760
[   28.229042]  [<ffffffff810232bb>] SyS_mmap+0x1b/0x30
[   28.230156]  [<ffffffff818a016e>] system_call_fastpath+0x12/0x76
[   46.520367] Adjusting tsc more than 11% (5419175 vs 7179037)
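
For anyone who wants to reason about the report without reading the whole
trace: the sketch below is a small userspace model of the dependency chain
lockdep printed above. It is not kernel code and not how lockdep itself is
implemented; the enum, struct and function names are made up for
illustration. It records the two existing dependencies (#1 and #2) and then
asks whether taking isec->lock while holding mmap_sem (#0) would close a
cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Lock classes named in the lockdep report (identifiers are illustrative). */
enum lock_class { ISEC_LOCK, XFS_DIR_ILOCK, MMAP_SEM, NR_CLASSES };

/* held_before[a][b] == true means "a was held while b was acquired". */
struct dep_graph {
    bool held_before[NR_CLASSES][NR_CLASSES];
};

static void add_dep(struct dep_graph *g, enum lock_class held,
                    enum lock_class taken)
{
    assert(held < NR_CLASSES && taken < NR_CLASSES);
    g->held_before[held][taken] = true;
}

/* Depth-first search: is 'target' reachable from 'from' along recorded edges? */
static bool reaches(const struct dep_graph *g, enum lock_class from,
                    enum lock_class target, bool *visited)
{
    if (from == target)
        return true;
    visited[from] = true;
    for (int next = 0; next < NR_CLASSES; next++) {
        if (g->held_before[from][next] && !visited[next] &&
            reaches(g, (enum lock_class)next, target, visited))
            return true;
    }
    return false;
}

/*
 * Acquiring 'taken' while holding 'held' closes a cycle if the graph
 * already contains a path from 'taken' back to 'held'.
 */
static bool would_deadlock(const struct dep_graph *g, enum lock_class held,
                           enum lock_class taken)
{
    bool visited[NR_CLASSES] = { false };
    return reaches(g, taken, held, visited);
}

/* Record the two pre-existing dependencies from the trace. */
static void record_existing_chain(struct dep_graph *g)
{
    /* #1: inode_doinit_with_dentry() holds isec->lock, xfs_attr_get() takes the ilock */
    add_dep(g, ISEC_LOCK, XFS_DIR_ILOCK);
    /* #2: xfs_readdir() holds the ilock, filldir() may fault and take mmap_sem */
    add_dep(g, XFS_DIR_ILOCK, MMAP_SEM);
}
```

With a zero-initialised struct dep_graph and record_existing_chain() applied,
would_deadlock(&g, MMAP_SEM, ISEC_LOCK) returns true, which corresponds to
the mmap() path in #0. The reverse question,
would_deadlock(&g, ISEC_LOCK, MMAP_SEM), returns false.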


Best regards,

Morten
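
P.S. In case it helps with checking the mprotect PROT_EXEC regression
described above, here is a minimal userspace probe. It is only a sketch:
the helper name is made up, and whether a denial shows up (and with which
errno) depends on the kernel and the loaded policy. I would expect an
SELinux denial to appear as a non-zero return, typically EACCES, but I have
not verified that against any particular policy.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: create a shared anonymous mapping (the case that
 * goes through shmem_zero_setup()) and then try to make it executable.
 *
 * Returns -1 if the mapping could not be created, 0 if
 * mprotect(PROT_EXEC) was permitted, otherwise the errno from mprotect().
 */
static int probe_shared_anon_exec(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int result = 0;
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0)
        result = errno; /* saved before munmap() can change it */

    munmap(p, len);
    return result;
}
```

Calling probe_shared_anon_exec(4096) before and after the patch, under the
same policy, should show whether the check has gone away.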

>
>>
>>>
>>> Reported-and-tested-by: Prarit Bhargava <prarit@xxxxxxxxxx>
>>> Reported-by: Daniel Wagner <wagi@xxxxxxxxx>
>>> Reported-by: Morten Stevens <mstevens@xxxxxxxxxxxxxxxxx>
>>> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
>>> ---
>>>
>>>  mm/shmem.c |    8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> --- 4.1-rc7/mm/shmem.c  2015-04-26 19:16:31.352191298 -0700
>>> +++ linux/mm/shmem.c    2015-06-14 09:26:49.461120166 -0700
>>> @@ -3401,7 +3401,13 @@ int shmem_zero_setup(struct vm_area_stru
>>>         struct file *file;
>>>         loff_t size = vma->vm_end - vma->vm_start;
>>>
>>> -       file = shmem_file_setup("dev/zero", size, vma->vm_flags);
>>> +       /*
>>> +        * Cloning a new file under mmap_sem leads to a lock ordering conflict
>>> +        * between XFS directory reading and selinux: since this file is only
>>> +        * accessible to the user through its mapping, use S_PRIVATE flag to
>>> +        * bypass file security, in the same way as shmem_kernel_file_setup().
>>> +        */
>>> +       file = __shmem_file_setup("dev/zero", size, vma->vm_flags, S_PRIVATE);
>>>         if (IS_ERR(file))
>>>                 return PTR_ERR(file);
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> Please read the FAQ at  http://www.tux.org/lkml/
>> _______________________________________________
>> Selinux mailing list
>> Selinux@xxxxxxxxxxxxx
>> To unsubscribe, send email to Selinux-leave@xxxxxxxxxxxxx.
>> To get help, send an email containing "help" to Selinux-request@xxxxxxxxxxxxx.
>>
>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .