On Fri, 2024-10-11 at 15:30 +0200, Roberto Sassu wrote: > On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote: > > On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote: > > > On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote: > > > > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > > > > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu > > > > > <roberto.sassu@xxxxxxxxxxxxxxx> wrote: > > > > > > > > > > > > From: Roberto Sassu <roberto.sassu@xxxxxxxxxx> > > > > > > > > > > > > Move out the mutex in the ima_iint_cache structure to a new structure > > > > > > called ima_iint_cache_lock, so that a lock can be taken regardless of > > > > > > whether or not inode integrity metadata are stored in the inode. > > > > > > > > > > > > Introduce ima_inode_security() to simplify accessing the new structure in > > > > > > the inode security blob. > > > > > > > > > > > > Move the mutex initialization and annotation in the new function > > > > > > ima_inode_alloc_security() and introduce ima_iint_lock() and > > > > > > ima_iint_unlock() to respectively lock and unlock the mutex. > > > > > > > > > > > > Finally, expand the critical region in process_measurement() guarded by > > > > > > iint->mutex up to where the inode was locked, use only one iint lock in > > > > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and > > > > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer(). > > > > > > > > > > > > Signed-off-by: Roberto Sassu <roberto.sassu@xxxxxxxxxx> > > > > > > --- > > > > > > security/integrity/ima/ima.h | 26 ++++++++--- > > > > > > security/integrity/ima/ima_api.c | 4 +- > > > > > > security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++----- > > > > > > security/integrity/ima/ima_main.c | 39 +++++++--------- > > > > > > 4 files changed, 104 insertions(+), 42 deletions(-) > > > > > > > > > > I'm not an IMA expert, but it looks reasonable to me, although > > > > > shouldn't this carry a stable CC in the patch metadata? > > > > > > > > > > Reviewed-by: Paul Moore <paul@xxxxxxxxxxxxxx> > > > > > > > > Sorry, one more thing ... did you verify this patchset resolves the > > > > syzbot problem? I saw at least one reproducer. > > > > > > Uhm, could not reproduce the deadlock with the reproducer. However, > > > without the patch I have a lockdep warning, and with I don't. > > > > > > I asked syzbot to try the patches. Let's see. > > > > I actually got a different lockdep warning: > > > > [ 904.603365] ====================================================== > > [ 904.604264] WARNING: possible circular locking dependency detected > > [ 904.605141] 6.12.0-rc2+ #20 Not tainted > > [ 904.605697] ------------------------------------------------------ > > I can reproduce by executing the syzbot reproducer and in another > terminal by logging in with SSH (not all the times). > > If I understood what the lockdep warning means, this is the scenario. > > A task accesses a seq_file which is in the IMA policy, so we enter the > critical region guarded by the iint lock. But before we get the chance > to measure the file, a second task calls remap_file_pages() on the same > seq_file. > > remap_file_pages() takes the mmap lock and, if the event matches the > IMA policy too, the second task waits for the iint lock to be released. > > Now, the first task starts to measure the seq_file and takes the > seq_file lock. I don't know if 3 processes must be involved, but I was > thinking that reading the seq_file from the first task can trigger a > page fault, which requires to take the mmap lock. > > At this point, we reach a deadlock. The first task waits for the mmap > lock to be released, and the second waits for the iint lock to be > released, which both cannot happen. + mm, mm folks (I leave the lockdep warning down for the new people included in the thread). I think I understood what the problem is. Any lock that is taken inside security_file_mmap() would cause lock inversion since: vm_mmap_pgoff(): ret = security_mmap_file(file, prot, flag); if (!ret) { if (mmap_write_lock_killable(mm)) SYSCALL_DEFINE5(remap_file_pages, ...): if (mmap_write_lock_killable(mm)) return -EINTR; [...] file = get_file(vma->vm_file); ret = security_mmap_file(vma->vm_file, prot, flags); if (ret) goto out_fput; Since I didn't see a good way to change the order of the second, I changed the order of the first, i.e. I put security_file_mmap() under the mmap lock. (I don't know if this is a good idea.) I cannot reproduce the lockdep warning anymore. @mm folks, what is your opinion? Thanks Roberto > Roberto > > > [ 904.606577] systemd/1 is trying to acquire lock: > > [ 904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0 > > [ 904.608290] > > [ 904.608290] but task is already holding lock: > > [ 904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40 > > [ 904.610429] > > [ 904.610429] which lock already depends on the new lock. > > [ 904.610429] > > [ 904.611574] > > [ 904.611574] the existing dependency chain (in reverse order) is: > > [ 904.612628] > > [ 904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}: > > [ 904.613681] __mutex_lock+0xaf/0x760 > > [ 904.614266] mutex_lock_nested+0x27/0x40 > > [ 904.614897] ima_iint_lock+0x24/0x40 > > [ 904.615490] process_measurement+0x176/0xef0 > > [ 904.616168] ima_file_mmap+0x98/0x120 > > [ 904.616767] security_mmap_file+0x408/0x560 > > [ 904.617444] __do_sys_remap_file_pages+0x2fa/0x4c0 > > [ 904.618194] __x64_sys_remap_file_pages+0x29/0x40 > > [ 904.618937] x64_sys_call+0x6e8/0x4550 > > [ 904.619546] do_syscall_64+0x71/0x180 > > [ 904.620155] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.620952] > > [ 904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}: > > [ 904.621813] __might_fault+0x6f/0xb0 > > [ 904.622400] _copy_to_iter+0x12e/0xa80 > > [ 904.623009] seq_read_iter+0x593/0x6b0 > > [ 904.623629] proc_reg_read_iter+0x31/0xe0 > > [ 904.624276] vfs_read+0x256/0x3d0 > > [ 904.624822] ksys_read+0x6d/0x160 > > [ 904.625372] __x64_sys_read+0x1d/0x30 > > [ 904.625964] x64_sys_call+0x1068/0x4550 > > [ 904.626594] do_syscall_64+0x71/0x180 > > [ 904.627188] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.627975] > > [ 904.627975] -> #0 (&p->lock){+.+.}-{4:4}: > > [ 904.628787] __lock_acquire+0x17f3/0x2320 > > [ 904.629432] lock_acquire+0xf2/0x420 > > [ 904.630013] __mutex_lock+0xaf/0x760 > > [ 904.630596] mutex_lock_nested+0x27/0x40 > > [ 904.631225] seq_read_iter+0x62/0x6b0 > > [ 904.631831] kernfs_fop_read_iter+0x1ef/0x2c0 > > [ 904.632599] __kernel_read+0x113/0x350 > > [ 904.633206] integrity_kernel_read+0x23/0x40 > > [ 904.633902] ima_calc_file_hash_tfm+0x14e/0x230 > > [ 904.634621] ima_calc_file_hash+0x97/0x250 > > [ 904.635281] ima_collect_measurement+0x4be/0x530 > > [ 904.636008] process_measurement+0x7c0/0xef0 > > [ 904.636689] ima_file_check+0x65/0x80 > > [ 904.637295] security_file_post_open+0xb1/0x1b0 > > [ 904.638008] path_openat+0x216/0x1280 > > [ 904.638605] do_filp_open+0xab/0x140 > > [ 904.639185] do_sys_openat2+0xba/0x120 > > [ 904.639805] do_sys_open+0x4c/0x80 > > [ 904.640366] __x64_sys_openat+0x23/0x30 > > [ 904.640992] x64_sys_call+0x2575/0x4550 > > [ 904.641616] do_syscall_64+0x71/0x180 > > [ 904.642207] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.643003] > > [ 904.643003] other info that might help us debug this: > > [ 904.643003] > > [ 904.644149] Chain exists of: > > [ 904.644149] &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth] > > [ 904.644149] > > [ 904.645763] Possible unsafe locking scenario: > > [ 904.645763] > > [ 904.646614] CPU0 CPU1 > > [ 904.647264] ---- ---- > > [ 904.647909] lock(&ima_iint_lock_mutex_key[depth]); > > [ 904.648617] lock(&mm->mmap_lock); > > [ 904.649479] lock(&ima_iint_lock_mutex_key[depth]); > > [ 904.650543] lock(&p->lock); > > [ 904.650974] > > [ 904.650974] *** DEADLOCK *** > > [ 904.650974] > > [ 904.651826] 1 lock held by systemd/1: > > [ 904.652376] #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40 > > [ 904.653759] > > [ 904.653759] stack backtrace: > > [ 904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20 > > [ 904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 > > [ 904.656497] Call Trace: > > [ 904.656856] <TASK> > > [ 904.657166] dump_stack_lvl+0x134/0x1a0 > > [ 904.657728] dump_stack+0x14/0x30 > > [ 904.658206] print_circular_bug+0x38d/0x450 > > [ 904.658812] check_noncircular+0xed/0x120 > > [ 904.659396] ? srso_return_thunk+0x5/0x5f > > [ 904.659972] ? srso_return_thunk+0x5/0x5f > > [ 904.660569] __lock_acquire+0x17f3/0x2320 > > [ 904.661145] lock_acquire+0xf2/0x420 > > [ 904.661664] ? seq_read_iter+0x62/0x6b0 > > [ 904.662217] ? srso_return_thunk+0x5/0x5f > > [ 904.662886] __mutex_lock+0xaf/0x760 > > [ 904.663408] ? seq_read_iter+0x62/0x6b0 > > [ 904.663961] ? seq_read_iter+0x62/0x6b0 > > [ 904.664525] ? srso_return_thunk+0x5/0x5f > > [ 904.665098] ? mark_lock+0x4e/0x750 > > [ 904.665610] ? mutex_lock_nested+0x27/0x40 > > [ 904.666194] ? find_held_lock+0x3a/0x100 > > [ 904.666770] mutex_lock_nested+0x27/0x40 > > [ 904.667337] seq_read_iter+0x62/0x6b0 > > [ 904.667869] kernfs_fop_read_iter+0x1ef/0x2c0 > > [ 904.668536] __kernel_read+0x113/0x350 > > [ 904.669079] integrity_kernel_read+0x23/0x40 > > [ 904.669698] ima_calc_file_hash_tfm+0x14e/0x230 > > [ 904.670349] ? __lock_acquire+0xc32/0x2320 > > [ 904.670937] ? srso_return_thunk+0x5/0x5f > > [ 904.671525] ? __lock_acquire+0xfbb/0x2320 > > [ 904.672113] ? srso_return_thunk+0x5/0x5f > > [ 904.672693] ? srso_return_thunk+0x5/0x5f > > [ 904.673280] ? lock_acquire+0xf2/0x420 > > [ 904.673818] ? kernfs_iop_getattr+0x4a/0xb0 > > [ 904.674424] ? srso_return_thunk+0x5/0x5f > > [ 904.674997] ? find_held_lock+0x3a/0x100 > > [ 904.675564] ? srso_return_thunk+0x5/0x5f > > [ 904.676156] ? srso_return_thunk+0x5/0x5f > > [ 904.676740] ? srso_return_thunk+0x5/0x5f > > [ 904.677322] ? local_clock_noinstr+0x9/0xb0 > > [ 904.677923] ? srso_return_thunk+0x5/0x5f > > [ 904.678502] ? srso_return_thunk+0x5/0x5f > > [ 904.679075] ? lock_release+0x4e2/0x570 > > [ 904.679639] ima_calc_file_hash+0x97/0x250 > > [ 904.680227] ima_collect_measurement+0x4be/0x530 > > [ 904.680901] ? srso_return_thunk+0x5/0x5f > > [ 904.681496] ? srso_return_thunk+0x5/0x5f > > [ 904.682070] ? __kernfs_iattrs+0x4a/0x140 > > [ 904.682658] ? srso_return_thunk+0x5/0x5f > > [ 904.683242] ? process_measurement+0x7c0/0xef0 > > [ 904.683876] ? srso_return_thunk+0x5/0x5f > > [ 904.684462] process_measurement+0x7c0/0xef0 > > [ 904.685078] ? srso_return_thunk+0x5/0x5f > > [ 904.685654] ? srso_return_thunk+0x5/0x5f > > [ 904.686228] ? _raw_spin_unlock_irqrestore+0x5d/0xd0 > > [ 904.686938] ? srso_return_thunk+0x5/0x5f > > [ 904.687523] ? srso_return_thunk+0x5/0x5f > > [ 904.688098] ? srso_return_thunk+0x5/0x5f > > [ 904.688672] ? local_clock_noinstr+0x9/0xb0 > > [ 904.689273] ? srso_return_thunk+0x5/0x5f > > [ 904.689846] ? srso_return_thunk+0x5/0x5f > > [ 904.690430] ? srso_return_thunk+0x5/0x5f > > [ 904.691005] ? srso_return_thunk+0x5/0x5f > > [ 904.691583] ? srso_return_thunk+0x5/0x5f > > [ 904.692180] ? local_clock_noinstr+0x9/0xb0 > > [ 904.692841] ? srso_return_thunk+0x5/0x5f > > [ 904.693419] ? srso_return_thunk+0x5/0x5f > > [ 904.693990] ? lock_release+0x4e2/0x570 > > [ 904.694544] ? srso_return_thunk+0x5/0x5f > > [ 904.695115] ? kernfs_put_active+0x5d/0xc0 > > [ 904.695708] ? srso_return_thunk+0x5/0x5f > > [ 904.696286] ? kernfs_fop_open+0x376/0x6b0 > > [ 904.696872] ima_file_check+0x65/0x80 > > [ 904.697409] security_file_post_open+0xb1/0x1b0 > > [ 904.698058] path_openat+0x216/0x1280 > > [ 904.698589] do_filp_open+0xab/0x140 > > [ 904.699106] ? srso_return_thunk+0x5/0x5f > > [ 904.699693] ? lock_release+0x554/0x570 > > [ 904.700264] ? srso_return_thunk+0x5/0x5f > > [ 904.700836] ? do_raw_spin_unlock+0x76/0x140 > > [ 904.701450] ? srso_return_thunk+0x5/0x5f > > [ 904.702021] ? _raw_spin_unlock+0x3f/0xa0 > > [ 904.702606] ? srso_return_thunk+0x5/0x5f > > [ 904.703178] ? alloc_fd+0x1ca/0x3b0 > > [ 904.703685] do_sys_openat2+0xba/0x120 > > [ 904.704223] ? file_free+0x8d/0x110 > > [ 904.704729] do_sys_open+0x4c/0x80 > > [ 904.705221] __x64_sys_openat+0x23/0x30 > > [ 904.705784] x64_sys_call+0x2575/0x4550 > > [ 904.706337] do_syscall_64+0x71/0x180 > > [ 904.706863] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.707587] RIP: 0033:0x7f3462123037 > > [ 904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac > > [ 904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 > > [ 904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037 > > [ 904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c > > [ 904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000 > > [ 904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8 > > [ 904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690 > > [ 904.716877] </TASK> > > > > Roberto >