On Sat, 27 Apr 2024 at 16:13, Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> > -> #0 (&sighand->siglock){....}-{2:2}:
> >        check_prev_add kernel/locking/lockdep.c:3134 [inline]
> >        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
> >        validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
> >        __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
> >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> >        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >        _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
> >        force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334
> >        force_sig_fault_to_task kernel/signal.c:1733 [inline]
> >        force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738
> >        __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
> >        handle_page_fault arch/x86/mm/fault.c:1505 [inline]
>
> Given page fault with runqueue locked, bpf makes trouble instead of
> helping anything in this case.

That's not the odd thing here. Look, the callchain is:

> >        exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563
> >        asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> >        rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:48
> >        copy_user_generic arch/x86/include/asm/uaccess_64.h:110 [inline]
> >        raw_copy_from_user arch/x86/include/asm/uaccess_64.h:125 [inline]
> >        __copy_from_user_inatomic include/linux/uaccess.h:87 [inline]
> >        copy_from_user_nofault+0xbc/0x150 mm/maccess.c:125

IOW, this is all doing a copy from user with page faults disabled, and
it shouldn't have caused a signal to be sent, so the whole
__bad_area_nosemaphore -> force_sig_fault path is bad.
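(For context: the nofault copy here is conceptually just the usual
"disable page faults around the user access" pattern. A simplified
sketch of the idea - not the exact mm/maccess.c implementation, and
the helper name below is made up for illustration - looks roughly like:

    #include <linux/uaccess.h>  /* pagefault_disable(), __copy_from_user_inatomic() */

    /*
     * Illustrative sketch of a nofault user copy: page faults are
     * disabled around the access, so a fault taken on the user access
     * is supposed to be handled via its exception fixup and turn into
     * -EFAULT, never into a signal to the task.
     */
    static long nofault_copy_sketch(void *dst, const void __user *src,
                                    size_t size)
    {
            unsigned long left;

            if (!access_ok(src, size))
                    return -EFAULT;

            pagefault_disable();
            left = __copy_from_user_inatomic(dst, src, size);
            pagefault_enable();

            return left ? -EFAULT : 0;
    }

which is why the force_sig_fault() path should never be reachable from
a copy_from_user_nofault() call.)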
The *problem* here is that the page fault doesn't actually happen on a
user access, it happens on the *ret* instruction in
rep_movs_alternative itself (which doesn't have an exception fixup,
obviously, because no exception is supposed to happen there!):

  RIP: 0010:rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:50
  Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 83 f9 40 73 40 83 f9 08 73 21 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 <c3> cc cc cc cc 66 0f 1f 84 00 00 0$
  RSP: 0000:ffffc90004137468 EFLAGS: 00050002
  RAX: ffffffff8205ce4e RBX: dffffc0000000000 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: 0000000000000900 RDI: ffffc900041374e8
  RBP: ffff88802d039784 R08: 0000000000000005 R09: ffffffff8205ce37
  R10: 0000000000000003 R11: ffff88802d038000 R12: 1ffff11005a072f0
  R13: 0000000000000900 R14: 0000000000000002 R15: ffffc900041374e8

where decoding that "Code:" line gives this:

   0:   f3 0f 1e fa             endbr64
   4:   48 83 f9 40             cmp    $0x40,%rcx
   8:   73 40                   jae    0x4a
   a:   83 f9 08                cmp    $0x8,%ecx
   d:   73 21                   jae    0x30
   f:   85 c9                   test   %ecx,%ecx
  11:   74 0f                   je     0x22
  13:   8a 06                   mov    (%rsi),%al
  15:   88 07                   mov    %al,(%rdi)
  17:   48 ff c7                inc    %rdi
  1a:   48 ff c6                inc    %rsi
  1d:   48 ff c9                dec    %rcx
  20:   75 f1                   jne    0x13
  22:*  c3                      ret             <-- trapping instruction

but I have no idea why the 'ret' instruction would take a page fault.
It really shouldn't.

Now, it's not like 'ret' instructions can't take page faults, but it
sure shouldn't happen in the *kernel*. The reasons for page faults on
'ret' instructions are:

 - the instruction itself takes a page fault

 - the stack pointer is bogus

 - possibly because the stack *contents* are bogus (at least some x86
   instructions that jump will check the destination in the jump
   instruction itself, although I didn't think 'ret' was one of them)

but for the kernel, none of these actually seem to be the case
normally. And even abnormally I don't see this being an issue, since
the exception backtrace is happily shown (ie the stack looks all
good).

So this dump is just *WEIRD*.

End result: the problem is not about any kind of deadlock on circular
locking. That's just the symptom of that odd page fault that shouldn't
have happened, and that I don't quite see how it happened.

              Linus