Hello Leon,

> > Please let me know if I understand this correctly or incorrectly?
>
> The thing is that down_write() is called when we unregistering module
> which sent netlink messages. It shouldn't happen.
>

I acknowledge that this is a low-probability event. However, the race
condition still exists; otherwise, these read and write semaphores would
not be necessary. Why not just remove all of them?

Moreover, I find that even without the deadlock, this reentrant message
hangs the kernel, and the blocked task cannot be killed. The log looks
like the one below (after disabling the locking sanitizer, tested on the
latest Ubuntu):

[2187983.899998] INFO: task poc.elf:1717021 blocked for more than 122 seconds.
[2187983.900049]       Not tainted 6.8.0-49-generic #49~22.04.1-Ubuntu
[2187983.900057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[2187983.900063] task:poc.elf state:D stack:0 pid:1717021 tgid:1717021 ppid:1716834 flags:0x00004006
[2187983.900087] Call Trace:
[2187983.900094]  <TASK>
[2187983.900355]  __schedule+0x27c/0x6a0
[2187983.900430]  schedule+0x33/0x110
[2187983.900442]  schedule_preempt_disabled+0x15/0x30
[2187983.900454]  __mutex_lock.constprop.0+0x3f8/0x7a0
[2187983.900476]  __mutex_lock_slowpath+0x13/0x20
[2187983.900486]  mutex_lock+0x3c/0x50
[2187983.900493]  __netlink_dump_start+0x76/0x2a0
[2187983.900552]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.900673]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.900699]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.900802]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.900898]  netlink_unicast+0x1b0/0x2a0
[2187983.900911]  rdma_nl_unicast+0x49/0x70 [ib_core]
[2187983.901005]  iwpm_send_hello+0xfd/0x150 [iw_cm]
[2187983.901030]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
[2187983.901052]  netlink_dump+0x1c0/0x340
[2187983.901065]  __netlink_dump_start+0x1ef/0x2a0
[2187983.901077]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.901219]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.901245]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.901344]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.901437]  netlink_unicast+0x1b0/0x2a0
[2187983.901449]  rdma_nl_unicast+0x49/0x70 [ib_core]
[2187983.901544]  iwpm_send_hello+0xfd/0x150 [iw_cm]
[2187983.901567]  iwpm_hello_cb+0xb9/0x130 [iw_cm]
[2187983.901589]  netlink_dump+0x1c0/0x340
[2187983.901602]  __netlink_dump_start+0x1ef/0x2a0
[2187983.901613]  rdma_nl_rcv_msg+0x24c/0x310 [ib_core]
[2187983.901707]  ? __pfx_iwpm_hello_cb+0x10/0x10 [iw_cm]
[2187983.901731]  rdma_nl_rcv_skb.constprop.0.isra.0+0xbb/0x120 [ib_core]
[2187983.901830]  rdma_nl_rcv+0xe/0x20 [ib_core]
[2187983.901922]  netlink_unicast+0x1b0/0x2a0
[2187983.901933]  netlink_sendmsg+0x214/0x470
[2187983.901946]  __sys_sendto+0x21b/0x230
[2187983.901992]  __x64_sys_sendto+0x24/0x40
[2187983.902002]  x64_sys_call+0x1fc0/0x24b0
[2187983.902023]  do_syscall_64+0x81/0x170
[2187983.902059]  ? security_file_alloc+0x5f/0xf0
[2187983.902079]  ? alloc_empty_file+0x85/0x130
[2187983.902140]  ? alloc_file+0x9b/0x170
[2187983.902150]  ? alloc_file_pseudo+0x9e/0x100
[2187983.902163]  ? restore_fpregs_from_fpstate+0x3d/0xd0
[2187983.902197]  ? switch_fpu_return+0x55/0xf0
[2187983.902208]  ? syscall_exit_to_user_mode+0x83/0x260
[2187983.902229]  ? do_syscall_64+0x8d/0x170
[2187983.902240]  ? irqentry_exit+0x43/0x50
[2187983.902249]  ? clear_bhb_loop+0x15/0x70
[2187983.902293]  ? clear_bhb_loop+0x15/0x70
[2187983.902302]  ? clear_bhb_loop+0x15/0x70
[2187983.902311]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[2187983.902319] RIP: 0033:0x440624
[2187983.902582] RSP: 002b:00007ffcfa4b29f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[2187983.902592] RAX: ffffffffffffffda RBX: 0000000000400400 RCX: 0000000000440624
[2187983.902598] RDX: 0000000000000018 RSI: 00007ffcfa4b2a30 RDI: 0000000000000003
[2187983.902604] RBP: 00007ffcfa4b3a40 R08: 000000000047df08 R09: 000000000000000c
[2187983.902609] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000403990
[2187983.902614] R13: 0000000000000000 R14: 00000000006a6018 R15: 0000000000000000

That's why I'm quite sure this is a bug that requires fixing.

Thanks
Lin