Hi Zhu, Bob, Steve,

There seems to be a locking bug in the softRoCE driver when mounting a cifs
share.  See attached trace.  I'm guessing the problem is that a softirq
handler is accessing the xarray, but other accesses to the xarray aren't
guarded by _bh or _irq markers on the lock primitives.

I wonder if rxe_pool_get_index() should just rely on the RCU read lock and
not take the spinlock.  Alternatively, __rxe_add_to_pool() should be using
xa_alloc_cyclic_bh() or xa_alloc_cyclic_irq().

I used the following commands:

	rdma link add rxe0 type rxe netdev enp6s0 # andromeda, softRoCE
	mount //192.168.6.1/scratch /xfstest.scratch -o user=shares,rdma,pass=...

talking to ksmbd on the other side.  Kernel is v5.18-rc6.

David
---
infiniband rxe0: set active
infiniband rxe0: added enp6s0
RDS/IB: rxe0: added
CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
CIFS: Attempting to mount \\192.168.6.1\scratch
================================
WARNING: inconsistent lock state
5.18.0-rc6-build2+ #465 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/1/20 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888134d11310 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x19/0x69
{SOFTIRQ-ON-W} state was registered at:
  mark_usage+0x169/0x17b
  __lock_acquire+0x50c/0x96a
  lock_acquire+0x2f4/0x37b
  _raw_spin_lock+0x2f/0x39
  xa_alloc_cyclic.constprop.0+0x20/0x55
  __rxe_add_to_pool+0xe3/0xf2
  __ib_alloc_pd+0xa2/0x26b
  ib_mad_port_open+0x1ac/0x4a1
  ib_mad_init_device+0x9b/0x1b9
  add_client_context+0x133/0x1b3
  enable_device_and_get+0x129/0x248
  ib_register_device+0x256/0x2fd
  rxe_register_device+0x18e/0x1b7
  rxe_net_add+0x57/0x71
  rxe_newlink+0x71/0x8e
  nldev_newlink+0x200/0x26a
  rdma_nl_rcv_msg+0x260/0x2ab
  rdma_nl_rcv+0x108/0x1a7
  netlink_unicast+0x1fc/0x2b3
  netlink_sendmsg+0x4ce/0x51b
  sock_sendmsg_nosec+0x41/0x4f
  __sys_sendto+0x157/0x1cc
  __x64_sys_sendto+0x76/0x82
  do_syscall_64+0x39/0x46
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 194111
hardirqs last  enabled at (194110): [<ffffffff81094eb2>] __local_bh_enable_ip+0xb8/0xcc
hardirqs last disabled at (194111): [<ffffffff82040077>] _raw_spin_lock_irqsave+0x1b/0x51
softirqs last  enabled at (194100): [<ffffffff8240043a>] __do_softirq+0x43a/0x489
softirqs last disabled at (194105): [<ffffffff81094d30>] run_ksoftirqd+0x31/0x56

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xa->xa_lock#12);
  <Interrupt>
    lock(&xa->xa_lock#12);

 *** DEADLOCK ***

no locks held by ksoftirqd/1/20.

stack backtrace:
CPU: 1 PID: 20 Comm: ksoftirqd/1 Not tainted 5.18.0-rc6-build2+ #465
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x45/0x59
 valid_state+0x56/0x61
 mark_lock_irq+0x9b/0x2ec
 ? ret_from_fork+0x1f/0x30
 ? valid_state+0x61/0x61
 ? stack_trace_save+0x8f/0xbe
 ? filter_irq_stacks+0x58/0x58
 ? jhash.constprop.0+0x1ad/0x202
 ? save_trace+0x17c/0x196
 mark_lock.part.0+0x10c/0x164
 mark_usage+0xe6/0x17b
 __lock_acquire+0x50c/0x96a
 lock_acquire+0x2f4/0x37b
 ? rxe_pool_get_index+0x19/0x69
 ? rcu_read_unlock+0x52/0x52
 ? jhash.constprop.0+0x1ad/0x202
 ? lockdep_unlock+0xde/0xe6
 ? validate_chain+0x44a/0x4a8
 ? req_next_wqe+0x312/0x363
 _raw_spin_lock_irqsave+0x41/0x51
 ? rxe_pool_get_index+0x19/0x69
 rxe_pool_get_index+0x19/0x69
 rxe_get_av+0xbe/0x14b
 rxe_requester+0x6b5/0xbb0
 ? rnr_nak_timer+0x16/0x16
 ? lock_downgrade+0xad/0xad
 ? rcu_read_lock_bh_held+0xab/0xab
 ? __wake_up+0xf/0xf
 ? mark_held_locks+0x1f/0x78
 ? __local_bh_enable_ip+0xb8/0xcc
 ? rnr_nak_timer+0x16/0x16
 rxe_do_task+0xb5/0x13d
 ? rxe_detach_mcast+0x1d6/0x1d6
 tasklet_action_common.constprop.0+0xda/0x145
 __do_softirq+0x202/0x489
 ? __irq_exit_rcu+0x108/0x108
 ? _local_bh_enable+0x1c/0x1c
 run_ksoftirqd+0x31/0x56
 smpboot_thread_fn+0x35c/0x376
 ? sort_range+0x1c/0x1c
 kthread+0x164/0x173
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
CIFS: VFS: RDMA transport established
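
FWIW, the RCU-only lookup I'm suggesting would look something like the
sketch below.  Untested, and the field names are from memory rather than
checked against rxe_pool.c, so treat it as illustration only:

```c
/* Untested sketch of an RCU-only rxe_pool_get_index().  Assumes pool
 * elements are freed via kfree_rcu() (or otherwise after an RCU grace
 * period) so that xa_load() under rcu_read_lock() is safe, and that
 * the element carries obj/ref_cnt fields roughly as rxe_pool_elem does.
 */
void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
{
	struct rxe_pool_elem *elem;
	void *obj = NULL;

	rcu_read_lock();
	elem = xa_load(&pool->xa, index);
	/* Don't hand out an object whose refcount has already hit zero;
	 * it may be waiting out its grace period before being freed.
	 */
	if (elem && kref_get_unless_zero(&elem->ref_cnt))
		obj = elem->obj;
	rcu_read_unlock();

	return obj;
}
```

The other option keeps the spinlock but makes it softirq-safe: have
__rxe_add_to_pool() (and any other process-context writers) use the
xa_alloc_cyclic_bh()/xa_erase_bh() variants so xa_lock is always taken
with softirqs disabled.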