On 2022/5/7 9:29, Jason Gunthorpe wrote:
On Sat, May 07, 2022 at 08:29:31AM +0800, Yanjun Zhu wrote:
If I try to run the SRP test 002 with the soft-RoCE driver, the
following appears:
[ 749.901966] ================================
[ 749.903638] WARNING: inconsistent lock state
[ 749.905376] 5.18.0-rc5-dbg+ #1 Not tainted
[ 749.907039] --------------------------------
[ 749.908699] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 749.910646] ksoftirqd/5/40 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 749.912499] ffff88818244d350 (&xa->xa_lock#14){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
[ 749.914691] {SOFTIRQ-ON-W} state was registered at:
[ 749.916648] __lock_acquire+0x45b/0xce0
[ 749.918599] lock_acquire+0x18a/0x450
[ 749.920480] _raw_spin_lock+0x34/0x50
[ 749.922580] __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
[ 749.924583] rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
[ 749.926394] __ib_alloc_pd+0xa3/0x270 [ib_core]
[ 749.928579] ib_mad_port_open+0x44a/0x790 [ib_core]
[ 749.930640] ib_mad_init_device+0x8e/0x110 [ib_core]
[ 749.932495] add_client_context+0x26a/0x330 [ib_core]
[ 749.934302] enable_device_and_get+0x169/0x2b0 [ib_core]
[ 749.936217] ib_register_device+0x26f/0x330 [ib_core]
[ 749.938020] rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
[ 749.939794] rxe_add+0x8c/0xc0 [rdma_rxe]
[ 749.941552] rxe_net_add+0x5b/0x90 [rdma_rxe]
[ 749.943356] rxe_newlink+0x71/0x80 [rdma_rxe]
[ 749.945182] nldev_newlink+0x21e/0x370 [ib_core]
[ 749.946917] rdma_nl_rcv_msg+0x200/0x410 [ib_core]
[ 749.948657] rdma_nl_rcv+0x140/0x220 [ib_core]
[ 749.950373] netlink_unicast+0x307/0x460
[ 749.952063] netlink_sendmsg+0x422/0x750
[ 749.953672] __sys_sendto+0x1c2/0x250
[ 749.955281] __x64_sys_sendto+0x7f/0x90
[ 749.956849] do_syscall_64+0x35/0x80
[ 749.958353] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 749.959942] irq event stamp: 1411849
[ 749.961517] hardirqs last enabled at (1411848): [<ffffffff810cdb28>] __local_bh_enable_ip+0x88/0xf0
[ 749.963338] hardirqs last disabled at (1411849): [<ffffffff81ebf24d>] _raw_spin_lock_irqsave+0x5d/0x60
[ 749.965214] softirqs last enabled at (1411838): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
[ 749.967027] softirqs last disabled at (1411843): [<ffffffff810cd947>] run_ksoftirqd+0x37/0x60
For this, please use this patch series:
news://nntp.lore.kernel.org:119/20220422194416.983549-1-yanjun.zhu@xxxxxxxxx
No, that is the wrong fix for this. The problem is mismatched lock
modes with the lookup path in the BH. The fix is to use BH locking
with the xarray consistently everywhere, or to use RCU. I'm expecting
to go with Bob's RCU patch.
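
To make the "mismatched lock modes" point concrete, here is a small
userspace toy model of the check behind the splat above. It is only a
sketch, not kernel or lockdep code: the names lock_class, lock_acquire_model,
scenario_mixed_modes and scenario_bh_everywhere are invented for
illustration. It assumes a lock class simply remembers whether it was ever
taken with softirqs enabled and whether it was ever taken from softirq
context, and reports a problem once both are true.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model only (hypothetical names), loosely mirroring lockdep usage bits. */
enum ctx {
    CTX_PROCESS_BH_ON,   /* process context, softirqs enabled (plain lock) */
    CTX_PROCESS_BH_OFF,  /* process context, softirqs disabled (_bh lock)  */
    CTX_SOFTIRQ          /* lock taken from softirq (BH) context           */
};

struct lock_class {
    const char *name;
    bool softirq_on_w;   /* ever taken while softirqs were enabled */
    bool in_softirq_w;   /* ever taken from softirq context        */
};

/* Record one acquisition; return false if the usage became inconsistent. */
bool lock_acquire_model(struct lock_class *lc, enum ctx c)
{
    if (c == CTX_PROCESS_BH_ON)
        lc->softirq_on_w = true;
    else if (c == CTX_SOFTIRQ)
        lc->in_softirq_w = true;
    /* CTX_PROCESS_BH_OFF sets neither bit: a softirq cannot interrupt it. */

    if (lc->softirq_on_w && lc->in_softirq_w) {
        printf("model: inconsistent usage of %s\n", lc->name);
        return false;
    }
    return true;
}

/* Add path uses a plain lock, lookup path runs in softirq: mismatch. */
bool scenario_mixed_modes(void)
{
    struct lock_class xa = { "xa_lock", false, false };

    lock_acquire_model(&xa, CTX_PROCESS_BH_ON);   /* like the add path    */
    return lock_acquire_model(&xa, CTX_SOFTIRQ);  /* like the lookup path */
}

/* Add path disables softirqs first, so both paths agree. */
bool scenario_bh_everywhere(void)
{
    struct lock_class xa = { "xa_lock", false, false };

    lock_acquire_model(&xa, CTX_PROCESS_BH_OFF);
    return lock_acquire_model(&xa, CTX_SOFTIRQ);
}
```

In this model scenario_mixed_modes() reports the inconsistency and
scenario_bh_everywhere() does not. In the kernel, the second case
corresponds to taking the xarray lock with the _bh variants (for example
xa_lock_bh()/xa_unlock_bh()) on every process-context path.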
Bob's RCU patch causes some atomic problems. I am not sure whether these
problems can be fixed properly.
Zhu Yanjun
We still need a proper patch for the AH problem.
Jason