Re: Apparent regression in blktests since 5.18-rc1+

On 2022/5/7 9:55, Yanjun Zhu wrote:

On 2022/5/7 9:29, Jason Gunthorpe wrote:
On Sat, May 07, 2022 at 08:29:31AM +0800, Yanjun Zhu wrote:

If I try to run the SRP test 002 with the soft-RoCE driver, the
following appears:

[  749.901966] ================================
[  749.903638] WARNING: inconsistent lock state
[  749.905376] 5.18.0-rc5-dbg+ #1 Not tainted
[  749.907039] --------------------------------
[  749.908699] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  749.910646] ksoftirqd/5/40 [HC0[0]:SC1[1]:HE0:SE0] takes:
[  749.912499] ffff88818244d350 (&xa->xa_lock#14){+.?.}-{2:2}, at:
rxe_pool_get_index+0x73/0x170 [rdma_rxe]
[  749.914691] {SOFTIRQ-ON-W} state was registered at:
[  749.916648]   __lock_acquire+0x45b/0xce0
[  749.918599]   lock_acquire+0x18a/0x450
[  749.920480]   _raw_spin_lock+0x34/0x50
[  749.922580]   __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
[  749.924583]   rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
[  749.926394]   __ib_alloc_pd+0xa3/0x270 [ib_core]
[  749.928579]   ib_mad_port_open+0x44a/0x790 [ib_core]
[  749.930640]   ib_mad_init_device+0x8e/0x110 [ib_core]
[  749.932495]   add_client_context+0x26a/0x330 [ib_core]
[  749.934302]   enable_device_and_get+0x169/0x2b0 [ib_core]
[  749.936217]   ib_register_device+0x26f/0x330 [ib_core]
[  749.938020]   rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
[  749.939794]   rxe_add+0x8c/0xc0 [rdma_rxe]
[  749.941552]   rxe_net_add+0x5b/0x90 [rdma_rxe]
[  749.943356]   rxe_newlink+0x71/0x80 [rdma_rxe]
[  749.945182]   nldev_newlink+0x21e/0x370 [ib_core]
[  749.946917]   rdma_nl_rcv_msg+0x200/0x410 [ib_core]
[  749.948657]   rdma_nl_rcv+0x140/0x220 [ib_core]
[  749.950373]   netlink_unicast+0x307/0x460
[  749.952063]   netlink_sendmsg+0x422/0x750
[  749.953672]   __sys_sendto+0x1c2/0x250
[  749.955281]   __x64_sys_sendto+0x7f/0x90
[  749.956849]   do_syscall_64+0x35/0x80
[  749.958353]   entry_SYSCALL_64_after_hwframe+0x44/0xae
[  749.959942] irq event stamp: 1411849
[  749.961517] hardirqs last  enabled at (1411848): [<ffffffff810cdb28>]
__local_bh_enable_ip+0x88/0xf0
[  749.963338] hardirqs last disabled at (1411849): [<ffffffff81ebf24d>]
_raw_spin_lock_irqsave+0x5d/0x60
[  749.965214] softirqs last  enabled at (1411838): [<ffffffff82200467>]
__do_softirq+0x467/0x6e1
[  749.967027] softirqs last disabled at (1411843): [<ffffffff810cd947>]
run_ksoftirqd+0x37/0x60
For this, please use this patch series:
news://nntp.lore.kernel.org:119/20220422194416.983549-1-yanjun.zhu@xxxxxxxxx
No, that is the wrong fix for this. This is mismatched lock modes with
the lookup path in the BH, the fix is to consistently use BH locking
with the xarray everywhere or to use RCU. I'm expecting to go with
Bob's RCU patch.
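
The rule behind the lockdep report above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: the type
lock_class_model and the function model_acquire are hypothetical names
invented for this illustration, and real lockdep tracks far more state.
It assumes a single lock class and only the two usage bits named in the
splat ({SOFTIRQ-ON-W} and {IN-SOFTIRQ-W}).

```c
#include <stdbool.h>

/* Toy model, NOT kernel code. All names here are hypothetical. */
enum ctx { CTX_PROCESS, CTX_SOFTIRQ };

struct lock_class_model {
	bool used_in_softirq;      /* IN-SOFTIRQ-W: taken while serving a softirq */
	bool used_softirq_enabled; /* SOFTIRQ-ON-W: taken with softirqs enabled */
};

/*
 * Record one acquisition of the lock class.
 * bh_disabled stands for taking the lock with a _bh or _irq variant in
 * process context; it is ignored for acquisitions made in softirq context.
 * Returns false once both usage bits are set, i.e. a softirq could
 * interrupt a process-context holder on the same CPU and spin forever.
 */
static bool model_acquire(struct lock_class_model *c, enum ctx ctx,
			  bool bh_disabled)
{
	if (ctx == CTX_SOFTIRQ)
		c->used_in_softirq = true;
	else if (!bh_disabled)
		c->used_softirq_enabled = true;

	return !(c->used_in_softirq && c->used_softirq_enabled);
}
```

In this model, a plain lock in process context followed by an acquisition
in softirq context is flagged, which mirrors the __rxe_add_to_pool and
rxe_pool_get_index pair in the trace. If every process-context acquisition
passes bh_disabled = true, the softirq acquisition is accepted, which is
the "consistently use BH locking" option.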

Bob's RCU patch causes some atomic-context problems, and I am not sure they can be fixed properly.

I looked into Bob's RCU patch series. In this patch, https://patchwork.kernel.org/project/linux-rdma/patch/20220421014042.26985-9-rpearsonhpe@xxxxxxxxx/,
__rxe_cleanup is sometimes called between spin_lock_irq and spin_unlock_irq.

With Bob's RCU patch, this will cause an oops.

Best Regards,

Zhu Yanjun


Zhu Yanjun


We still need a proper patch for the AH problem.

Jason


