On 2022/5/9 19:52, Jason Gunthorpe wrote:
> On Mon, May 09, 2022 at 04:01:19PM +0800, Zhu Yanjun wrote:
>> I delved into the above calltrace. It is the same as the problem in
>> the link https://www.spinics.net/lists/linux-rdma/msg109875.html
>
> Yes
>
>> So IMHO, the fix in this link
>> https://patchwork.kernel.org/project/linux-rdma/patch/20220422194416.983549-1-yanjun.zhu@xxxxxxxxx/
>> should fix this problem.
>
> I'm not going to apply a hacky patch like that, it needs proper fixing.

Can you explain "a hacky patch like that"? Thanks.

>> And if we want to use BH, it is very possible that the problem in the
>> link https://patchwork.kernel.org/project/linux-rdma/patch/20220210073655.42281-4-guoqing.jiang@xxxxxxxxx/
>> will occur.
>>
>> As for the RCU patch series in the link
>> https://patchwork.kernel.org/project/linux-rdma/patch/20220421014042.26985-2-rpearsonhpe@xxxxxxxxx/
>> I also delved into it, and I found that an atomic problem occurs if the
>> RCU patches are applied on top of v5.18-rc5. Because of that atomic
>> problem, I cannot currently verify that the RCU patches fix this issue.
>
> What is the oops?

The oops is like the following:
[ 36.700281] Call Trace:
[ 36.700285] <TASK>
[ 36.700291] dump_stack_lvl+0x70/0xa0
[ 36.700323] dump_stack+0x10/0x12
[ 36.700329] __might_resched.cold+0x102/0x13a
[ 36.700350] __might_sleep+0x43/0x70
[ 36.700368] wait_for_completion_timeout+0x40/0x160
[ 36.700373] ? _raw_spin_unlock_irqrestore+0x4f/0x80
[ 36.700381] ? complete+0x4c/0x60
[ 36.700403] __rxe_cleanup+0xaf/0xc0 [rdma_rxe]
[ 36.700431] rxe_destroy_ah+0x12/0x20 [rdma_rxe]
[ 36.700440] rdma_destroy_ah_user+0x3a/0x80 [ib_core]
[ 36.700464] cm_free_priv_msg+0x44/0xf0 [ib_cm]
[ 36.700477] cm_send_handler+0x10b/0x2f0 [ib_cm]
[ 36.700510] timeout_sends+0x1aa/0x230 [ib_core]
[ 36.700544] process_one_work+0x2a9/0x5e0
[ 36.700567] worker_thread+0x4d/0x3c0
[ 36.700582] ? process_one_work+0x5e0/0x5e0
[ 36.700588] kthread+0x10a/0x130
[ 36.700594] ? kthread_complete_and_exit+0x20/0x20
[ 36.700604] ret_from_fork+0x22/0x30
[ 36.700650] </TASK>
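
For reference, the splat above looks like the classic sleep-in-atomic pattern: __rxe_cleanup() ends up in wait_for_completion_timeout(), which may sleep, while the ib_cm send-timeout path appears to be running in a context where sleeping is not allowed (the ? _raw_spin_unlock_irqrestore frame hints that a spinlock may be held). Below is a minimal generic sketch of that pattern, with a hypothetical lock and completion rather than the actual rxe/ib_cm objects:

#include <linux/spinlock.h>
#include <linux/completion.h>
#include <linux/jiffies.h>

static DEFINE_SPINLOCK(obj_lock);      /* hypothetical lock */
static DECLARE_COMPLETION(obj_done);   /* hypothetical completion */

static void cleanup_object(void)
{
	unsigned long flags;

	spin_lock_irqsave(&obj_lock, flags);
	/*
	 * wait_for_completion_timeout() may sleep; sleeping with a
	 * spinlock held (preemption disabled, IRQs off) is what
	 * CONFIG_DEBUG_ATOMIC_SLEEP reports via __might_resched(),
	 * as in the trace above.
	 */
	wait_for_completion_timeout(&obj_done, msecs_to_jiffies(100));
	spin_unlock_irqrestore(&obj_lock, flags);
}
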
Zhu Yanjun
> Jason