Re: [bug report] blktests srp/002 hang

On 9/19/23 03:07, Zhu Yanjun wrote:
> On 2023/9/19 12:14, Shinichiro Kawasaki wrote:
>> On Sep 16, 2023 / 13:59, Zhu Yanjun wrote:
>> [...]
>>> On Debian, this problem disappears with the latest multipathd, or when
>>> the commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks")
>>> is reverted.
>>
>> Zhu, thank you for the actions.
>>
>>> On Fedora 38, if the commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support
>>> for rxe tasks") is reverted, does this problem still appear?
>>> I do not have such a test environment. The commit is in the attachment.
>>> Can anyone test it? Please let us know the test result. Thanks.
>>
>> I tried the latest kernel tag v6.6-rc2 on my Fedora 38 test systems. With the
>> v6.6-rc2 kernel, I still see the hang. I repeated the blktests test case srp/002
>> about 30 times, and the hang was recreated. Then I reverted the commit
>> 9b4b7c1f9f54 from v6.6-rc2, and the hang disappeared. I repeated the blktests
>> test case 100 times, and did not see the hang.
>>
>> I confirmed these results under two multipathd conditions: 1) with the latest
>> Fedora device-mapper-multipath package v0.9.4, and 2) with the latest
>> multipath-tools v0.9.6 that I built from source code.
>>
>> So, when the commit gets reverted, the hang disappears as I reported for
>> v6.5-rcX kernels.
> Thanks, Shinichiro Kawasaki. Your help is appreciated.
> 
> This problem is related to the following:
> 
> 1). Linux distributions: Ubuntu, Debian and Fedora;
> 
> 2). multipathd;
> 
> 3). the commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks")
> 
> On Ubuntu, with or without the commit, this problem does not occur.
> 
> On Debian, without this commit, this problem does not occur. With this commit, this problem occurs.
> 
> On Fedora, without this commit, this problem does not occur. With this commit, this problem occurs.
> 
> The commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks") is from Bob Pearson.
> 
> Hi, Bob, do you have any comments about this problem? It seems that this commit is not compatible with blktests.
> 
> Hi, Jason and Leon, please comment on this problem.
> 
> Thanks a lot.
> 
> Zhu Yanjun

My belief is that the issue is related to timing, not to the logical operation of the code.
Work queue items run in kernel threads and can be scheduled out (if not holding spinlocks),
while soft IRQs hold the CPU until they exit. This can cause longer delays in responding
to ULPs. The work queue tasks for each QP are strictly single-threaded; this is managed by
the work queue framework, the same as for tasklets.

In the past I have also seen the exact same hang behavior with the siw driver, but not
recently. I have also seen the hang behavior change when logging is changed. These are
indications that timing may be the cause of the issue.

Bob
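
For anyone repeating the reproduction described in the quoted message (running
srp/002 30 to 100 times and watching for a hang), a minimal sketch of a
repeat-with-timeout loop follows. The function name first_hanging_run and its
parameters are hypothetical, not part of blktests, and treating "exceeded a
timeout" as "hung" is an assumption about how one might detect the hang.

```python
import subprocess
import sys


def first_hanging_run(cmd, runs, timeout_s):
    """Run cmd up to `runs` times.

    Returns the 1-based index of the first run that exceeds timeout_s
    seconds, or None if every run finished in time. Exit codes are
    ignored; only the timeout is treated as a hang (an assumption).
    """
    for i in range(1, runs + 1):
        try:
            subprocess.run(
                cmd,
                timeout=timeout_s,
                check=False,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this.
            return i
    return None
```

From a blktests checkout this might be called as
first_hanging_run(["./check", "srp/002"], runs=100, timeout_s=600), where the
600 second limit is a guess and not a value from this thread. If the test is
stuck in uninterruptible sleep, the kill may not take effect and the loop
itself can block, so this only helps for hangs that leave the process killable.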


