Re: [bug report] blktests srp/002 hang

On 2023/9/14 1:36, Bob Pearson wrote:
On 8/25/23 08:52, Bart Van Assche wrote:
On 8/24/23 18:11, Shinichiro Kawasaki wrote:
If it takes time to resolve the issues, it sounds like a good idea to make the siw
driver the default, since that will make the hangs less painful for blktests users.
Another way to reduce the pain is to improve srp/002 and srp/011 so that they detect
the hangs and report them as failures.

At this moment we don't know whether the hangs can be converted into failures.
We can only answer that question after we have found the root cause of the
hang. If the hang is caused by commands getting stuck in multipathd, then it
can be solved by changing the path configuration (see also the dmsetup message
commands in blktests). If the hang is caused by a kernel bug, then it is quite
possible that there is no way to recover other than by rebooting the system on
which the tests are run.

Thanks,

Bart.

Since 6.6.0-rc1 came out, I decided to give blktests srp another try with the current
rdma for-next branch on my Ubuntu (Debian-based) system. For the first time in a very
long time, all the srp test cases ran correctly multiple times. I ran each one three times.

I had tried to build multipath-tools from source but ran into problems, so I reinstalled
the current Ubuntu packages. I have no idea what the root cause was or why it finally went
away, but I don't think it was in rxe, as there aren't any recent patches related to the
blktests failures. I did notice that the dmesg traces now show a couple of extra lines past
the point where it used to hang, something about setting an ALUA timeout to 60 seconds.

Thanks to all who worked on this.

Hi, Bob

IIRC, this problem occurred easily on Debian and Fedora 38 with commit 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks") applied.

On Debian, with the latest multipathd, this problem seems to disappear.

On Fedora 38, even with the latest multipathd, this problem still can be observed.

On Ubuntu, it is difficult to reproduce this problem.

Perhaps this is why you cannot reproduce this problem on Ubuntu.

It seems that this problem is related to the Linux distribution and the version of multipathd.

If I am missing something, please feel free to let me know.

Zhu Yanjun


Bob Pearson
