Re: Apparent regression in blktests since 5.18-rc1+

Thorsten Leemhuis <regressions@xxxxxxxxxxxxx> · Mon, 9 May 2022 08:56:49 +0200

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

CCing the regression mailing list, as it should be in the loop for all
regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

To be sure the issue below doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced v5.17..v5.18-rc6
#regzbot title rdma: hangs in blktests since 5.18-rc1+
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply -- ideally with also
telling regzbot about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags
pointing to the report (the mail this one replied to), as the kernel's
documentation calls for; the page above explains why this is important for
tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

On 06.05.22 20:11, Bob Pearson wrote:
> Bart,
> 
> Before the most recent kernel update I had blktests running OK on rdma_rxe. Since we went on to 5.18.0-rc1+
> I have been experiencing hangs. All of this is with the 'revert scsi-debug' patch which addressed the
> 3 min timeout related to modprobe -r scsi-debug.
> 
> You suggested checking with siw and I finally got around to this and the behavior is exactly the same.
> 
> Specifically here is a run and dmesgs from that run:
> 
> root@u-22:/home/bob/src/blktests# use_siw=1 ./check srp
> 
> srp/001 (Create and remove LUNs)                             [passed]
> 
>     runtime  3.388s  ...  3.501s
> 
> srp/002 (File I/O on top of multipath concurrently with logout and login (mq))
> 
>     runtime  54.689s  ...
>   <HANGS HERE>
> 
> I had to reboot to recover.
> 
> The dmesg output is attached as a long file named "out".
> The output looks normal until line 1875 where it hangs at an "Already connected ..." message.
> This is the same as the other hangs I have been seeing.
> This is followed by a splat warning that a cpu has hung for 120 seconds.
> 
> Since this behaves the same for rxe and siw, I am going to stop chasing this bug, since
> it is most likely outside of the rxe driver.
> 
> Bob
>