Re: v5.14 RXE driver broken?

On Fri, Aug 27, 2021 at 3:03 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>
> On 8/25/21 11:32 AM, Jason Gunthorpe wrote:
> > On Wed, Aug 25, 2021 at 11:02:14AM +0800, Zhu Yanjun wrote:
> >> On Tue, Aug 24, 2021 at 11:02 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> >>>
> >>> Hi Bob,
> >>>
> >>> If I run the following test against Linus' master branch then that test
> >>> passes (commit d5ae8d7f85b7 ("Revert "media: dvb header files: move some
> >>> headers to staging"")):
> >>>
> >>> # export use_siw=1 && modprobe brd && (cd blktests && ./check -q srp/002)
> >>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [passed]
> >>>     runtime    ...  48.849s
> >>>
> >>> The following test fails:
> >>>
> >>> # export use_siw= && modprobe brd && (cd blktests && ./check -q srp/002)
> >>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [failed]
> >>>     runtime  48.849s  ...  15.024s
> >>>     +++ /home/bart/software/blktests/results/nodev/srp/002.out.bad      2021-08-23 19:51:05.182958728 -0700
> >>>     @@ -1,2 +1 @@
> >>>      Configured SRP target driver
> >>>     -Passed
> >>
> >> Can this commit "RDMA/rxe: Zero out index member of struct rxe_queue"
> >> in the link https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/jgg-for-rc
> >> fix this problem?
> >>
> >> And the commit will be merged into linux upstream very soon.
> >
> > Please let me know Bart, if the rxe driver is still broken I will
> > definitely punt all the changes for RXE to the next cycle until it can
> > be fixed.
> >
> > Jason
> >
>
> Jason, Bart, Zhu
>
> I have succeeded in getting blktest to pass on 5.14. There is a bug in rxe that I had to fix. In
> loopback mode when an RNR NAK is received it requests the requester to start a retry sequence
> before the rnr timer fires which results in the command being retried immediately regardless of the
> value of the timeout. I made a small change which requires the requester to wait for either the
> timer to fire or an ack to arrive. The srp/002 test case in blktest spends a long time before posting

Can this problem be reproduced with v5.13? According to Bart, it does
not occur with v5.13.

Thanks
Zhu Yanjun

> a receive in some cases which caused a soft lockup. There is a second non-bug which is the number of
> MRs was too small to run the test. I increased these by a factor of 256 which fixed that.
>
> My test setup has for-next + 5 recent rxe fix patches applied in addition to the RNR timing one above.
>
> I will submit a patch for the rnr fix.
>
> Bob
>
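
For readers following the RNR discussion above: a toy model of the behaviour Bob describes, where the requester retries only once the RNR timer fires and an arriving ack cancels the wait. This is a sketch under stated assumptions, not the rxe code. The class, its methods, and the tick-based clock are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Requester:
    """Toy requester model (hypothetical; not taken from the rxe driver)."""
    rnr_timeout: int                  # ticks to wait after an RNR NAK
    now: int = 0                      # simulated clock, in ticks
    wait_until: Optional[int] = None  # RNR timer deadline, None if not waiting
    sends: List[int] = field(default_factory=list)  # ticks of each (re)send

    def send(self) -> None:
        self.sends.append(self.now)

    def on_rnr_nak(self) -> None:
        # Arm the RNR timer only. Retrying here would ignore the timeout,
        # which is the immediate-retry behaviour described in the thread.
        self.wait_until = self.now + self.rnr_timeout

    def on_ack(self) -> None:
        # An ack means the request completed, so stop waiting.
        self.wait_until = None

    def tick(self) -> None:
        self.now += 1
        if self.wait_until is not None and self.now >= self.wait_until:
            self.wait_until = None
            self.send()               # timer fired: retry now


# Case 1: the timer fires, so exactly one retry happens at the deadline.
req = Requester(rnr_timeout=5)
req.send()
req.on_rnr_nak()
for _ in range(5):
    req.tick()

# Case 2: an ack arrives before the deadline, so no retry is sent.
acked = Requester(rnr_timeout=3)
acked.send()
acked.on_rnr_nak()
acked.tick()
acked.on_ack()
for _ in range(5):
    acked.tick()

print(req.sends, acked.sends)
```

In this model a retry that ignored the timer would show up as a second send at tick 0 in the first case.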


