On Fri, Aug 27, 2021 at 3:03 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>
> On 8/25/21 11:32 AM, Jason Gunthorpe wrote:
> > On Wed, Aug 25, 2021 at 11:02:14AM +0800, Zhu Yanjun wrote:
> >> On Tue, Aug 24, 2021 at 11:02 AM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> >>>
> >>> Hi Bob,
> >>>
> >>> If I run the following test against Linus' master branch then that test
> >>> passes (commit d5ae8d7f85b7 ("Revert "media: dvb header files: move some
> >>> headers to staging"")):
> >>>
> >>> # export use_siw=1 && modprobe brd && (cd blktests && ./check -q srp/002)
> >>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [passed]
> >>> runtime ... 48.849s
> >>>
> >>> The following test fails:
> >>>
> >>> # export use_siw= && modprobe brd && (cd blktests && ./check -q srp/002)
> >>> srp/002 (File I/O on top of multipath concurrently with logout and login (mq)) [failed]
> >>> runtime 48.849s ... 15.024s
> >>> +++ /home/bart/software/blktests/results/nodev/srp/002.out.bad 2021-08-23 19:51:05.182958728 -0700
> >>> @@ -1,2 +1 @@
> >>> Configured SRP target driver
> >>> -Passed
> >>
> >> Can this commit "RDMA/rxe: Zero out index member of struct rxe_queue"
> >> in the link https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?h=wip/jgg-for-rc
> >> fix this problem?
> >>
> >> And the commit will be merged into linux upstream very soon.
> >
> > Please let me know Bart, if the rxe driver is still broken I will
> > definitely punt all the changes for RXE to the next cycle until it can
> > be fixed.
> >
> > Jason
> >
>
> Jason, Bart, Zhu
>
> I have succeeded in getting blktest to pass on 5.14. There is a bug in rxe that I had to fix. In
> loopback mode when an RNR NAK is received it requests the requester to start a retry sequence
> before the rnr timer fires which results in the command being retried immediately regardless of the
> value of the timeout. I made a small change which requires the requester to wait for either the
> timer to fire or an ack to arrive. The srp/002 test case in blktest spends a long time before posting

Can this problem be reproduced with 5.13? According to Bart, this problem does not occur with v5.13.

Thanks
Zhu Yanjun

> a receive in some cases which caused a soft lockup. There is a second non-bug which is the number of
> MRs was too small to run the test. I increased these by a factor of 256 which fixed that.
>
> My test setup has for-next + 5 recent rxe fix patches applied in addition to the RNR timing one above.
>
> I will submit a patch for the rnr fix.
>
> Bob
>
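
For readers following the RNR discussion above, here is a minimal sketch of the behaviour Bob describes: after an RNR NAK the requester holds off retrying until either the RNR timer fires or an ack arrives. This is a toy model for illustration only, not the rxe driver code; the class name `Requester` and its methods are hypothetical and do not correspond to any kernel identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Requester:
    """Toy requester state (hypothetical; not an rxe data structure)."""

    # Deadline set by the most recent RNR NAK, or None if no wait is pending.
    wait_until: Optional[float] = None

    def on_rnr_nak(self, now: float, rnr_timeout: float) -> None:
        # Arm the RNR timer instead of scheduling a retry right away.
        self.wait_until = now + rnr_timeout

    def on_ack(self) -> None:
        # An ack ends the RNR wait early.
        self.wait_until = None

    def may_retry(self, now: float) -> bool:
        # Retry only if no RNR wait is pending or the timer has fired.
        if self.wait_until is None:
            return True
        if now >= self.wait_until:
            self.wait_until = None
            return True
        return False
```

In this model the bug described above would correspond to `may_retry` returning True immediately after `on_rnr_nak`, regardless of `rnr_timeout`.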