> -----Original Message-----
> From: Bob Pearson <rpearsonhpe@xxxxxxxxx>
> Sent: Wednesday, 23 August 2023 18:19
> To: Bart Van Assche <bvanassche@xxxxxxx>; Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx
> Subject: [EXTERNAL] Re: [bug report] blktests srp/002 hang
>
> On 8/22/23 10:20, Bart Van Assche wrote:
> > On 8/22/23 03:18, Shinichiro Kawasaki wrote:
> >> CC+: Bart,
> >>
> >> On Aug 21, 2023 / 20:46, Bob Pearson wrote:
> >> [...]
> >>> Shinichiro,
> >>
> >> Hello Bob, thanks for the response.
> >>
> >>>
> >>> I have been aware for a long time that there is a problem with
> >>> blktests/srp. I see hangs in 002 and 011 fairly often.
> >>
> >> I repeated the test case srp/011 and observed that it hangs. This hang at
> >> srp/011 can also be recreated in a stable manner. I reverted the commit
> >> 9b4b7c1f9f54 and then observed that the srp/011 hang disappeared. So, I guess
> >> these two hangs have the same root cause.
> >>
> >>> I have not been able to figure out the root cause but suspect that
> >>> there is a timing issue in the srp drivers which cannot handle the
> >>> slowness of the software RoCE implementation. If you can give me any
> >>> clues about what you are seeing I am happy to help try to figure this out.
> >>
> >> Thanks for sharing your thoughts. I myself do not have srp driver
> >> knowledge, and I am not sure what clue I should provide. If you have any
> >> idea of an action I can take, please let me know.
> >
> > Hi Shinichiro and Bob,
> >
> > When I initially developed the SRP tests these were working reliably in
> > combination with the rdma_rxe driver. Since 2017 I frequently see issues
> > when running the SRP tests on top of the rdma_rxe driver, issues that I do
> > not see if I run the SRP tests on top of the soft-iWARP driver (siw). How
> > about changing the default for the SRP tests from rdma_rxe to siw and
> > letting the RDMA community resolve the rdma_rxe issues?
> >
> > Thanks,
> >
> > Bart.
> >
>
> Bart,
>
> I have also seen the same hangs in siw. Not as frequently, but with the same
> symptoms.

I had not heard about that one from the siw side, but I will try to make some
time to reproduce it and fix siw if needed. I'll let you know if I find
something, Bob.

Bernard.

> About every month or so I take another run at trying to find and fix this
> bug, but I have not succeeded yet. I haven't seen anything that looks like
> bad behavior from the rxe side, but that doesn't prove anything. I also saw
> these hangs on my system before the WQ patch went in, if my memory serves.
> Our main application for this driver at HPE is Lustre, which is a little
> different from SRP but uses the same general approach with fast MRs.
> Currently we are finding the driver to be quite stable even under very heavy
> stress.
>
> I would be happy to collaborate with someone (you?) who knows the SRP side
> well to resolve this hang. I think that is the quickest way to fix this. I
> have no idea what SRP is waiting for.
>
> Best regards,
>
> Bob