On Aug 24, 2023 / 20:36, Bob Pearson wrote:
> On 8/24/23 20:11, Shinichiro Kawasaki wrote:
> > On Aug 22, 2023 / 08:20, Bart Van Assche wrote:
> >> On 8/22/23 03:18, Shinichiro Kawasaki wrote:
> >>> CC+: Bart,
> >>>
> >>> On Aug 21, 2023 / 20:46, Bob Pearson wrote:
> >>> [...]
> >>>> Shinichiro,
> >>>
> >>> Hello Bob, thanks for the response.
> >>>
> >>>>
> >>>> I have been aware for a long time that there is a problem with blktests/srp. I see hangs in
> >>>> 002 and 011 fairly often.
> >>>
> >>> I repeated the test case srp/011 and observed that it hangs. This hang at srp/011
> >>> can also be recreated reliably. I reverted the commit 9b4b7c1f9f54
> >>> and then observed that the srp/011 hang disappeared. So, I guess these two hangs have
> >>> the same root cause.
> >>>
> >>>> I have not been able to figure out the root cause but suspect that
> >>>> there is a timing issue in the srp drivers which cannot handle the slowness of the software
> >>>> RoCE implementation. If you can give me any clues about what you are seeing I am happy to help
> >>>> try to figure this out.
> >>>
> >>> Thanks for sharing your thoughts. I myself do not have srp driver knowledge, and I am
> >>> not sure what clue I should provide. If you have any idea of an action I can
> >>> take, please let me know.
> >>
> >> Hi Shinichiro and Bob,
> >>
> >> When I initially developed the SRP tests these were working reliably in
> >> combination with the rdma_rxe driver. Since 2017 I frequently see issues when
> >> running the SRP tests on top of the rdma_rxe driver, issues that I do not see
> >> if I run the SRP tests on top of the soft-iWARP driver (siw). How about
> >> changing the default for the SRP tests from rdma_rxe to siw and letting the
> >> RDMA community resolve the rdma_rxe issues?
> >
> > If it takes time to resolve the issues, it sounds like a good idea to make the siw driver
> > the default, since it will make the hangs less painful for blktests users. Another
> > idea to reduce the pain is to improve srp/002 and srp/011 to detect the hangs
> > and report them as failures.
> >
> > Having said that, some discussion toward a resolution has started on this thread
> > (thanks!), so I would wait for a while to see how long it takes to reach a solution,
> > and whether the actions on the blktests side are worthwhile or not.
>
> Did you see Bart's comment about srp not working with older versions of multipathd?
> He is currently not seeing any hangs at all.

Yes, I saw it. My test system is Fedora 38 with device-mapper-multipathd package
version 0.9.4. I compiled and installed the latest multipath-tools but still see
the hangs. I am not sure why the hangs occur on my test system but not on Bart's.