On 10/17/23 17:42, Bart Van Assche wrote:
> On 10/17/23 14:39, Bob Pearson wrote:
>> On 10/17/23 16:30, Bart Van Assche wrote:
>>>
>>> On 10/17/23 14:23, Bob Pearson wrote:
>>>> Not really, but stuck could mean it died (no threads active) or it is
>>>> in a loop or waiting to be scheduled. It looks dead. The lower layers are
>>>> waiting to get kicked into action by some event but it hasn't happened.
>>>> This is conjecture on my part though.
>>>
>>> This call stack means that I/O has been submitted by the block layer and
>>> that it did not get completed. Which I/O request got stuck can be
>>> verified by e.g. running the list-pending-block-requests script that I
>>> posted some time ago. See also
>>> https://lore.kernel.org/all/55c0fe61-a091-b351-11b4-fa7f668e49d7@xxxxxxx/.
>>
>> Thanks. Would this run on the side of a hung blktests or would I need to
>> setup an srp-srpt file system?
>
> I propose to analyze the source code of the component(s) that you
> suspect of causing the hang. The output of the list-pending-block-
> requests script is not sufficient to reveal which of the following
> drivers is causing the hang: ib_srp, rdma_rxe, ib_srpt, ...
>
> Thanks,
>
> Bart.
>

Bart,

Another data point. I had seen (months ago) that both the rxe and siw
drivers could cause blktests srp hangs. More recently when I configure my
kernel to run lots of tests (lockdep, memory leaks, kasan, ubsan, etc.),
which definitely slows performance and adds delays, the % of srp/002 runs
which hang on the rxe driver has gone from 10%+- to a solid 100%.

This suggested retrying the siw driver on the debug kernel since it has
the reputation of always running successfully. I now find that siw also
hangs solidly on srp/002. This is another hint that we are seeing a
timing issue.

Bob