On Wed, Oct 18, 2023 at 01:29:16PM -0500, Bob Pearson wrote:
> On 10/17/23 17:42, Bart Van Assche wrote:
> > On 10/17/23 14:39, Bob Pearson wrote:
> >> On 10/17/23 16:30, Bart Van Assche wrote:
> >>>
> >>> On 10/17/23 14:23, Bob Pearson wrote:
> >>>> Not really, but stuck could mean it died (no threads active) or it is
> >>>> in a loop or waiting to be scheduled. It looks dead. The lower layers are
> >>>> waiting to get kicked into action by some event but it hasn't happened.
> >>>> This is conjecture on my part though.
> >>>
> >>> This call stack means that I/O has been submitted by the block layer and
> >>> that it did not get completed. Which I/O request got stuck can be
> >>> verified by e.g. running the list-pending-block-requests script that I
> >>> posted some time ago. See also
> >>> https://lore.kernel.org/all/55c0fe61-a091-b351-11b4-fa7f668e49d7@xxxxxxx/.
> >>
> >> Thanks. Would this run on the side of a hung blktests or would I need to
> >> setup an srp-srpt file system?
> >
> > I propose to analyze the source code of the component(s) that you
> > suspect of causing the hang. The output of the list-pending-block-
> > requests script is not sufficient to reveal which of the following
> > drivers is causing the hang: ib_srp, rdma_rxe, ib_srpt, ...
> >
> > Thanks,
> >
> > Bart.
> >
>
> Bart,
>
> Another data point. I had seen (months ago) that both the rxe and
> siw drivers could cause blktests srp hangs. More recently when I
> configure my kernel to run lots of tests (lockdep, memory leaks,
> kasan, ubsan, etc.), which definitely slows performance and adds
> delays, the % of srp/002 runs which hang on the rxe driver has gone
> from 10%+- to a solid 100%. This suggested retrying the siw driver
> on the debug kernel since it has the reputation of always running
> successfully. I now find that siw also hangs solidly on srp/002.
> This is another hint that we are seeing a timing issue.

If siw hangs as well, I am definitely comfortable continuing to debug and
leaving the work queues in-tree for now.

Jason