Re: [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq

> On Aug 26, 2022, at 10:08 AM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> 
> On Fri, Aug 26, 2022 at 02:02:55PM +0000, Chuck Lever III wrote:
> 
>> I see recent commits that do exactly what I've done for the reason I've done it.
>> 
>> 4c4b1996b5db ("IB/hfi1: Fix WQ_MEM_RECLAIM warning")
> 
> No, this one says:
> 
>    The hfi1_wq does not allocate memory with GFP_KERNEL or otherwise become
>    entangled with memory reclaim, so this flag is appropriate.
> 
> So it is OK, it is not the same thing as adding WQ_MEM_RECLAIM to a WQ
> that allocates memory.
> 
>> I accept that this might be a long chain to pull, but we need a plan
>> to resolve this. 
> 
> It is not just a long chain, it is something that was never designed
> to even work or thought about. People put storage ULPs on top of this
> and just ignored the problem.
> 
> If someone wants to tackle this then we need a comprehensive patch
> series identifying what functions are safe to call under memory
> reclaim contexts and then fully auditing them that they are actually
> safe.
> 
> Right now I don't even know the basic information what functions the
> storage community need to be reclaim safe.

The connect APIs would be a place to start. In the meantime, though...

Two or three years ago I spent some effort to ensure that closing
an RDMA connection leaves a client-side RPC/RDMA transport with no
RDMA resources associated with it. It releases the CQs, QP, and all
the MRs. That makes initial connect and reconnect both behave exactly
the same, and guarantees that a reconnect does not get stuck with
an old CQ that is no longer working or a QP that is in TIMEWAIT.

However, that does mean that substantial resource allocation is
done on every reconnect.

One way to resolve the check_flush_dependency() splat would be
to have rpcrdma.ko allocate its own workqueue for handling
connections and MR allocation, and leave WQ_MEM_RECLAIM disabled
for it. In other words, RPC/RDMA transports would stop using the
xprtiod workqueue for that work.
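
To make the reasoning concrete, here is a rough user-space model of
the rule that check_flush_dependency() enforces, as I read
kernel/workqueue.c. It is a sketch, not kernel code: the model_*
names and flag values are made up for illustration, the real function
also warns about PF_MEMALLOC callers (omitted here), and
"rpcrdma_conn" is a hypothetical name for the proposed workqueue. The
flags given to xprtiod and ib_addr are assumptions consistent with
the splat under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model-only flag bits; the kernel's real values differ. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)
#define MODEL_WQ_LEGACY      (1u << 1)

struct model_wq {
	const char *name;
	unsigned int flags;
};

/* Assumed: xprtiod is created with WQ_MEM_RECLAIM. */
const struct model_wq xprtiod = { "xprtiod", MODEL_WQ_MEM_RECLAIM };
/* Assumed: the RDMA core address workqueue lacks WQ_MEM_RECLAIM. */
const struct model_wq addr_wq = { "ib_addr", 0 };
/* Hypothetical dedicated workqueue owned by rpcrdma.ko. */
const struct model_wq rpcrdma_conn_wq = { "rpcrdma_conn", 0 };

/*
 * Return true if a work item running on @running would trigger the
 * warning when it flushes work on @target. Pass NULL for @running
 * when the caller is not a workqueue worker.
 */
bool model_flush_would_warn(const struct model_wq *running,
			    const struct model_wq *target)
{
	/* Flushing a reclaim-safe workqueue is always acceptable. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;
	/* A non-worker caller is not covered by this part of the check. */
	if (!running)
		return false;
	/* A reclaim worker must not wait on a !WQ_MEM_RECLAIM workqueue. */
	return (running->flags &
		(MODEL_WQ_MEM_RECLAIM | MODEL_WQ_LEGACY)) ==
	       MODEL_WQ_MEM_RECLAIM;
}

/* Walk through the current situation and the proposed one. */
void model_demo(void)
{
	/* Today: connect work on xprtiod flushes addr_wq and warns. */
	assert(model_flush_would_warn(&xprtiod, &addr_wq));
	/* Proposed: connect work on a !WQ_MEM_RECLAIM workqueue does not. */
	assert(!model_flush_would_warn(&rpcrdma_conn_wq, &addr_wq));
}
```

Note that this only removes the warning for the connect path. It does
not by itself make connection establishment safe under memory
reclaim.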


--
Chuck Lever






