> On Aug 29, 2022, at 12:45 PM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Fri, Aug 26, 2022 at 07:57:04PM +0000, Chuck Lever III wrote:
>> The connect APIs would be a place to start. In the meantime, though...
>>
>> Two or three years ago I spent some effort to ensure that closing
>> an RDMA connection leaves a client-side RPC/RDMA transport with no
>> RDMA resources associated with it. It releases the CQs, QP, and all
>> the MRs. That makes initial connect and reconnect both behave exactly
>> the same, and guarantees that a reconnect does not get stuck with
>> an old CQ that is no longer working or a QP that is in TIMEWAIT.
>>
>> However that does mean that substantial resource allocation is
>> done on every reconnect.
>
> And if the resource allocations fail then what happens? The storage
> ULP retries forever and is effectively deadlocked?

The reconnection attempt fails, and any resources allocated during
that attempt are released. The ULP waits a bit then tries again
until it works or is interrupted.

A deadlock might occur if one of those allocations triggers
additional reclaim activity.


> How much allocation can you safely do under GFP_NOIO?

My naive take is that doing those allocations under NOIO would
help avoid recursion during memory exhaustion.


--
Chuck Lever
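
For illustration only, the reconnect behavior described in the reply
(a failed attempt releases whatever it allocated, then the ULP tries
again) can be sketched in userspace C. This is not the xprtrdma code:
struct ep_sketch, try_alloc(), fail_on_call, and the ep_* helpers are
hypothetical names, malloc() stands in for the CQ/QP/MR allocations,
and neither the back-off delay nor GFP_NOIO is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transport's RDMA resources. */
struct ep_sketch {
    void *cq;   /* completion queue */
    void *qp;   /* queue pair */
    void *mrs;  /* memory regions */
};

/* Fault injection: the fail_on_call-th allocation fails (0 = never). */
static int alloc_count;
static int fail_on_call;

static void *try_alloc(size_t size)
{
    alloc_count++;
    if (alloc_count == fail_on_call)
        return NULL;
    return malloc(size);
}

/* Drop everything, so the next connect starts from a clean slate. */
static void ep_release(struct ep_sketch *ep)
{
    free(ep->mrs);
    free(ep->qp);
    free(ep->cq);
    ep->cq = ep->qp = ep->mrs = NULL;
}

/* One connect attempt; on any failure, release what was allocated. */
static bool ep_connect_once(struct ep_sketch *ep)
{
    /* Invariant from the thread: a closed transport holds no resources. */
    assert(!ep->cq && !ep->qp && !ep->mrs);

    ep->cq = try_alloc(64);
    if (!ep->cq)
        goto fail;
    ep->qp = try_alloc(64);
    if (!ep->qp)
        goto fail;
    ep->mrs = try_alloc(256);
    if (!ep->mrs)
        goto fail;
    return true;

fail:
    ep_release(ep);
    return false;
}

/* Returns the number of attempts used, or -1 if all attempts failed. */
static int ep_connect_retry(struct ep_sketch *ep, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (ep_connect_once(ep))
            return attempt;
        /* A real ULP would wait here and check for interruption. */
    }
    return -1;
}
```

Because every failed attempt ends in ep_release(), initial connect and
reconnect go through the same path, at the cost of repeating all the
allocations on each try.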