On Wed, Aug 24, 2022 at 02:09:52PM +0000, Chuck Lever III wrote: > > > > On Aug 24, 2022, at 5:20 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote: > > > > On Tue, Aug 23, 2022 at 01:58:44PM +0000, Chuck Lever III wrote: > >> > >> > >>> On Aug 23, 2022, at 4:09 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote: > >>> > >>> On Mon, Aug 22, 2022 at 11:30:20AM -0400, Chuck Lever wrote: > > > > <...> > > > >>>> The xprtiod work queue is WQ_MEM_RECLAIM, so any work queue that > >>>> one of its work items tries to cancel has to be WQ_MEM_RECLAIM to > >>>> prevent a priority inversion. > >>> > >>> But why do you have WQ_MEM_RECLAIM in xprtiod? > >> > >> Because RPC is under a filesystem (NFS). Therefore it has to handle > >> writeback demanded by direct reclaim. All of the storage ULPs have > >> this constraint, in fact. > > > > I don't know, this ib_addr workqueue is used when connection is created. > > Reconnection is exactly when we need to ensure that creating > a new connection won't trigger more memory allocation, because > that will immediately deadlock. > > Again, all network storage ULPs have this constraint. IMHO this whole concept is broken. The RDMA stack does not operate globally under RECLAIM, nor should it. If you attempt to do a reconnection/etc from within a RECLAIM context it will deadlock on one of the many allocations that are made to support opening the connection. The general idea of reclaim is that the entire task context working under the reclaim is marked with an override of the gfp flags to make all allocations under that call chain reclaim safe. But rdmacm does allocations outside this, eg in the WQs processing the CM packets. So this doesn't work and we will deadlock. Fixing it is a big deal and needs more that poking WQ_MEM_RECLAIM here and there.. For instance, this patch is just incorrect, you can't use WQ_MEM_RECLAIM on a WQ that is doing allocations and expect anything useful to happen: addr_resolve() addr_resolve_neigh() fetch_ha() ib_nl_fetch_ha() ib_nl_ip_send_msg() skb = nlmsg_new(len, GFP_KERNEL); So regardless of the MEM_RECLAIM the *actual work* being canceled can be stuck on an non-reclaim-safe allocation. Jason