Re: [PATCH v1] RDMA/core: Fix check_flush_dependency splat on addr_wq

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 24, 2022 at 02:09:52PM +0000, Chuck Lever III wrote:
> 
> 
> > On Aug 24, 2022, at 5:20 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> > 
> > On Tue, Aug 23, 2022 at 01:58:44PM +0000, Chuck Lever III wrote:
> >> 
> >> 
> >>> On Aug 23, 2022, at 4:09 AM, Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >>> 
> >>> On Mon, Aug 22, 2022 at 11:30:20AM -0400, Chuck Lever wrote:
> > 
> > <...>
> > 
> >>>> The xprtiod work queue is WQ_MEM_RECLAIM, so any work queue that
> >>>> one of its work items tries to cancel has to be WQ_MEM_RECLAIM to
> >>>> prevent a priority inversion.
> >>> 
> >>> But why do you have WQ_MEM_RECLAIM in xprtiod?
> >> 
> >> Because RPC is under a filesystem (NFS). Therefore it has to handle
> >> writeback demanded by direct reclaim. All of the storage ULPs have
> >> this constraint, in fact.
> > 
> > I don't know, this ib_addr workqueue is used when connection is created.
> 
> Reconnection is exactly when we need to ensure that creating
> a new connection won't trigger more memory allocation, because
> that will immediately deadlock.
> 
> Again, all network storage ULPs have this constraint.

IMHO this whole concept is broken.

The RDMA stack does not operate globally under RECLAIM, nor should it.

If you attempt to do a reconnection/etc from within a RECLAIM context
it will deadlock on one of the many allocations that are made to
support opening the connection.

The general idea of reclaim is that the entire task context working
under the reclaim is marked with an override of the gfp flags to make
all allocations under that call chain reclaim safe.

But rdmacm does allocations outside this, eg in the WQs processing the
CM packets. So this doesn't work and we will deadlock.

Fixing it is a big deal and needs more that poking WQ_MEM_RECLAIM here
and there..

For instance, this patch is just incorrect, you can't use
WQ_MEM_RECLAIM on a WQ that is doing allocations and expect anything
useful to happen:

addr_resolve()
 addr_resolve_neigh()
  fetch_ha()
   ib_nl_fetch_ha()
     ib_nl_ip_send_msg()
       	skb = nlmsg_new(len, GFP_KERNEL);

So regardless of the MEM_RECLAIM the *actual work* being canceled can
be stuck on an non-reclaim-safe allocation.

Jason



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux