On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> On Thu, 07 Apr 2022, NeilBrown wrote:
> > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > In the last couple of days I've started getting hangs on xfstests
> > > > generic/186 on upstream. I also notice the test completes after 10+
> > > > hours (usually it takes about 5 minutes). Sometimes this is
> > > > accompanied by "nfs: RPC call returned error 12" on the client.
> > >
> > > #define ENOMEM          12      /* Out of memory */
> > >
> > > So either the client or the server is running out of memory
> > > somewhere?
> >
> > Probably the client. There are a bunch of recent changes which add
> > __GFP_NORETRY to memory allocations from PF_WQ_WORKERs, because that
> > can result in deadlocks when swapping over NFS.
> > This means that kmalloc requests that previously never failed (because
> > GFP_KERNEL never fails for kernel threads, I think) can now fail. This
> > has tickled one bug that I know of. There are likely to be more.
> >
> > The RPC code should simply retry these allocations after a short
> > delay. HZ/4 is the number that is used in a couple of places. Possibly
> > there are more places that need to handle -ENOMEM with rpc_delay().
>
> I had a look through the various places where alloc can now fail.
>
> I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> cause of a problem here. I don't think an -ENOMEM from there is caught,
> so it could likely filter up to NFS and result in the message you got.
>
> I don't think we can easily handle failure there. We need to stay with
> GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
> swap-over-NFS.
>
> Bruce: can you change that one line back to GFP_KERNEL and see if the
> problem goes away?
>

Can we please just move the call to xdr_alloc_bvec() out of
xprt_send_pagedata()?
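For illustration, the retry-with-delay pattern Neil mentions could look
roughly like the sketch below in an rpc_task state-machine step. This is
not upstream code: the call_allocate_status()/call_allocate/call_encode
step names are assumed for the example, following net/sunrpc naming
conventions; only rpc_delay() and HZ/4 come from the discussion above.

```c
/* Sketch only, not actual upstream code: how a state-machine step
 * could treat a transient -ENOMEM as retryable instead of failing
 * the whole RPC, using the rpc_delay() pattern described above.
 */
static void call_allocate_status(struct rpc_task *task)
{
	if (task->tk_status == -ENOMEM) {
		/* Allocation failed under memory pressure: clear the
		 * error, back off for HZ/4, and re-run the same step.
		 */
		task->tk_status = 0;
		rpc_delay(task, HZ >> 2);
		task->tk_action = call_allocate;
		return;
	}
	/* Allocation succeeded; move on to the next step. */
	task->tk_action = call_encode;
}
```

The point of the pattern is that rpc_delay() puts the task to sleep on a
wait queue rather than blocking the rpciod worker thread, so the retry
is safe even from PF_WQ_WORKER context.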
Move the client side allocation into xs_stream_prepare_request() and
xs_udp_send_request(), then move the server side allocation into
svc_udp_sendto(). That makes it possible to handle errors.

> The other problem I found is that rpc_alloc_task() can now fail, but
> rpc_new_task() assumes that it never will. If it does, then we get a
> NULL deref.
>
> I don't think rpc_new_task() can ever be called from the rpciod work
> queue, so it is safe to just use a mempool with GFP_KERNEL like we did
> before.
>

No. We shouldn't ever use mempools with GFP_KERNEL. Most, if not all, of
the callers of rpc_run_task() are still capable of dealing with errors,
and ditto for the callers of rpc_run_bc_task().

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx