On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> On Thu, 07 Apr 2022, NeilBrown wrote:
> > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > In the last couple of days I've started getting hangs on xfstests
> > > > generic/186 on upstream. I also notice the test completes after 10+
> > > > hours (usually it takes about 5 minutes). Sometimes this is
> > > > accompanied by "nfs: RPC call returned error 12" on the client.
> > >
> > > #define ENOMEM          12      /* Out of memory */
> > >
> > > So either the client or the server is running out of memory
> > > somewhere?
> >
> > Probably the client. There are a bunch of recent changes which add
> > __GFP_NORETRY to memory allocations from PF_WQ_WORKERs, because that
> > can result in deadlocks when swapping over NFS.
> > This means that kmalloc requests that previously never failed (because
> > GFP_KERNEL never fails for kernel threads, I think) can now fail. This
> > has tickled one bug that I know of. There are likely to be more.
> >
> > The RPC code should simply retry these allocations after a short
> > delay. HZ/4 is the number that is used in a couple of places. Possibly
> > there are more places that need to handle -ENOMEM with rpc_delay().
>
> I had a look through the various places where alloc can now fail.
>
> I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> cause of a problem here. I don't think an -ENOMEM from there is caught,
> so it could likely filter up to NFS and result in the message you got.
>
> I don't think we can easily handle failure there. We need to stay with
> GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
> swap-over-NFS.
>
> Bruce: can you change that one line back to GFP_KERNEL and see if the
> problem goes away?
>

Can we please just move the call to xdr_alloc_bvec() out of
xprt_send_pagedata()?
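For illustration, the retry-with-delay pattern Neil mentions could look
roughly like the sketch below in an rpc_task state-machine step. This is
not upstream code: the call_allocate_status()/call_allocate/call_encode
step names are assumed for the example, following net/sunrpc naming
conventions; only rpc_delay() and HZ/4 come from the discussion above.

```c
/* Sketch only, not actual upstream code: how a state-machine step
 * could treat a transient -ENOMEM as retryable instead of failing
 * the whole RPC, using the rpc_delay() pattern described above.
 */
static void call_allocate_status(struct rpc_task *task)
{
	if (task->tk_status == -ENOMEM) {
		/* Allocation failed under memory pressure: clear the
		 * error, back off for HZ/4, and re-run the same step.
		 */
		task->tk_status = 0;
		rpc_delay(task, HZ >> 2);
		task->tk_action = call_allocate;
		return;
	}
	/* Allocation succeeded; move on to the next step. */
	task->tk_action = call_encode;
}
```

The point of the pattern is that rpc_delay() puts the task to sleep on a
wait queue rather than blocking the rpciod worker thread, so the retry
is safe even from PF_WQ_WORKER context.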
Move the client side allocation into xs_stream_prepare_request() and
xs_udp_send_request(), then move the server side allocation into
svc_udp_sendto(). That makes it possible to handle errors.

> The other problem I found is that rpc_alloc_task() can now fail, but
> rpc_new_task() assumes that it never will. If it does, then we get a
> NULL deref.
>
> I don't think rpc_new_task() can ever be called from the rpciod work
> queue, so it is safe to just use a mempool with GFP_KERNEL like we did
> before.
>

No. We shouldn't ever use mempools with GFP_KERNEL. Most, if not all, of
the callers of rpc_run_task() are still capable of dealing with errors,
and ditto for the callers of rpc_run_bc_task().

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx