On Thu, 07 Apr 2022, Trond Myklebust wrote:
> On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> > On Thu, 07 Apr 2022, NeilBrown wrote:
> > > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > > In the last couple days I've started getting hangs on xfstests
> > > > > generic/186 on upstream.  I also notice the test completes
> > > > > after 10+ hours (usually it takes about 5 minutes).  Sometimes
> > > > > this is accompanied by "nfs: RPC call returned error 12" on
> > > > > the client.
> > > >
> > > > #define ENOMEM          12      /* Out of memory */
> > > >
> > > > So either the client or the server is running out of memory
> > > > somewhere?
> > >
> > > Probably the client.  There are a bunch of recent changes which add
> > > __GFP_NORETRY to memory allocations from PF_WQ_WORKERs, because
> > > that can result in deadlocks when swapping over NFS.
> > > This means that kmalloc requests which previously never failed
> > > (because GFP_KERNEL never fails for kernel threads, I think) can
> > > now fail.  This has tickled one bug that I know of.  There are
> > > likely to be more.
> > >
> > > The RPC code should simply retry these allocations after a short
> > > delay.  HZ/4 is the number that is used in a couple of places.
> > > Possibly there are more places that need to handle -ENOMEM with
> > > rpc_delay().
> >
> > I had a look through the various places where alloc can now fail.
> >
> > I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> > cause of a problem here.  I don't think an -ENOMEM from there is
> > caught, so it could likely filter up to NFS and result in the
> > message you got.
> >
> > I don't think we can easily handle failure there.  We need to stay
> > with GFP_KERNEL and rely on PF_MEMALLOC to make forward progress
> > for swap-over-NFS.
> >
> > Bruce: can you change that one line back to GFP_KERNEL and see if
> > the problem goes away?
> >
>
> Can we please just move the call to xdr_alloc_bvec() out of
> xprt_send_pagedata()? Move the client side allocation into
> xs_stream_prepare_request() and xs_udp_send_request(), then move the
> server side allocation into svc_udp_sendto().
>
> That makes it possible to handle errors.

Like the below I guess.  Seems sensible, but I don't know the code
well enough to really review it.

>
> > The other problem I found is that rpc_alloc_task() can now fail,
> > but rpc_new_task() assumes that it never will.  If it does, then
> > we get a NULL deref.
> >
> > I don't think rpc_new_task() can ever be called from the rpciod
> > work queue, so it is safe to just use a mempool with GFP_KERNEL
> > like we did before.
> >
>
> No. We shouldn't ever use mempools with GFP_KERNEL.

Why not?  mempools with GFP_KERNEL make perfect sense outside of the
rpciod and nfsiod threads.

>
> Most, if not all of the callers of rpc_run_task() are still capable
> of dealing with errors, and ditto for the callers of
> rpc_run_bc_task().

Yes, they can deal with errors.  But in many cases they do so by
passing the error up the call stack, so we could start getting ENOMEM
for system calls like stat().  I don't think that is a good idea.
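
To make that concrete: if mempools with GFP_KERNEL really are off the
table, then I assume the alternative looks something like the sketch
below (untested, not a proposal).  rpc_new_task() reports the failure
instead of dereferencing a NULL task, and every caller of
rpc_run_task() and friends then has to check IS_ERR() and pass -ENOMEM
further up the stack, which is exactly the propagation I'd like to
avoid.

struct rpc_task *rpc_new_task(const struct rpc_task_setup *setup_data)
{
	struct rpc_task *task = setup_data->task;
	unsigned short flags = 0;

	if (task == NULL) {
		task = rpc_alloc_task();
		if (task == NULL)
			/* callers would now need an IS_ERR() check */
			return ERR_PTR(-ENOMEM);
		flags = RPC_TASK_DYNAMIC;
	}

	rpc_init_task(task, setup_data);
	task->tk_flags |= flags;
	return task;
}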
Thanks,
NeilBrown


>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>

diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 05b38bf68316..71ba4cf513bc 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -221,12 +221,6 @@ static int xprt_send_kvec(struct socket *sock, struct msghdr *msg,
 static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
 			      struct xdr_buf *xdr, size_t base)
 {
-	int err;
-
-	err = xdr_alloc_bvec(xdr, rpc_task_gfp_mask());
-	if (err < 0)
-		return err;
-
 	iov_iter_bvec(&msg->msg_iter, WRITE, xdr->bvec, xdr_buf_pagecount(xdr),
 		      xdr->page_len + xdr->page_base);
 	return xprt_sendmsg(sock, msg, base + xdr->page_base);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 78af7518f263..2661828f7a85 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -828,6 +828,9 @@ xs_stream_prepare_request(struct rpc_rqst *req)
 	xdr_free_bvec(&req->rq_rcv_buf);
 	req->rq_task->tk_status = xdr_alloc_bvec(
 		&req->rq_rcv_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	if (req->rq_task->tk_status == 0)
+		req->rq_task->tk_status = xdr_alloc_bvec(
+			&req->rq_snd_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 }

 /*
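
And for the other -ENOMEM sites that can reasonably just be retried,
the pattern I have in mind is roughly the following (illustrative
only, not compile-tested; the function name and the particular
allocation are made up):

static void call_alloc_example(struct rpc_task *task)
{
	struct rpc_rqst *req = task->tk_rqstp;
	int err;

	err = xdr_alloc_bvec(&req->rq_snd_buf,
			     GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
	if (err == -ENOMEM) {
		/* back off for HZ/4 and re-run this step */
		rpc_delay(task, HZ / 4);
		task->tk_action = call_alloc_example;
		return;
	}
	task->tk_status = err;
	/* on success, point tk_action at the next step as usual */
}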