On Thu, 07 Apr 2022, J. Bruce Fields wrote:
> On Thu, Apr 07, 2022 at 11:19:34AM +1000, NeilBrown wrote:
> > I had a look through the various places where alloc can now fail.
> >
> > I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> > cause of a problem here.  I don't think an -ENOMEM from there is caught,
> > so it could likely filter up to NFS and result in the message you got.
> >
> > I don't think we can easily handle failure there.  We need to stay with
> > GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
> > swap-over-NFS.
> >
> > Bruce: can you change that one line back to GFP_KERNEL and see if the
> > problem goes away?
>
> Like this?  Sure--might take me a day or two to run the tests and get
> results back.--b.
>
> diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
> index 05b38bf68316..506627dc9a0f 100644
> --- a/net/sunrpc/socklib.c
> +++ b/net/sunrpc/socklib.c
> @@ -223,7 +223,7 @@ static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
>  {
>  	int err;
>
> -	err = xdr_alloc_bvec(xdr, rpc_task_gfp_mask());
> +	err = xdr_alloc_bvec(xdr, GFP_KERNEL);
>  	if (err < 0)
>  		return err;
>

That looks right.

I instrumented my kernel to deliberately fail 10% of the time, and I got
lots of

    nfs: RPC call returned error 12

so I'm fairly sure this explains that message.
But you say the hangs were only occasionally accompanied by the message,
so it probably doesn't explain the hangs.

NeilBrown
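
For anyone wanting to see the shape of that experiment without rebuilding a
kernel, here is a rough userspace sketch.  It is not the instrumentation
described above and not kernel code: fake_alloc_bvec(), fake_send_pagedata()
and count_failures() are hypothetical stand-ins, and rand() replaces whatever
failure trigger the real test used.  It only illustrates how an uncaught
-ENOMEM (errno 12 on Linux) from the allocation passes straight up to the
caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for xdr_alloc_bvec(): returns -ENOMEM on roughly
 * fail_percent out of every 100 calls, and 0 otherwise.
 */
static int fake_alloc_bvec(int fail_percent)
{
	assert(fail_percent >= 0 && fail_percent <= 100);

	if (rand() % 100 < fail_percent)
		return -ENOMEM;
	return 0;
}

/*
 * Hypothetical stand-in for xprt_send_pagedata(): like the code in the
 * diff, it hands an allocation error back to its caller unchanged.
 */
static int fake_send_pagedata(int fail_percent)
{
	int err;

	err = fake_alloc_bvec(fail_percent);
	if (err < 0)
		return err;

	/* The real function would go on to send the page data here. */
	return 0;
}

/* Count how many of `calls` send attempts surface an error to the caller. */
static int count_failures(int calls, int fail_percent)
{
	int failures = 0;
	int i;

	for (i = 0; i < calls; i++) {
		if (fake_send_pagedata(fail_percent) < 0)
			failures++;
	}
	return failures;
}
```

Calling count_failures(1000, 10) from a small main() should report a failure
count somewhere around 100, each one corresponding to an error the caller
would have to handle or report.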