On Fri, 08 Apr 2022, Chuck Lever wrote: > Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further > investigation shows that the crash occurred while NFSD was handling > a deferred request. > > This patch addresses two inter-related issues that prevent request > deferral from working correctly for RPC/RDMA requests: > > 1. Prevent the crash by ensuring that the original > svc_rqst::rq_xprt_ctxt value is available when the request is > revisited. Otherwise svc_rdma_sendto() does not have a Receive > context available with which to construct its reply. > > 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly > compute .len and .buflen for received RPC Calls"), > svc_rdma_recvfrom() did not include the transport header in the > returned xdr_buf. There should have been no need for svc_defer() > and friends to save and restore that header, as of that commit. > This issue is addressed in a backport-friendly way by simply > having svc_rdma_recvfrom() set rq_xprt_hlen to zero > unconditionally, just as svc_tcp_recvfrom() does. This enables > svc_deferred_recv() to correctly reconstruct an RPC message > received via RPC/RDMA. I'm a bit late to this party but .... this patch is bad and I don't know how best to fix it. It is bad for two reasons. 1/ It can leak memory. When a deferral happens, the context is moved into an svc_deferred_req. There are a couple of places which assume that an svc_deferred_req can be freed with kfree(), such as svc_delete_xprt() and svc_revisit(). These will now leak the context. This is the bit that is hard to fix. 2/ It can cause a double-free with UDP (and maybe rdma). When a request is deferred, the ctxt is moved to the dreq. When that request is revisited the ctxt is copied back into the rqst. If the rqst is again deferred then the old dreq is reused and, importantly, the rq_xprt_ctxt is not cleared. So when the release function is called the ctxt is freed. When the request is revisited a second time that ctxt (now pointing to freed and possibly reused memory) is copied back into the rqst. When the request completes the ctxt is again freed - double free oops. For UDP there is no value in saving the ctxt in the dreq - the content of the ctxt, which is an skbuf, has been copied into the dreq. So maybe the UDB ctxt is a very different beast than the rdma ctxt and needs to be handled completely differently. We can fix the UDP double-free by always doing rqstp->rq_xprt_ctxt = NULL; whether a new dreq was needed or not. But that doesn't fix the leaking of ctxts and is really just a band-aid. Thoughts? Do we need to preserve ALL of the svc_rdma_recv_ctxt for deferred requests? Could we just copy some bits into the dreq (allocate a bit more space somehow) so that a simple kfree() is still enough? Or do we need a free_ctxt() handler attached to the xprt? Thanks, NeilBrown > > Reported-by: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> > Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@xxxxxxxxxxxxxxx/ > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> > --- > include/linux/sunrpc/svc.h | 1 + > net/sunrpc/svc_xprt.c | 3 +++ > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +- > 3 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > index a5dda4987e8b..217711fc9cac 100644 > --- a/include/linux/sunrpc/svc.h > +++ b/include/linux/sunrpc/svc.h > @@ -395,6 +395,7 @@ struct svc_deferred_req { > size_t addrlen; > struct sockaddr_storage daddr; /* where reply must come from */ > size_t daddrlen; > + void *xprt_ctxt; > struct cache_deferred_req handle; > size_t xprt_hlen; > int argslen; > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > index 0c117d3bfda8..b42cfffa7395 100644 > --- a/net/sunrpc/svc_xprt.c > +++ b/net/sunrpc/svc_xprt.c > @@ -1231,6 +1231,8 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req) > dr->daddr = rqstp->rq_daddr; > dr->argslen = rqstp->rq_arg.len >> 2; > dr->xprt_hlen = rqstp->rq_xprt_hlen; > + dr->xprt_ctxt = rqstp->rq_xprt_ctxt; > + rqstp->rq_xprt_ctxt = NULL; > > /* back up head to the start of the buffer and copy */ > skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len; > @@ -1269,6 +1271,7 @@ static noinline int svc_deferred_recv(struct svc_rqst *rqstp) > rqstp->rq_xprt_hlen = dr->xprt_hlen; > rqstp->rq_daddr = dr->daddr; > rqstp->rq_respages = rqstp->rq_pages; > + rqstp->rq_xprt_ctxt = dr->xprt_ctxt; > svc_xprt_received(rqstp->rq_xprt); > return (dr->argslen<<2) - dr->xprt_hlen; > } > diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c > index cf76a6ad127b..864131a9fc6e 100644 > --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c > +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c > @@ -831,7 +831,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp) > goto out_err; > if (ret == 0) > goto out_drop; > - rqstp->rq_xprt_hlen = ret; > + rqstp->rq_xprt_hlen = 0; > > if (svc_rdma_is_reverse_direction_reply(xprt, ctxt)) > goto out_backchannel; > > >