Re: [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On May 7, 2023, at 5:20 PM, NeilBrown <neilb@xxxxxxx> wrote:
> 
> On Fri, 08 Apr 2022, Chuck Lever wrote:
>> Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
>> investigation shows that the crash occurred while NFSD was handling
>> a deferred request.
>> 
>> This patch addresses two inter-related issues that prevent request
>> deferral from working correctly for RPC/RDMA requests:
>> 
>> 1. Prevent the crash by ensuring that the original
>>   svc_rqst::rq_xprt_ctxt value is available when the request is
>>   revisited. Otherwise svc_rdma_sendto() does not have a Receive
>>   context available with which to construct its reply.
>> 
>> 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
>>   compute .len and .buflen for received RPC Calls"),
>>   svc_rdma_recvfrom() did not include the transport header in the
>>   returned xdr_buf. There should have been no need for svc_defer()
>>   and friends to save and restore that header, as of that commit.
>>   This issue is addressed in a backport-friendly way by simply
>>   having svc_rdma_recvfrom() set rq_xprt_hlen to zero
>>   unconditionally, just as svc_tcp_recvfrom() does. This enables
>>   svc_deferred_recv() to correctly reconstruct an RPC message
>>   received via RPC/RDMA.
> 
> I'm a bit late to this party but .... this patch is bad and I don't know
> how best to fix it.
> It is bad for two reasons.
> 1/ It can leak memory.  When a deferral happens, the context is moved
>   into an svc_deferred_req.  There are a couple of places which assume
>   that an svc_deferred_req can be freed with kfree(), such as
>   svc_delete_xprt() and svc_revisit().  These will now leak the
>   context.  This is the bit that is hard to fix.
> 
> 2/ It can cause a double-free with UDP (and maybe rdma).
>   When a request is deferred, the ctxt is moved to the dreq.
>   When that request is revisited the ctxt is copied back into the rqst.
>   If the rqst is again deferred then the old dreq is reused and,
>   importantly, the rq_xprt_ctxt is not cleared.  So when the release
>   function is called the ctxt is freed.
>   When the request is revisited a second time that ctxt (now pointing
>   to freed and possibly reused memory) is copied back into the rqst.
>   When the request completes the ctxt is again freed - double free
>   oops.
> 
> For UDP there is no value in saving the ctxt in the dreq - the content
> of the ctxt, which is an skbuf, has been copied into the dreq.  So maybe
> the UDB ctxt is a very different beast than the rdma ctxt and needs to
> be handled completely differently.
> 
> We can fix the UDP double-free by always doing
> rqstp->rq_xprt_ctxt = NULL;
> whether a new dreq was needed or not.  But that doesn't fix the leaking
> of ctxts and is really just a band-aid.

A double-free is potentially catastrophic, so I would
say this is a reasonable change that can be back-ported
without fuss (while noting that more is needed).

The RDMA recv_ctxt is persistent for the life of the
connection. Releasing one of those just puts it back on
a free list. So, maybe the second free could cause free
list corruption?


> Thoughts?
> Do we need to preserve ALL of the svc_rdma_recv_ctxt for deferred
> requests?  Could we just copy some bits into the dreq (allocate a bit
> more space somehow) so that a simple kfree() is still enough?
> Or do we need a free_ctxt() handler attached to the xprt?

While I take some time to review code and options, did
you know there is a deferral fault injector that might
possibly help you sort through some of this? I doubt I
took the time to try forcing a second deferral of the
same request.


> Thanks,
> NeilBrown
> 
> 
> 
> 
>> 
>> Reported-by: Trond Myklebust <trondmy@xxxxxxxxxxxxxxx>
>> Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@xxxxxxxxxxxxxxx/
>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>> Cc: <stable@xxxxxxxxxxxxxxx>
>> ---
>> include/linux/sunrpc/svc.h              |    1 +
>> net/sunrpc/svc_xprt.c                   |    3 +++
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
>> 3 files changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>> index a5dda4987e8b..217711fc9cac 100644
>> --- a/include/linux/sunrpc/svc.h
>> +++ b/include/linux/sunrpc/svc.h
>> @@ -395,6 +395,7 @@ struct svc_deferred_req {
>> size_t addrlen;
>> struct sockaddr_storage daddr; /* where reply must come from */
>> size_t daddrlen;
>> + void *xprt_ctxt;
>> struct cache_deferred_req handle;
>> size_t xprt_hlen;
>> int argslen;
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index 0c117d3bfda8..b42cfffa7395 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -1231,6 +1231,8 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
>> dr->daddr = rqstp->rq_daddr;
>> dr->argslen = rqstp->rq_arg.len >> 2;
>> dr->xprt_hlen = rqstp->rq_xprt_hlen;
>> + dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
>> + rqstp->rq_xprt_ctxt = NULL;
>> 
>> /* back up head to the start of the buffer and copy */
>> skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
>> @@ -1269,6 +1271,7 @@ static noinline int svc_deferred_recv(struct svc_rqst *rqstp)
>> rqstp->rq_xprt_hlen   = dr->xprt_hlen;
>> rqstp->rq_daddr       = dr->daddr;
>> rqstp->rq_respages    = rqstp->rq_pages;
>> + rqstp->rq_xprt_ctxt   = dr->xprt_ctxt;
>> svc_xprt_received(rqstp->rq_xprt);
>> return (dr->argslen<<2) - dr->xprt_hlen;
>> }
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index cf76a6ad127b..864131a9fc6e 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -831,7 +831,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>> goto out_err;
>> if (ret == 0)
>> goto out_drop;
>> - rqstp->rq_xprt_hlen = ret;
>> + rqstp->rq_xprt_hlen = 0;
>> 
>> if (svc_rdma_is_reverse_direction_reply(xprt, ctxt))
>> goto out_backchannel;


--
Chuck Lever






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux