Re: NFS over RDMA crashing

Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> · Wed, 12 Mar 2014 11:03:52 -0400

On Mar 12, 2014, at 10:28, Jeffrey Layton <jlayton@xxxxxxxxxx> wrote:

> On Wed, 12 Mar 2014 10:05:24 -0400
> Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> 
>> 
>> On Mar 12, 2014, at 9:33, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> 
>>> On Sat, 08 Mar 2014 14:13:44 -0600
>>> Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>> 
>>>> On 3/8/2014 1:20 PM, Steve Wise wrote:
>>>>> 
>>>>>> I removed your change and started debugging original crash that 
>>>>>> happens on top-o-tree.   Seems like rq_next_pages is screwed
>>>>>> up.  It should always be >= rq_respages, yes?  I added a
>>>>>> BUG_ON() to assert this in rdma_read_xdr() we hit the BUG_ON().
>>>>>> Look
>>>>>> 
>>>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>>>>>> rq_next_page = 0xffff8800b84e6228
>>>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>>>>>> rq_respages = 0xffff8800b84e62a8
>>>>>> 
>>>>>> Any ideas Bruce/Tom?
>>>>>> 
>>>>> 
>>>>> Guys, the patch below seems to fix the problem.  Dunno if it is 
>>>>> correct though.  What do you think?
>>>>> 
>>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c 
>>>>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>> index 0ce7552..6d62411 100644
>>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst
>>>>> *rqstp, sge_no++;
>>>>>      }
>>>>>      rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>>>>> +       rqstp->rq_next_page = rqstp->rq_respages;
>>>>> 
>>>>>      /* We should never run out of SGE because the limit is
>>>>> defined to
>>>>>       * support the max allowed RPC data length
>>>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct 
>>>>> svcxprt_rdma *xprt,
>>>>> 
>>>>>      /* rq_respages points one past arg pages */
>>>>>      rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>>>>> +       rqstp->rq_next_page = rqstp->rq_respages;
>>>>> 
>>>>>      /* Create the reply and chunk maps */
>>>>>      offset = 0;
>>>>> 
>>>>> 
>>>> 
>>>> While this patch avoids the crashing, it apparently isn't
>>>> correct...I'm getting IO errors reading files over the mount. :)
>>>> 
>>> 
>>> I hit the same oops and tested your patch and it seems to have fixed
>>> that particular panic, but I still see a bunch of other mem
>>> corruption oopses even with it. I'll look more closely at that when
>>> I get some time.
>>> 
>>> FWIW, I can easily reproduce that by simply doing something like:
>>> 
>>>  $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
>>> 
>>> I'm not sure why you're not seeing any panics with your patch in
>>> place. Perhaps it's due to hw differences between our test rigs.
>>> 
>>> The EIO problem that you're seeing is likely the same client bug
>>> that Chuck recently fixed in this patch:
>>> 
>>>  [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
>>> 
>>> AIUI, Trond is merging that set for 3.15, so I'd make sure your
>>> client has those patches when testing.
>>> 
>> 
>> Nothing is in my queue yet.
>> 
> 
> Doh! Any reason not to merge that set from Chuck? They do fix a couple
> of nasty client bugs…
> 

Most of them are one-line debugging dprintks which I do not intend to apply.

One of them confuses a readdir optimisation with a bugfix; at the very least the patch comments need changing.
That leaves 2 that can go in, however as they are clearly insufficient to make RDMA safe for general use, they certainly do not warrant a stable@ label. The workaround for the Oopses is simple: use TCP.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html