Re: NFS over RDMA crashing

Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> · Wed, 12 Mar 2014 10:05:24 -0400

On Mar 12, 2014, at 9:33, Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Sat, 08 Mar 2014 14:13:44 -0600
> Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
>> On 3/8/2014 1:20 PM, Steve Wise wrote:
>>> 
>>>> I removed your change and started debugging the original crash that 
>>>> happens on top-o-tree.   Seems like rq_next_page is screwed up.  It 
>>>> should always be >= rq_respages, yes?  I added a BUG_ON() to assert 
>>>> this in rdma_read_xdr(), and we hit the BUG_ON(). Look:
>>>> 
>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>>>> rq_next_page = 0xffff8800b84e6228
>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>>>> rq_respages = 0xffff8800b84e62a8
>>>> 
>>>> Any ideas Bruce/Tom?
>>>> 
>>> 
>>> Guys, the patch below seems to fix the problem.  Dunno if it is 
>>> correct though.  What do you think?
>>> 
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> index 0ce7552..6d62411 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>>>               sge_no++;
>>>       }
>>>       rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>>> +       rqstp->rq_next_page = rqstp->rq_respages;
>>> 
>>>       /* We should never run out of SGE because the limit is defined to
>>>        * support the max allowed RPC data length
>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,
>>> 
>>>       /* rq_respages points one past arg pages */
>>>       rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>>> +       rqstp->rq_next_page = rqstp->rq_respages;
>>> 
>>>       /* Create the reply and chunk maps */
>>>       offset = 0;
>>> 
>>> 
>> 
>> While this patch avoids the crash, it apparently isn't correct...I'm 
>> getting IO errors reading files over the mount. :)
>> 
> 
> I hit the same oops and tested your patch and it seems to have fixed
> that particular panic, but I still see a bunch of other mem corruption
> oopses even with it. I'll look more closely at that when I get some
> time.
> 
> FWIW, I can easily reproduce that by simply doing something like:
> 
>   $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
> 
> I'm not sure why you're not seeing any panics with your patch in place.
> Perhaps it's due to hw differences between our test rigs.
> 
> The EIO problem that you're seeing is likely the same client bug that
> Chuck recently fixed in this patch:
> 
>   [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
> 
> AIUI, Trond is merging that set for 3.15, so I'd make sure your client
> has those patches when testing.
> 

Nothing is in my queue yet.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@xxxxxxxxxxxxxxx
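
For readers following the crash output quoted above, here is a minimal
user-space sketch of the invariant Steve asserted (rq_next_page >=
rq_respages) and of how far apart the two dumped pointers are. The names
mock_rqst, MOCK_MAXPAGES, next_page_is_sane(), slot_gap() and
set_respages() are hypothetical stand-ins, not kernel symbols, and the
8-byte pointer size is an assumption based on the 64-bit addresses in the
dump. set_respages() only mirrors the shape of the quoted patch, which the
thread reports as avoiding the oops but not as a confirmed fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAXPAGES 64        /* hypothetical size, not the kernel's limit */

struct page;                    /* opaque; only pointers to it are used */

/* Hypothetical stand-in for the three struct svc_rqst fields in the thread. */
struct mock_rqst {
    struct page *rq_pages[MOCK_MAXPAGES];
    struct page **rq_respages;  /* first response page slot */
    struct page **rq_next_page; /* next free response page slot */
};

/* The invariant from the thread: rq_next_page must not precede rq_respages. */
static bool next_page_is_sane(const struct mock_rqst *r)
{
    return r->rq_next_page >= r->rq_respages;
}

/*
 * Signed distance, in pointer-sized slots, from respages to next_page,
 * given the raw addresses as printed by the crash utility.
 * ptr_size is assumed to be 8 for a 64-bit kernel.
 */
static long long slot_gap(uint64_t next_page, uint64_t respages,
                          uint64_t ptr_size)
{
    if (next_page >= respages)
        return (long long)((next_page - respages) / ptr_size);
    return -(long long)((respages - next_page) / ptr_size);
}

/* Mirrors the quoted patch: move both pointers together. */
static void set_respages(struct mock_rqst *r, size_t sge_no)
{
    assert(sge_no < MOCK_MAXPAGES);
    r->rq_respages = &r->rq_pages[sge_no];
    r->rq_next_page = r->rq_respages;
}
```

With the addresses from the dump, slot_gap(0xffff8800b84e6228,
0xffff8800b84e62a8, 8) evaluates to -16, i.e. rq_next_page sits 0x80 bytes
(16 slots) below rq_respages, which is why the BUG_ON() fires.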
