> On Oct 24, 2016, at 3:17 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Mon, 2016-10-24 at 14:08 -0400, J. Bruce Fields wrote:
>> On Mon, Oct 24, 2016 at 11:24:40AM -0400, Jeff Layton wrote:
>>>
>>> On Mon, 2016-10-24 at 11:19 -0400, Jeff Layton wrote:
>>>>
>>>> On Mon, 2016-10-24 at 09:51 -0400, Chuck Lever wrote:
>>>>>
>>>>>> On Oct 24, 2016, at 9:31 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Mon, 2016-10-24 at 11:15 +0800, Eryu Guan wrote:
>>>>>>>
>>>>>>> On Sun, Oct 23, 2016 at 02:21:15PM -0400, J. Bruce Fields wrote:
>>>>>>>>
>>>>>>>> I'm getting an intermittent crash in the nfs server as of
>>>>>>>> 68778945e46f143ed7974b427a8065f69a4ce944 "SUNRPC: Separate buffer
>>>>>>>> pointers for RPC Call and Reply messages".
>>>>>>>>
>>>>>>>> I haven't tried to understand that commit or why it would be a problem yet, I
>>>>>>>> don't see an obvious connection--I can take a closer look Monday.
>>>>>>>>
>>>>>>>> Could even be that I just landed on this commit by chance, the problem is a
>>>>>>>> little hard to reproduce so I don't completely trust my testing.
>>>>>>>
>>>>>>> I've hit the same crash on 4.9-rc1 kernel, and it's reproduced for me
>>>>>>> reliably by running xfstests generic/013 case, on a loopback mounted
>>>>>>> NFSv4.1 (or NFSv4.2), XFS is the underlying exported fs. More details
>>>>>>> please see
>>>>>>>
>>>>>>> http://marc.info/?l=linux-nfs&m=147714320129362&w=2
>>>>>>>
>>>>>>
>>>>>> Looks like you landed at the same commit as Bruce, so that's probably
>>>>>> legit. That commit is very small though. The only real change that
>>>>>> doesn't affect the new field is this:
>>>>>>
>>>>>>
>>>>>> @@ -1766,7 +1766,7 @@ rpc_xdr_encode(struct rpc_task *task)
>>>>>>                      req->rq_buffer,
>>>>>>                      req->rq_callsize);
>>>>>>         xdr_buf_init(&req->rq_rcv_buf,
>>>>>> -                    (char *)req->rq_buffer + req->rq_callsize,
>>>>>> +                    req->rq_rbuffer,
>>>>>>                      req->rq_rcvsize);
>>>>>>
>>>>>>
>>>>>> So I'm guessing this is breaking the callback channel somehow?
>>>>>
>>>>> Could be the TCP backchannel code is using rq_buffer in a different
>>>>> way than RDMA backchannel or the forward channel code.
>>>>>
>>>>
>>>> Well, it basically allocates a page per rpc_rqst and then maps that.
>>>>
>>>> One thing I notice is that this patch ensures that rq_rbuffer gets set
>>>> up in rpc_malloc and xprt_rdma_allocate, but it looks like
>>>> xprt_alloc_bc_req didn't get the same treatment.
>>>>
>>>> I suspect that that may be the problem...
>>>>
>>> In fact, maybe we just need this here? (untested and probably
>>> whitespace damaged):
>>
>> No change in results for me.
>>
>> --b.
>>>
>>>
>>> diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
>>> index ac701c28f44f..c561aa8ce05b 100644
>>> --- a/net/sunrpc/backchannel_rqst.c
>>> +++ b/net/sunrpc/backchannel_rqst.c
>>> @@ -100,6 +100,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
>>>                 goto out_free;
>>>         }
>>>         req->rq_rcv_buf.len = PAGE_SIZE;
>>> +       req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
>>>
>>>         /* Preallocate one XDR send buffer */
>>>         if (xprt_alloc_xdr_buf(&req->rq_snd_buf, gfp_flags) < 0) {
>
> Ahh ok, I think I see.
>
> We probably also need to set rq_rbuffer in bc_malloc and
> xprt_rdma_bc_allocate.
>
> My guess is that we're ending up in rpc_xdr_encode with a NULL
> rq_rbuffer pointer, so the right fix would seem to be to ensure that it
> is properly set whenever rq_buffer is set.
>
> So I think this may be what we want, actually. I'll plan to test it out
> but may not get to it before tomorrow.
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx><0001-sunrpc-fix-some-missing-rq_rbuffer-assignments.patch>

This may not be working as well as I thought (at least for NFS/RDMA).

xprt_rdma_bc_send_request releases the page allocated by
xprt_rdma_bc_allocate before the reply arrives. call_decode then tries
to dereference rq_rbuffer, but that's now a pointer to freed memory.

--
Chuck Lever
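
For illustration only, here is a minimal user-space sketch of the invariant discussed in this thread: rq_rbuffer should be set whenever rq_buffer is set, and neither pointer should outlive the memory it points to. It is not kernel code and not the patch attached above. The names model_rqst, model_alloc, model_free and model_buffers_consistent are hypothetical, and the layout (Reply buffer placed right after the Call buffer) is assumed from the old rpc_xdr_encode arithmetic quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the few struct rpc_rqst fields named in
 * this thread. This is NOT the kernel definition. */
struct model_rqst {
	void   *rq_buffer;    /* start of the Call (send) buffer */
	void   *rq_rbuffer;   /* start of the Reply (receive) buffer */
	size_t  rq_callsize;  /* bytes reserved for the Call */
	size_t  rq_rcvsize;   /* bytes reserved for the Reply */
};

/* Allocate one region and set BOTH pointers together, so an encode
 * step never sees rq_buffer set while rq_rbuffer is still NULL.
 * Returns 0 on success, -1 if the allocation fails. */
static int model_alloc(struct model_rqst *req, size_t callsize, size_t rcvsize)
{
	char *buf = malloc(callsize + rcvsize);

	if (buf == NULL)
		return -1;
	req->rq_callsize = callsize;
	req->rq_rcvsize = rcvsize;
	req->rq_buffer = buf;
	/* Assumed layout: Reply buffer follows the Call buffer. */
	req->rq_rbuffer = buf + callsize;
	return 0;
}

/* Release the region and clear BOTH pointers, so a later decode step
 * sees NULL rather than a stale pointer to freed memory. */
static void model_free(struct model_rqst *req)
{
	free(req->rq_buffer);
	req->rq_buffer = NULL;
	req->rq_rbuffer = NULL;
}

/* The invariant: the two pointers are either both set or both NULL. */
static int model_buffers_consistent(const struct model_rqst *req)
{
	return (req->rq_buffer == NULL) == (req->rq_rbuffer == NULL);
}
```

Clearing both pointers on release is just one way to make a stale rq_rbuffer visible in this model. Whether the backchannel send path should instead hold the page until the reply has been decoded is a separate question that this sketch does not answer.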