On Apr 10, 2014, at 11:01 AM, Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> wrote:

> On 4/9/2014 7:26 PM, Chuck Lever wrote:
>> On Apr 9, 2014, at 7:56 PM, Devesh Sharma <Devesh.Sharma@xxxxxxxxxx> wrote:
>>
>>> Hi Chuck and Trond,
>>>
>>> I will resend a v2 for this.
>>> What if ib_post_send() fails with an immediate error? In that case DECR_CQCOUNT() will still be called, but no completion will be reported. Will that not cause any problems?
>> We should investigate whether an error return from ib_post_{send,recv} means there will be no completion. But I’ve never seen these verbs fail in practice, so I’m not in a hurry to make work for anyone! ;-)
>
> A synchronous failure from ib_post_* means the WR (or at least one of them, if there was more than one) failed and did not get submitted to the HW. So there will be no completion for those that failed.

OK. Our post operations are largely single WRs. Before we address CQCOUNT in error cases, we’d have to deal with chained WRs.

Chained WRs are used only when rpcrdma_register_frmr_external() finds an MR that hasn’t been invalidated. That’s actually working around a FRMR re-use bug (commit 5c635e09). If the underlying re-use problem were fixed, we could get rid of the chained WR in register_frmr_external() (and we wouldn’t need completions at all for FAST_REG_MR).

But at 100,000 feet, if a post operation fails, that seems like a very serious issue. I wonder whether we would be better off disconnecting and starting over in those cases.

>
>> However, it seems to me the new (!ia->ri_id->qp) checks outside the connect logic are unnecessary.
>>
>> Clearly, as you noticed, the ib_post_{send,recv} verbs do not check whether their “qp” argument is NULL before dereferencing it.
>>
>> But I don’t understand how xprtrdma can post any operation if the transport isn’t connected. In other words, how would it be possible to call rpcrdma_ep_post_recv() if the connect had failed and there was no QP?
>>
>> If disconnect wipes ia->ri_id->qp while there are still operations in progress, that would be the real bug.
>>
>>
>>> Also, in rpcrdma_register_frmr_external() I see that DECR_CQCOUNT() is called twice:
>>> first at line 1538 (the unlikely path) and second at line 1562. Shouldn't it be called only at 1562?
>> if (seg1->mr_chunk.rl_mw->r.frmr.state == FRMR_IS_VALID), then rpcrdma_register_frmr_external() posts two Work Requests (LOCAL_INV then FAST_REG_MR) with one ib_post_send(). Thus it is correct to DECR_CQCOUNT twice in that case, because each WR will trigger a separate completion event.
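To make that concrete, the FRMR_IS_VALID branch does roughly the following (paraphrased from memory of verbs.c, so field names and ordering are approximate rather than authoritative):

	struct ib_send_wr invalidate_wr, frmr_wr, *post_wr, *bad_wr;

	/* WR 1: invalidate the stale FRMR before re-registering it */
	memset(&invalidate_wr, 0, sizeof(invalidate_wr));
	invalidate_wr.opcode = IB_WR_LOCAL_INV;
	invalidate_wr.send_flags = IB_SEND_SIGNALED;
	invalidate_wr.ex.invalidate_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
	invalidate_wr.next = &frmr_wr;		/* chain WR 2 behind WR 1 */
	DECR_CQCOUNT(&r_xprt->rx_ep);		/* completion #1: LOCAL_INV */

	/* WR 2: fast-register the MR for the new mapping */
	memset(&frmr_wr, 0, sizeof(frmr_wr));
	frmr_wr.opcode = IB_WR_FAST_REG_MR;
	frmr_wr.send_flags = IB_SEND_SIGNALED;
	/* ... fill in frmr_wr.wr.fast_reg.* from the segment ... */
	DECR_CQCOUNT(&r_xprt->rx_ep);		/* completion #2: FAST_REG_MR */

	/* One verb call posts the whole chain; each signaled WR
	 * generates its own completion event. */
	post_wr = &invalidate_wr;
	rc = ib_post_send(ia->ri_id->qp, post_wr, &bad_wr);

Each signaled WR produces its own completion, which is why the CQ budget has to be charged twice before the single ib_post_send(). If the 5c635e09 workaround ever goes away, only the FAST_REG_MR completion is left to account for.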
>>
>>
>>> -----Original Message-----
>>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
>>> Sent: Thursday, April 10, 2014 1:57 AM
>>> To: Devesh Sharma
>>> Cc: Linux NFS Mailing List; linux-rdma@xxxxxxxxxxxxxxx; Trond Myklebust
>>> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
>>>
>>>
>>> On Apr 9, 2014, at 4:22 PM, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>>>
>>>> Hi Devesh,
>>>>
>>>> This looks a lot better. I still have a couple of small suggestions, though.
>>>>
>>>> On Apr 9, 2014, at 14:40, Devesh Sharma <devesh.sharma@xxxxxxxxxx> wrote:
>>>>
>>>>> If rdma_create_qp() fails to create a QP because the device firmware is
>>>>> in an invalid state, xprtrdma still tries to destroy the non-existent QP
>>>>> and ends up in a NULL pointer dereference crash.
>>>>> Adding proper checks to validate the QP pointer prevents this.
>>>>>
>>>>> Signed-off-by: Devesh Sharma <devesh.sharma@xxxxxxxxxx>
>>>>> ---
>>>>> net/sunrpc/xprtrdma/verbs.c |   29 +++++++++++++++++++++++++----
>>>>> 1 files changed, 25 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>>> index 9372656..902ac78 100644
>>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>>> @@ -831,10 +831,12 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
>>>>>  	if (ep->rep_connected != 0) {
>>>>>  		struct rpcrdma_xprt *xprt;
>>>>>  retry:
>>>>> -		rc = rpcrdma_ep_disconnect(ep, ia);
>>>>> -		if (rc && rc != -ENOTCONN)
>>>>> -			dprintk("RPC: %s: rpcrdma_ep_disconnect"
>>>>> +		if (ia->ri_id->qp) {
>>>>> +			rc = rpcrdma_ep_disconnect(ep, ia);
>>>>> +			if (rc && rc != -ENOTCONN)
>>>>> +				dprintk("RPC: %s: rpcrdma_ep_disconnect"
>>>>>  					" status %i\n", __func__, rc);
>>>>> +		}
>>>>>  		rpcrdma_clean_cq(ep->rep_cq);
>>>>>
>>>>>  		xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
>>>>> @@ -859,7 +861,9 @@ retry:
>>>>>  			goto out;
>>>>>  		}
>>>>>  		/* END TEMP */
>>>>> -		rdma_destroy_qp(ia->ri_id);
>>>>> +		if (ia->ri_id->qp) {
>>>>> +			rdma_destroy_qp(ia->ri_id);
>>>>> +		}
>>>> Nit: No need for braces here.
>>>>
>>>>>  		rdma_destroy_id(ia->ri_id);
>>>>>  		ia->ri_id = id;
>>>>>  	}
>>>>> @@ -1557,6 +1561,13 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>>>>>  	frmr_wr.wr.fast_reg.rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
>>>>>  	DECR_CQCOUNT(&r_xprt->rx_ep);
>>> I don't think you can DECR_CQCOUNT and then exit without posting the send. That will screw up the completion counter and result in a transport hang, won't it?
>>>
>>>>> +	if (!ia->ri_id->qp) {
>>>>> +		rc = -EINVAL;
>>>>> +		while (i--)
>>>>> +			rpcrdma_unmap_one(ia, --seg);
>>>>> +		goto out;
>>>>> +	}
>>>> Instead of duplicating the rpcrdma_unmap_one() cleanup here, why not
>>>> just do
>>>>
>>>> 	if (ia->ri_id->qp)
>>>> 		rc = ib_post_send(...);
>>>> 	else
>>>> 		rc = -EINVAL;
>>>>
>>>> BTW: can we not simply test for ia->ri_id->qp before we even call rpcrdma_map_one(), and hence bail out before we have to do any cleanup?
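For what it's worth, I read that suggestion as something like the following untested sketch (the -EINVAL and its placement are my guesses, not part of the posted patch):

	/* Check for a usable QP before any per-segment work is done, so
	 * the error path needs no rpcrdma_unmap_one() cleanup and, just
	 * as important, runs before DECR_CQCOUNT() can skew the CQ budget.
	 */
	if (ia->ri_id->qp == NULL)
		return -EINVAL;

	/* ... rpcrdma_map_one() for each segment, build the WR(s),
	 *     DECR_CQCOUNT(), then a single ib_post_send() ... */

That keeps the early exit ahead of both the mapping loop and the completion accounting.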
>>>>
>>>>> +
>>>>>  	rc = ib_post_send(ia->ri_id->qp, post_wr, &bad_wr);
>>>>>
>>>>>  	if (rc) {
>>>>> @@ -1571,6 +1582,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>>>>>  		seg1->mr_len = len;
>>>>>  	}
>>>>>  	*nsegs = i;
>>>>> +out:
>>>>>  	return rc;
>>>>>  }
>>>>>
>>>>> @@ -1592,6 +1604,9 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
>>>>>  	invalidate_wr.ex.invalidate_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
>>>>>  	DECR_CQCOUNT(&r_xprt->rx_ep);
>>> Ditto.
>>>
>>>>> +	if (!ia->ri_id->qp)
>>>>> +		return -EINVAL;
>>>>> +
>>>>>  	rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
>>>>>  	if (rc)
>>>>>  		dprintk("RPC: %s: failed ib_post_send for invalidate,"
>>>>> @@ -1923,6 +1938,9 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
>>>>>  		send_wr.send_flags = IB_SEND_SIGNALED;
>>>>>  	}
>>> Ditto.
>>>
>>>>> +	if (!ia->ri_id->qp)
>>>>> +		return -EINVAL;
>>>>> +
>>>>>  	rc = ib_post_send(ia->ri_id->qp, &send_wr, &send_wr_fail);
>>>>>  	if (rc)
>>>>>  		dprintk("RPC: %s: ib_post_send returned %i\n", __func__,
>>>>> @@ -1951,6 +1969,9 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
>>>>>  		rep->rr_iov.addr, rep->rr_iov.length, DMA_BIDIRECTIONAL);
>>>>>
>>>>>  	DECR_CQCOUNT(ep);
>>> And here.
>>>
>>>>> +
>>>>> +	if (!ia->ri_id->qp)
>>>>> +		return -EINVAL;
>>>>>  	rc = ib_post_recv(ia->ri_id->qp, &recv_wr, &recv_wr_fail);
>>>>>
>>>>>  	if (rc)
>>>>> --
>>>>> 1.7.1
>>>>>
>>>> _________________________________
>>>> Trond Myklebust
>>>> Linux NFS client maintainer, PrimaryData
>>>> trond.myklebust@xxxxxxxxxxxxxxx
>>>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com