> On Mar 19, 2018, at 1:26 PM, Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx> wrote: > >> From: Kalderon, Michal >> Sent: Monday, March 19, 2018 7:22 PM >> To: 'Chuck Lever' <chuck.lever@xxxxxxxxxx> >> Cc: linux-rdma@xxxxxxxxxxxxxxx >> Subject: RE: [PATCH v1] xprtrdma: Fix corner cases when handling device >> removal >> >>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx] >>> Sent: Monday, March 19, 2018 4:59 PM >>> To: Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx> >>> Cc: linux-rdma@xxxxxxxxxxxxxxx >>> Subject: Re: [PATCH v1] xprtrdma: Fix corner cases when handling >>> device removal >>> >>> Hi Michal- >>> >>> Have you tried this one? If we can nail this down now, I can easily >>> get it into the v4.17 merge window. >> >> Yes, looks good thanks. > > Chuck, while testing changes, I ran into a different issue though. > If I perform mount to a server that does not have the nfs rdma port in portlist using mount -o rdma, > And then perform rmmod qedr, it hangs with following stack : > > root@lb-tlvb-pcie37 ~]# cat /proc/5981/stack > [<0>] rpcrdma_conn_upcall+0x26a/0x310 [rpcrdma] > [<0>] cma_remove_one+0x26d/0x2a0 [rdma_cm] > [<0>] ib_unregister_device+0xcc/0x180 [ib_core] > [<0>] qedr_remove+0x37/0x80 [qedr] > [<0>] qede_rdma_unregister_driver+0x4b/0x90 [qede] > [<0>] SyS_delete_module+0x1c2/0x240 > [<0>] do_syscall_64+0x6e/0x190 > [<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > (wait_for_completion(&ia->ri_remove_done) I will try to reproduce it here. > Thanks, > Michal >>> >>> >>>> On Mar 14, 2018, at 10:42 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> >>> wrote: >>>> >>>> Michal Kalderon has found some corner cases around device unload >>>> with active NFS mounts that I didn't have the imagination to test >>>> when xprtrdma device removal was added last year. >>>> >>>> - The ULP device removal handler is responsible for deallocating >>>> the PD. That wasn't clear to me initially, and my own testing >>>> suggested it was not necessary, but that is incorrect. >>>> >>>> - The transport destruction path can no longer assume that there is >>>> a valid ID. >>>> >>>> - When destroying a transport, ensure that ib_free_cq() is not >>>> invoked on a CQ that was already released. >>>> >>>> Reported-by: Michal Kalderon <Michal.Kalderon@xxxxxxxxxx> >>>> Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from ...") >>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >>>> Cc: stable@xxxxxxxxxxxxxxx >>>> --- >>>> net/sunrpc/xprtrdma/verbs.c | 13 +++++++++---- >>>> 1 file changed, 9 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/net/sunrpc/xprtrdma/verbs.c >>>> b/net/sunrpc/xprtrdma/verbs.c index d19ea02..c284ee7 100644 >>>> --- a/net/sunrpc/xprtrdma/verbs.c >>>> +++ b/net/sunrpc/xprtrdma/verbs.c >>>> @@ -251,7 +251,6 @@ >>>> wait_for_completion(&ia->ri_remove_done); >>>> >>>> ia->ri_id = NULL; >>>> - ia->ri_pd = NULL; >>>> ia->ri_device = NULL; >>>> /* Return 1 to ensure the core destroys the id. */ >>>> return 1; >>>> @@ -449,7 +448,9 @@ >>>> ia->ri_id->qp = NULL; >>>> } >>>> ib_free_cq(ep->rep_attr.recv_cq); >>>> + ep->rep_attr.recv_cq = NULL; >>>> ib_free_cq(ep->rep_attr.send_cq); >>>> + ep->rep_attr.send_cq = NULL; >>>> >>>> /* The ULP is responsible for ensuring all DMA >>>> * mappings and MRs are gone. >>>> @@ -462,6 +463,8 @@ >>>> rpcrdma_dma_unmap_regbuf(req->rl_recvbuf); >>>> } >>>> rpcrdma_mrs_destroy(buf); >>>> + ib_dealloc_pd(ia->ri_pd); >>>> + ia->ri_pd = NULL; >>>> >>>> /* Allow waiters to continue */ >>>> complete(&ia->ri_remove_done); >>>> @@ -629,14 +632,16 @@ >>>> { >>>> cancel_delayed_work_sync(&ep->rep_connect_worker); >>>> >>>> - if (ia->ri_id->qp) { >>>> + if (ia->ri_id && ia->ri_id->qp) { >>>> rpcrdma_ep_disconnect(ep, ia); >>>> rdma_destroy_qp(ia->ri_id); >>>> ia->ri_id->qp = NULL; >>>> } >>>> >>>> - ib_free_cq(ep->rep_attr.recv_cq); >>>> - ib_free_cq(ep->rep_attr.send_cq); >>>> + if (ep->rep_attr.recv_cq) >>>> + ib_free_cq(ep->rep_attr.recv_cq); >>>> + if (ep->rep_attr.send_cq) >>>> + ib_free_cq(ep->rep_attr.send_cq); >>>> } >>>> >>>> /* Re-establish a connection after a device removal event. >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" >>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More >>> majordomo >>>> info at http://vger.kernel.org/majordomo-info.html >>> >>> -- >>> Chuck Lever >>> >>> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html