> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx] > Sent: Tuesday, March 20, 2018 2:40 AM > > On Mar 19, 2018, at 2:06 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> > wrote: > >> On Mar 19, 2018, at 1:26 PM, Kalderon, Michal > <Michal.Kalderon@xxxxxxxxxx> wrote: > >> > >>> From: Kalderon, Michal > >>> Sent: Monday, March 19, 2018 7:22 PM > >>> To: 'Chuck Lever' <chuck.lever@xxxxxxxxxx> > >>> Cc: linux-rdma@xxxxxxxxxxxxxxx > >>> Subject: RE: [PATCH v1] xprtrdma: Fix corner cases when handling > >>> device removal > >>> > >>>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx] > >>>> Sent: Monday, March 19, 2018 4:59 PM > >>>> To: Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx> > >>>> Cc: linux-rdma@xxxxxxxxxxxxxxx > >>>> Subject: Re: [PATCH v1] xprtrdma: Fix corner cases when handling > >>>> device removal > >>>> > >>>> Hi Michal- > >>>> > >>>> Have you tried this one? If we can nail this down now, I can easily > >>>> get it into the v4.17 merge window. > >>> > >>> Yes, looks good thanks. > >> > >> Chuck, while testing changes, I ran into a different issue though. > >> If I perform mount to a server that does not have the nfs rdma port > >> in portlist using mount -o rdma, And then perform rmmod qedr, it hangs > with following stack : > >> > >> root@lb-tlvb-pcie37 ~]# cat /proc/5981/stack [<0>] > >> rpcrdma_conn_upcall+0x26a/0x310 [rpcrdma] [<0>] > >> cma_remove_one+0x26d/0x2a0 [rdma_cm] [<0>] > >> ib_unregister_device+0xcc/0x180 [ib_core] [<0>] > qedr_remove+0x37/0x80 > >> [qedr] [<0>] qede_rdma_unregister_driver+0x4b/0x90 [qede] [<0>] > >> SyS_delete_module+0x1c2/0x240 [<0>] do_syscall_64+0x6e/0x190 [<0>] > >> entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > >> > >> (wait_for_completion(&ia->ri_remove_done) > > > > I will try to reproduce it here. > > I can't reproduce with my IB or my RoCE rig, but I might not have understood > how you are getting into this state. I'm trying this: > > # mount.nfs -v -o vers=4.0,proto=rdma,port=12345,sec=sys nfs- > roce:/export/tmp /mnt I killed the mount attempt with "Ctrl-C" and performed the rmmod from the same window > > and while it is looping trying to connect, I do this in another > window: > > # rmmod mlx4_en > > On it's next attempt to connect, the connect worker fails with > RDMA_CM_EVENT_ADDR_ERROR. I don't ever see > RDMA_CM_EVENT_DEVICE_REMOVAL. > > > >> Thanks, > >> Michal > >>>> > >>>> > >>>>> On Mar 14, 2018, at 10:42 AM, Chuck Lever <chuck.lever@xxxxxxxxxx> > >>>> wrote: > >>>>> > >>>>> Michal Kalderon has found some corner cases around device unload > >>>>> with active NFS mounts that I didn't have the imagination to test > >>>>> when xprtrdma device removal was added last year. > >>>>> > >>>>> - The ULP device removal handler is responsible for deallocating > >>>>> the PD. That wasn't clear to me initially, and my own testing > >>>>> suggested it was not necessary, but that is incorrect. > >>>>> > >>>>> - The transport destruction path can no longer assume that there > >>>>> is a valid ID. > >>>>> > >>>>> - When destroying a transport, ensure that ib_free_cq() is not > >>>>> invoked on a CQ that was already released. > >>>>> > >>>>> Reported-by: Michal Kalderon <Michal.Kalderon@xxxxxxxxxx> > >>>>> Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from > >>>>> ...") > >>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > >>>>> Cc: stable@xxxxxxxxxxxxxxx > >>>>> --- > >>>>> net/sunrpc/xprtrdma/verbs.c | 13 +++++++++---- > >>>>> 1 file changed, 9 insertions(+), 4 deletions(-) > >>>>> > >>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c > >>>>> b/net/sunrpc/xprtrdma/verbs.c index d19ea02..c284ee7 100644 > >>>>> --- a/net/sunrpc/xprtrdma/verbs.c > >>>>> +++ b/net/sunrpc/xprtrdma/verbs.c > >>>>> @@ -251,7 +251,6 @@ > >>>>> wait_for_completion(&ia->ri_remove_done); > >>>>> > >>>>> ia->ri_id = NULL; > >>>>> - ia->ri_pd = NULL; > >>>>> ia->ri_device = NULL; > >>>>> /* Return 1 to ensure the core destroys the id. */ > >>>>> return 1; > >>>>> @@ -449,7 +448,9 @@ > >>>>> ia->ri_id->qp = NULL; > >>>>> } > >>>>> ib_free_cq(ep->rep_attr.recv_cq); > >>>>> + ep->rep_attr.recv_cq = NULL; > >>>>> ib_free_cq(ep->rep_attr.send_cq); > >>>>> + ep->rep_attr.send_cq = NULL; > >>>>> > >>>>> /* The ULP is responsible for ensuring all DMA > >>>>> * mappings and MRs are gone. > >>>>> @@ -462,6 +463,8 @@ > >>>>> rpcrdma_dma_unmap_regbuf(req->rl_recvbuf); > >>>>> } > >>>>> rpcrdma_mrs_destroy(buf); > >>>>> + ib_dealloc_pd(ia->ri_pd); > >>>>> + ia->ri_pd = NULL; > >>>>> > >>>>> /* Allow waiters to continue */ > >>>>> complete(&ia->ri_remove_done); > >>>>> @@ -629,14 +632,16 @@ > >>>>> { > >>>>> cancel_delayed_work_sync(&ep->rep_connect_worker); > >>>>> > >>>>> - if (ia->ri_id->qp) { > >>>>> + if (ia->ri_id && ia->ri_id->qp) { > >>>>> rpcrdma_ep_disconnect(ep, ia); > >>>>> rdma_destroy_qp(ia->ri_id); > >>>>> ia->ri_id->qp = NULL; > >>>>> } > >>>>> > >>>>> - ib_free_cq(ep->rep_attr.recv_cq); > >>>>> - ib_free_cq(ep->rep_attr.send_cq); > >>>>> + if (ep->rep_attr.recv_cq) > >>>>> + ib_free_cq(ep->rep_attr.recv_cq); > >>>>> + if (ep->rep_attr.send_cq) > >>>>> + ib_free_cq(ep->rep_attr.send_cq); > >>>>> } > >>>>> > >>>>> /* Re-establish a connection after a device removal event. > >>>>> > >>>>> -- > >>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" > >>>>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More > >>>> majordomo > >>>>> info at http://vger.kernel.org/majordomo-info.html > >>>> > >>>> -- > >>>> Chuck Lever > >>>> > >>>> > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" > >> in the body of a message to majordomo@xxxxxxxxxxxxxxx More > majordomo > >> info at http://vger.kernel.org/majordomo-info.html > > > > -- > > Chuck Lever > > > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" > > in the body of a message to majordomo@xxxxxxxxxxxxxxx More > majordomo > > info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html