On Tuesday, September 09/17/19, 2019 at 22:50:27 +0530, Sagi Grimberg wrote: > > >> To: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >> From: "Jason Gunthorpe" <jgg@xxxxxxxx> > >> Date: 09/16/2019 06:28PM > >> Cc: "Steve Wise" <larrystevenwise@xxxxxxxxx>, "Bernard Metzler" > >> <BMT@xxxxxxxxxxxxxx>, "Sagi Grimberg" <sagi@xxxxxxxxxxx>, > >> "linux-rdma@xxxxxxxxxxxxxxx" <linux-rdma@xxxxxxxxxxxxxxx> > >> Subject: [EXTERNAL] Re: Re: [PATCH v3] iwcm: don't hold the irq > >> disabled lock on iw_rem_ref > >> > >> On Wed, Sep 11, 2019 at 09:28:16PM +0530, Krishnamraju Eraparaju > >> wrote: > >>> Hi Steve & Bernard, > >>> > >>> Thanks for the review comments. > >>> I will do those formating changes. > >> > >> I don't see anything in patchworks, but the consensus is to drop > >> Sagi's patch pending this future patch? > >> > >> Jason > >> > > This is my impression as well. But consensus should be > > explicit...Sagi, what do you think? > > I don't really care, but given the changes from Krishnamraju cause other > problems I'd ask if my version is also offending his test. Hi Sagi, Your version holds good for rdma_destroy_id() path only, but there are other places where iw_rem_ref() is being called within spinlocks, here is the call trace when iw_rem_ref() is called in cm_close_handler() path: [ 627.716339] Call Trace: [ 627.716339] ? load_new_mm_cr3+0xc0/0xc0 [ 627.716339] on_each_cpu+0x23/0x40 [ 627.716339] flush_tlb_kernel_range+0x74/0x80 [ 627.716340] free_unmap_vmap_area+0x2a/0x40 [ 627.716340] remove_vm_area+0x59/0x60 [ 627.716340] __vunmap+0x46/0x210 [ 627.716340] siw_free_qp+0x88/0x120 [siw] [ 627.716340] cm_work_handler+0x5b8/0xd00 <===== [ 627.716340] process_one_work+0x155/0x380 [ 627.716341] worker_thread+0x41/0x3b0 [ 627.716341] kthread+0xf3/0x130 [ 627.716341] ? process_one_work+0x380/0x380 [ 627.716341] ? kthread_bind+0x10/0x10 [ 627.716341] ret_from_fork+0x35/0x40 Hence, I moved all the occurances of iw_rem_ref() outside of spinlocks critical section. > > In general, I do not think that making resources free routines (both > explict or implicit via ref dec) under a spinlock is not a robust > design. > > I would first make it clear and documented what cm_id_priv->lock is > protecting. In my mind, it should protect *its own* mutations of > cm_id_priv and by design leave all the ops calls outside the lock. > > I don't understand what is causing the Chelsio issue observed, but > it looks like c4iw_destroy_qp blocks on a completion that depends on a > refcount that is taken also by iwcm, which means that I cannot call > ib_destroy_qp if the cm is not destroyed as well? > > If that is madatory, I'd say that instead of blocking on this > completion, we can simply convert c4iw_qp_rem_ref if use a kref > which is not order dependent. I agree, I'm exploring best possible ways to address this issue, will update my final resolution by this Friday.