On Tue, Jun 13, 2023 at 10:30:37AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote:
> > > I think there is likely some much larger issue with the IW CM if the
> > > cm_id can be destroyed while the iwcm_id is in use? It is weird that
> > > there are two id memories for this :\
> >
> > My understanding of the call chain to rdma id destroy is as follows. I guess
> > _destroy_id calls iw_destroy_cm_id before destroying the rdma id, but I am not sure
> > why it does not wait for the cm_id deref by cm_work_handler.
> >
> > nvme_rdma_teardown_io_queues
> >   nvme_rdma_stop_io_queues          -> chained to cma_iw_handler
> >   nvme_rdma_free_io_queues
> >     nvme_rdma_free_queue
> >       rdma_destroy_id
> >         mutex_lock(&id_priv->handler_mutex)
> >         destroy_id_handler_unlock
> >           mutex_unlock(&id_priv->handler_mutex)
> >           _destroy_id
> >             iw_destroy_cm_id
> >             wait_for_completion(&id_priv->comp)
> >             kfree(id_priv)
>
> Once a destroy_cm_id() has returned, that layer is no longer
> permitted to run or be running in its handlers. The iw cm is broken if
> it allows this, and that is the cause of the bug.
>
> Taking more refs within handlers that are already not allowed to be
> running is just racy.

So we need to revert that patch from our rdma-rc.

Thanks

> Jason
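
To make the ownership contract concrete, here is a minimal userspace sketch (plain
pthreads, not the kernel primitives) of the pattern the destroy path relies on: a
reference is taken on behalf of a handler before the handler starts running, and
destroy drops the creator's reference and waits for the count to reach zero before
freeing, the analogue of wait_for_completion(&id_priv->comp) in _destroy_id(). All
names here (fake_cm_id, fake_destroy_id, handler_thread) are invented for
illustration and do not correspond to real kernel or driver symbols.

/*
 * Userspace model of "destroy must not return while a handler can
 * still touch the object".  Compile with: cc sketch.c -lpthread
 */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_cm_id {
	pthread_mutex_t lock;
	pthread_cond_t  done;	/* signalled when refs drops to 0 */
	int             refs;	/* 1 for the creator + 1 per pending handler */
};

static struct fake_cm_id *fake_create_id(void)
{
	struct fake_cm_id *id = calloc(1, sizeof(*id));

	pthread_mutex_init(&id->lock, NULL);
	pthread_cond_init(&id->done, NULL);
	id->refs = 1;			/* creator's reference */
	return id;
}

static void fake_id_get(struct fake_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->refs++;
	pthread_mutex_unlock(&id->lock);
}

static void fake_id_put(struct fake_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	if (--id->refs == 0)
		pthread_cond_signal(&id->done);
	pthread_mutex_unlock(&id->lock);
}

/* The handler's reference was taken before this thread was launched ... */
static void *handler_thread(void *arg)
{
	struct fake_cm_id *id = arg;

	usleep(1000);			/* pretend to do event work */
	fake_id_put(id);		/* ... and is dropped when the handler is done */
	return NULL;
}

/*
 * Destroy drops the creator's reference and waits for every outstanding
 * handler reference before freeing, so the free cannot race with a
 * handler that is still using the id.
 */
static void fake_destroy_id(struct fake_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->refs--;			/* drop the creator's reference */
	while (id->refs)
		pthread_cond_wait(&id->done, &id->lock);
	pthread_mutex_unlock(&id->lock);

	pthread_cond_destroy(&id->done);
	pthread_mutex_destroy(&id->lock);
	free(id);
}

int main(void)
{
	struct fake_cm_id *id = fake_create_id();
	pthread_t t;

	fake_id_get(id);		/* ref taken before the handler starts */
	pthread_create(&t, NULL, handler_thread, id);

	fake_destroy_id(id);		/* returns only after the handler's put */
	pthread_join(t, NULL);
	return 0;
}

The point of the sketch is only that the waiting belongs in the destroy path, and
that the reference must exist before the handler starts; it echoes Jason's point
that a handler grabbing an extra reference while it may already be racing with the
kfree() cannot close the window.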