On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote:

> > I think there is likely some much larger issue with the IW CM if the
> > cm_id can be destroyed while the iwcm_id is in use? It is weird that
> > there are two id memories for this :\
>
> My understanding about the call chain to rdma id destroy is as follows. I guess
> _destroy_id calls iw_destroy_cm_id before destroying the rdma id, but not sure
> why it does not wait for cm_id deref by cm_work_handler.
>
> nvme_rdma_teardown_io_queues
>   nvme_rdma_stop_io_queues -> chained to cma_iw_handler
>   nvme_rdma_free_io_queues
>     nvme_rdma_free_queue
>       rdma_destroy_id
>         mutex_lock(&id_priv->handler_mutex)
>         destroy_id_handler_unlock
>           mutex_unlock(&id_priv->handler_mutex)
>           _destroy_id
>             iw_destroy_cm_id
>             wait_for_completion(&id_priv->comp)
>             kfree(id_priv)

Once a destroy_cm_id() has returned, that layer is no longer permitted
to run or be running in its handlers. The iw cm is broken if it allows
this, and that is the cause of the bug.

Taking more refs within handlers that are already not allowed to be
running is just racy.

Jason
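
For illustration only, the rule stated above (after destroy returns, no handler is running and none may start) can be sketched in userspace C with pthreads. This is not the iw cm code and not a proposed fix. Every name here (struct demo_id, demo_init, demo_handler_enter, demo_handler_exit, demo_destroy) is hypothetical, and the mutex plus condition variable is an assumed stand-in for whatever synchronization the real layer uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lower-layer id; not a kernel structure. */
struct demo_id {
	pthread_mutex_t lock;
	pthread_cond_t idle;		/* signalled when handlers_running hits 0 */
	int handlers_running;		/* handlers currently executing */
	bool destroying;		/* set once demo_destroy() has started */
};

void demo_init(struct demo_id *id)
{
	pthread_mutex_init(&id->lock, NULL);
	pthread_cond_init(&id->idle, NULL);
	id->handlers_running = 0;
	id->destroying = false;
}

/*
 * The lower layer calls this before invoking a handler.
 * A false return means the handler must not be called.
 */
bool demo_handler_enter(struct demo_id *id)
{
	bool ok;

	pthread_mutex_lock(&id->lock);
	ok = !id->destroying;
	if (ok)
		id->handlers_running++;
	pthread_mutex_unlock(&id->lock);
	return ok;
}

/* The lower layer calls this after the handler has returned. */
void demo_handler_exit(struct demo_id *id)
{
	pthread_mutex_lock(&id->lock);
	assert(id->handlers_running > 0);
	if (--id->handlers_running == 0)
		pthread_cond_broadcast(&id->idle);
	pthread_mutex_unlock(&id->lock);
}

/*
 * Blocks new handlers, then waits for in-flight ones to finish.
 * On return, no handler is running and none can start.
 */
void demo_destroy(struct demo_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->destroying = true;
	while (id->handlers_running > 0)
		pthread_cond_wait(&id->idle, &id->lock);
	pthread_mutex_unlock(&id->lock);
}
```

In this sketch the waiting happens inside the destroy call of the layer that owns the handlers, so the caller can free its own state as soon as demo_destroy() returns without taking extra references from within a handler. A handler that itself called demo_destroy() would deadlock here, which a real design would need to account for.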