On Jun 13, 2023 / 21:07, Leon Romanovsky wrote:
> On Tue, Jun 13, 2023 at 10:30:37AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 13, 2023 at 01:43:43AM +0000, Shinichiro Kawasaki wrote:
> > > > I think there is likely some much larger issue with the IW CM if the
> > > > cm_id can be destroyed while the iwcm_id is in use? It is weird that
> > > > there are two id memories for this :\
> > >
> > > My understanding of the call chain for rdma id destroy is as follows. I
> > > guess _destroy_id calls iw_destroy_cm_id before destroying the rdma id,
> > > but I am not sure why it does not wait for the cm_id deref by
> > > cm_work_handler.
> > >
> > > nvme_rdma_teardown_io_queues
> > >   nvme_rdma_stop_io_queues            -> chained to cma_iw_handler
> > >   nvme_rdma_free_io_queues
> > >     nvme_rdma_free_queue
> > >       rdma_destroy_id
> > >         mutex_lock(&id_priv->handler_mutex)
> > >         destroy_id_handler_unlock
> > >           mutex_unlock(&id_priv->handler_mutex)
> > >           _destroy_id
> > >             iw_destroy_cm_id
> > >             wait_for_completion(&id_priv->comp)
> > >             kfree(id_priv)
> >
> > Once a destroy_cm_id() has returned, that layer is no longer permitted
> > to run or be running in its handlers. The iw cm is broken if it allows
> > this, and that is the cause of the bug.
> >
> > Taking more refs within handlers that are already not allowed to be
> > running is just racy.
>
> So we need to revert that patch from our rdma-rc.

I see, thanks for the clarifications. As another fix approach, I reverted
commit 59c68ac31e15 ("iw_cm: free cm_id resources on the last deref") so that
iw_destroy_cm_id() waits for the last deref of the cm_id. With that revert,
the KASAN slab-use-after-free disappeared. Is this the right fix approach?
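
For reference, below is a minimal sketch of the ordering the revert restores.
It is not the actual kernel code: the struct is trimmed down, the *_sketch
names are illustrative, and only the refcount/completion interplay from
drivers/infiniband/core/iwcm.c is kept.

/*
 * Illustrative sketch only -- simplified from the pre-59c68ac31e15 iwcm.c
 * logic. The point: destroy must not return (and the memory must not be
 * freed) until the last handler reference is dropped, which is what
 * wait_for_completion() guarantees here.
 */
#include <linux/completion.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct iwcm_id_private_sketch {
	refcount_t refcount;            /* held by each running handler */
	struct completion destroy_comp; /* signalled on the last deref  */
};

/* Drop one reference; wake the destroyer when the last one goes away. */
static void iwcm_deref_id_sketch(struct iwcm_id_private_sketch *priv)
{
	if (refcount_dec_and_test(&priv->refcount))
		complete(&priv->destroy_comp);
}

/* A work handler such as cm_work_handler holds a ref while it runs. */
static void cm_work_handler_sketch(struct iwcm_id_private_sketch *priv)
{
	/* ... process the queued event ... */
	iwcm_deref_id_sketch(priv);     /* may be the final deref */
}

/*
 * With the revert applied, destroy blocks here until every handler has
 * dropped its reference, so kfree() cannot race with a handler that is
 * still running -- the slab-use-after-free that KASAN reported.
 */
void destroy_cm_id_sketch(struct iwcm_id_private_sketch *priv)
{
	/* ... mark the id as destroying, cancel/flush queued work ... */
	iwcm_deref_id_sketch(priv);     /* drop the creation reference */
	wait_for_completion(&priv->destroy_comp);
	kfree(priv);
}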