On Jun 12, 2023 / 11:18, Jason Gunthorpe wrote:
> On Mon, Jun 12, 2023 at 02:42:37PM +0900, Shin'ichiro Kawasaki wrote:
> > When rdma_destroy_id() and cma_iw_handler() race, struct rdma_id_private
> > *id_priv can be destroyed during cma_iw_handler call. This causes "BUG:
> > KASAN: slab-use-after-free" at mutex_lock() in cma_iw_handler() [1].
> > To prevent the destroy of id_priv, keep its reference count by calling
> > cma_id_get() and cma_id_put() at start and end of cma_iw_handler().
> >
> > [1]
> >
> > ==================================================================
> > BUG: KASAN: slab-use-after-free in __mutex_lock+0x1324/0x18f0
> > Read of size 8 at addr ffff888197b37418 by task kworker/u8:0/9
> >
> > CPU: 0 PID: 9 Comm: kworker/u8:0 Not tainted 6.3.0 #62
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
> > Workqueue: iw_cm_wq cm_work_handler [iw_cm]
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x57/0x90
> >  print_report+0xcf/0x660
> >  ? __mutex_lock+0x1324/0x18f0
> >  kasan_report+0xa4/0xe0
> >  ? __mutex_lock+0x1324/0x18f0
> >  __mutex_lock+0x1324/0x18f0
> >  ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
> >  ? _raw_spin_unlock_irqrestore+0x30/0x60
> >  ? rcu_is_watching+0x11/0xb0
> >  ? _raw_spin_unlock_irqrestore+0x30/0x60
> >  ? trace_hardirqs_on+0x12/0x100
> >  ? __pfx___mutex_lock+0x10/0x10
> >  ? __percpu_counter_sum+0x147/0x1e0
> >  ? domain_dirty_limits+0x246/0x390
> >  ? wb_over_bg_thresh+0x4d5/0x610
> >  ? rcu_is_watching+0x11/0xb0
> >  ? cma_iw_handler+0xac/0x4f0 [rdma_cm]
> >  cma_iw_handler+0xac/0x4f0 [rdma_cm]
>
> What is the full call chain here, eg with the static functions
> un-inlined?

I checked the inlined function call chain from cm_work_handler to
cma_iw_handler (I recreated the symptom using kernel v6.4-rc5, so the
addresses differ from the report above):

$ ./scripts/faddr2line ./drivers/infiniband/core/iw_cm.ko cm_work_handler+0xb57/0x1c50
cm_work_handler+0xb57/0x1c50:
cm_close_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:974
(inlined by) process_event at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:997
(inlined by) cm_work_handler at /home/shin/Linux/linux/drivers/infiniband/core/iwcm.c:1036

With this, my understanding of the full call chain from the NVMe driver to
cma_iw_handler is as follows, including the task switch to cm_work_handler:

nvme_rdma_teardown_io_queues
 nvme_rdma_stop_io_queues
  nvme_rdma_stop_queue
   __nvme_rdma_stop_queue
    rdma_disconnect
     iw_cm_disconnect
      iwcm_modify_qp_sqd
       ib_modify_qp
        _ib_modify_qp
         ib_security_modify_qp
          siw_verbs_modify_qp
           siw_qp_modify
            siw_qp_cm_drop
             siw_cm_upcall(IW_CM_EVENT_CLOSE)
              cm_event_handler -> refcount_inc(&cm_id_priv->refcount)
                                  queue_work
   -> cm_work_handler
       process_event
        cm_close_handler
         cma_iw_handler

> >
> >  drivers/infiniband/core/cma.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> > index 93a1c48d0c32..c5267d9bb184 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -2477,6 +2477,7 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
> >  	struct sockaddr *laddr = (struct sockaddr *)&iw_event->local_addr;
> >  	struct sockaddr *raddr = (struct sockaddr *)&iw_event->remote_addr;
> >
> > +	cma_id_get(id_priv);
> >  	mutex_lock(&id_priv->handler_mutex);
> >  	if (READ_ONCE(id_priv->state) != RDMA_CM_CONNECT)
> >  		goto out;
> > @@ -2524,12 +2525,14 @@ static int cma_iw_handler(struct iw_cm_id *iw_id, struct iw_cm_event *iw_event)
> >  	if (ret) {
> >  		/* Destroy the CM ID by returning a non-zero value. */
> >  		id_priv->cm_id.iw = NULL;
> > +		cma_id_put(id_priv);
> >  		destroy_id_handler_unlock(id_priv);
> >  		return ret;
> >  	}
> >
> >  out:
> >  	mutex_unlock(&id_priv->handler_mutex);
> > +	cma_id_put(id_priv);
> >  	return ret;
> >  }
>
> cm_work_handler already has a ref on the iwcm_id_private
>
> I think there is likely some much larger issue with the IW CM if the
> cm_id can be destroyed while the iwcm_id is in use? It is weird that
> there are two id memories for this :\

My understanding of the call chain that destroys the rdma id is as follows.
I guess _destroy_id() calls iw_destroy_cm_id() before destroying the rdma id,
but I'm not sure why it does not wait for the cm_id deref done by
cm_work_handler.

nvme_rdma_teardown_io_queues
 nvme_rdma_stop_io_queues      -> chained to cma_iw_handler
 nvme_rdma_free_io_queues
  nvme_rdma_free_queue
   rdma_destroy_id
    mutex_lock(&id_priv->handler_mutex)
    destroy_id_handler_unlock
     mutex_unlock(&id_priv->handler_mutex)
     _destroy_id
      iw_destroy_cm_id
      wait_for_completion(&id_priv->comp)
      kfree(id_priv)
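
To make the lifetime rule I have in mind concrete, below is a minimal
userspace sketch, not kernel code: all names in it (fake_id, fake_id_get,
fake_id_put, destroy_id, handler_thread) are made up for illustration.
The idea is that whoever queues work against an id takes a reference first,
the handler drops it when it is done, and the destroy side frees the memory
only after the count reaches zero, which is the role I expect
wait_for_completion(&id_priv->comp) to play for id_priv. (The posted patch
takes the reference at the top of cma_iw_handler() instead; the sketch takes
it before the handler starts only to keep the example self-contained.)

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <unistd.h>

struct fake_id {
	atomic_int refcount;            /* stand-in for id_priv refcounting    */
	pthread_mutex_t handler_mutex;  /* stand-in for id_priv->handler_mutex */
};

static void fake_id_get(struct fake_id *id) { atomic_fetch_add(&id->refcount, 1); }
static void fake_id_put(struct fake_id *id) { atomic_fetch_sub(&id->refcount, 1); }

/* Event handler: safe to touch *id because a reference was taken for it. */
static void *handler_thread(void *arg)
{
	struct fake_id *id = arg;

	pthread_mutex_lock(&id->handler_mutex);   /* the mutex_lock() that UAFs today */
	/* ... handle the event ... */
	pthread_mutex_unlock(&id->handler_mutex);
	fake_id_put(id);                          /* drop the handler's reference     */
	return NULL;
}

/* Destroy path: free only once every outstanding reference is gone. */
static void destroy_id(struct fake_id *id)
{
	fake_id_put(id);                          /* drop the creator's reference     */
	while (atomic_load(&id->refcount) != 0)   /* crude wait_for_completion()      */
		usleep(1000);
	pthread_mutex_destroy(&id->handler_mutex);
	free(id);
}

int main(void)
{
	struct fake_id *id = calloc(1, sizeof(*id));
	pthread_t t;

	if (!id)
		return 1;
	pthread_mutex_init(&id->handler_mutex, NULL);
	atomic_init(&id->refcount, 1);            /* creator's reference              */

	fake_id_get(id);                          /* reference for the queued handler */
	pthread_create(&t, NULL, handler_thread, id);

	destroy_id(id);                           /* races with the handler but waits */
	pthread_join(t, NULL);
	return 0;
}

This is only meant to show why holding a reference across the handler closes
the window between destroy_id_handler_unlock() and the running handler; it
does not answer the question above about why the iwcm-side reference held by
cm_work_handler does not already cover this.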