> And I don't see anything preventing that from running concurrently > with the WQ Thanks, i made a fix. diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index de5ab2ae8e1738455bc36eca5487f1c9136e060e..6b0fc8d54f49102ad8c9fe67b1888dc6883f2164 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -601,14 +601,18 @@ static void isert_np_reinit_id_work(struct work_struct *w) { struct isert_np *isert_np = container_of(w, struct isert_np, work); - rdma_destroy_id(isert_np->cm_id); + mutex_lock(&isert_np->id_mutex); + if (isert_np->cm_id) { + rdma_destroy_id(isert_np->cm_id); - isert_np->cm_id = isert_setup_id(isert_np); - if (IS_ERR(isert_np->cm_id)) { - isert_err("isert np %p setup id failed: %ld\n", - isert_np, PTR_ERR(isert_np->cm_id)); - isert_np->cm_id = NULL; + isert_np->cm_id = isert_setup_id(isert_np); + if (IS_ERR(isert_np->cm_id)) { + isert_err("isert np %p setup id failed: %ld\n", + isert_np, PTR_ERR(isert_np->cm_id)); + isert_np->cm_id = NULL; + } } + mutex_unlock(&isert_np->id_mutex); } static int @@ -2292,6 +2296,7 @@ isert_setup_np(struct iscsi_np *np, } INIT_WORK(&isert_np->work, isert_np_reinit_id_work); + mutex_init(&isert_np->id_mutex); sema_init(&isert_np->sem, 0); mutex_init(&isert_np->mutex); @@ -2455,8 +2460,12 @@ isert_free_np(struct iscsi_np *np) struct isert_np *isert_np = np->np_context; struct isert_conn *isert_conn, *n; - if (isert_np->cm_id) + mutex_lock(&isert_np->id_mutex); + if (isert_np->cm_id) { rdma_destroy_id(isert_np->cm_id); + isert_np->cm_id = NULL; + } + mutex_unlock(&isert_np->id_mutex); /* * FIXME: At this point we don't have a good way to insure diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h index 5fdc799f3ca864667a454374f8548e7f031e9925..dafe7b44494d1d5687f0d87e18855d78e8729707 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.h +++ b/drivers/infiniband/ulp/isert/ib_isert.h @@ -211,4 +211,5 @@ struct isert_np { struct list_head pending; struct work_struct work; struct workqueue_struct *reinit_id_wq; + struct mutex id_mutex; }; > What is this trying to do anyhow? If the addr has truely changed why > does the bind fail? When the active physical link member of bonding interface in active-standby mode gets faulty, the standby link will represent the assigned addresses on behalf of the active link. Therefore, RDMA communication manager will notify iSER target with RDMA_CM_EVENT_ADDR_CHANGE. The iSCSI socket address does not change. But the cma_id at the IB layer, which is bound to the iSCSI socket, will change. The problem is that the new cma_id is trying to bind to a socket address that is still bound to the old cma_id. That is, before you bind a new cma_id to a socket, you must first delete the old one. Best regards, Chesnokov Gleb From: Jason Gunthorpe <jgg@xxxxxxxxxx> Sent: Monday, July 19, 2021 3:13:02 PM To: lanevdenoche@xxxxxxxxx Cc: linux-rdma@xxxxxxxxxxxxxxx; sagi@xxxxxxxxxxx; dledford@xxxxxxxxxx; Chesnokov Gleb Subject: Re: [PATCH 1/1] iser-target: Fix handling of RDMA_CV_EVENT_ADDR_CHANGE On Wed, Jul 14, 2021 at 09:26:46PM +0300, lanevdenoche@xxxxxxxxx wrote: > @@ -2466,6 +2489,8 @@ isert_free_np(struct iscsi_np *np) > } > mutex_unlock(&isert_np->mutex); > > + destroy_workqueue(isert_np->reinit_id_wq); > + This is racy, the lines above have: if (isert_np->cm_id) rdma_destroy_id(isert_np->cm_id); And I don't see anything preventing that from running concurrently with the WQ What is this trying to do anyhow? If the addr has truely changed why does the bind fail? Jason