On 4/11/23 09:33, Zhijian Li (Fujitsu) wrote:
diff --git a/drivers/infiniband/ulp/rtrs/rtrs-clt.c b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
index 4c8f42e46e2f..760a7eb51297 100644
--- a/drivers/infiniband/ulp/rtrs/rtrs-clt.c
+++ b/drivers/infiniband/ulp/rtrs/rtrs-clt.c
@@ -2074,6 +2074,7 @@ static int create_cm(struct rtrs_clt_con *con)
rtrs_err(s, "Failed to resolve address, err: %d\n", err);
goto destroy_cm;
}
+again:
/*
* Combine connection status and session events. This is needed
* for waiting two possible cases: cm_err has something meaningful
@@ -2083,10 +2084,15 @@ static int create_cm(struct rtrs_clt_con *con)
clt_path->state_wq,
con->cm_err || clt_path->state != RTRS_CLT_CONNECTING,
msecs_to_jiffies(RTRS_CONNECT_TIMEOUT_MS));
- if (err == 0 || err == -ERESTARTSYS) {
- if (err == 0)
- err = -ETIMEDOUT;
- /* Timedout or interrupted */
+ if (err == -ERESTARTSYS) {
+ /* interrupted,
+ * try again to avoid the in-flight rtrs_clt_rdma_cm_handler()
+ * getting a use-after-free
+ */
+ goto again;
+ } else if (err == 0) {
+ err = -ETIMEDOUT;
+ /* Timedout */
goto errr;
}
Can event handler still be triggered in case of timeout?
I have never hit such race.
But it is still possible with the theory.
And I guess either stop_cm -> rdma_disconnect or destroy_cm -> rdma_destroy_id
should prevent this kind of racy issue.
In practice, they are possible that rtrs_clt_rdma_cm_handler() is in-flight during
'either stop_cm -> rdma_disconnect or destroy_cm -> rdma_destroy_id'. rtrs_clt_rdma_cm_handler() and
cm's cleanup path need to hold mutex_lock(&con->con_mutex), once cm's cleanup path get this lock first
rtrs_clt_rdma_cm_handler has to sleep, when rtrs_clt_rdma_cm_handler is wakeup again, some resources has been
freed by cm's cleanup path.
First, stop_cm doesn't need to hold &con->con_mutex. But
rtrs_clt_rdma_cm_handler
is called from rdma core layer which need id_priv->handler_mutexinstead
of con_mutex
I thinnk. Also RTRS has similar behavior as nvme host rdma, ib_srp and
iser_verb.
@Jinpu/Haris, can we move destroy_cm right after stop_cm?*
*Thanks,
Guoqing