On 2024/11/23 00:03, Alexandra Winter wrote:
On 22.11.24 08:16, Wen Gu wrote:
We encountered a LGR/link use-after-free issue, which manifested as
the LGR/link refcnt reaching 0 early and entering the clear process,
making resource access unsafe.
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
Workqueue: events smc_lgr_terminate_work [smc]
Call trace:
refcount_warn_saturate+0x9c/0x140
__smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
smc_lgr_terminate_work+0x28/0x30 [smc]
process_one_work+0x1b8/0x420
worker_thread+0x158/0x510
kthread+0x114/0x118
or
refcount_t: underflow; use-after-free.
WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
Workqueue: smc_hs_wq smc_listen_work [smc]
Call trace:
refcount_warn_saturate+0xf0/0x140
smcr_link_put+0x1cc/0x1d8 [smc]
smc_conn_free+0x110/0x1b0 [smc]
smc_conn_abort+0x50/0x60 [smc]
smc_listen_find_device+0x75c/0x790 [smc]
smc_listen_work+0x368/0x8a0 [smc]
process_one_work+0x1b8/0x420
worker_thread+0x158/0x510
kthread+0x114/0x118
It is caused by repeated release of LGR/link refcnt. One suspect is that
smc_conn_free() is called repeatedly because some smc_conn_free() are not
protected by sock lock.
Calls under socklock | Calls not under socklock
-------------------------------------------------------
lock_sock(sk) | smc_conn_abort
smc_conn_free | \- smc_conn_free
\- smcr_link_put | \- smcr_link_put (duplicated)
release_sock(sk)
So make sure smc_conn_free() is called under the sock lock.
Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen processing")
Co-developed-by: Guangguan Wang <guangguan.wang@xxxxxxxxxxxxxxxxx>
Signed-off-by: Guangguan Wang <guangguan.wang@xxxxxxxxxxxxxxxxx>
Co-developed-by: Kai <KaiShen@xxxxxxxxxxxxxxxxx>
Signed-off-by: Kai <KaiShen@xxxxxxxxxxxxxxxxx>
Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
---
net/smc/af_smc.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ed6d4d520bc7..e0a7a0151b11 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code,
return smc_connect_fallback(smc, reason_code);
}
-static void smc_conn_abort(struct smc_sock *smc, int local_first)
+static void __smc_conn_abort(struct smc_sock *smc, int local_first,
+ bool locked)
{
struct smc_connection *conn = &smc->conn;
struct smc_link_group *lgr = conn->lgr;
@@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock *smc, int local_first)
if (smc_conn_lgr_valid(conn))
lgr_valid = true;
- smc_conn_free(conn);
+ if (!locked) {
+ lock_sock(&smc->sk);
+ smc_conn_free(conn);
+ release_sock(&smc->sk);
+ } else {
+ smc_conn_free(conn);
+ }
if (local_first && lgr_valid)
smc_lgr_cleanup_early(lgr);
}
+static void smc_conn_abort(struct smc_sock *smc, int local_first)
+{
+ __smc_conn_abort(smc, local_first, false);
+}
+
+static void smc_conn_abort_locked(struct smc_sock *smc, int local_first)
+{
+ __smc_conn_abort(smc, local_first, true);
+}
+
/* check if there is a rdma device available for this connection. */
/* called for connect and listen */
static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
@@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
return 0;
connect_abort:
- smc_conn_abort(smc, ini->first_contact_local);
+ smc_conn_abort_locked(smc, ini->first_contact_local);
mutex_unlock(&smc_client_lgr_pending);
smc->connect_nonblock = 0;
@@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc,
return 0;
connect_abort:
- smc_conn_abort(smc, ini->first_contact_local);
+ smc_conn_abort_locked(smc, ini->first_contact_local);
mutex_unlock(&smc_server_lgr_pending);
smc->connect_nonblock = 0;
I wonder if this can deadlock, when you take lock_sock so far down in the callchain.
example:
smc_connect will first take lock_sock(sk) and then mutex_lock(&smc_server_lgr_pending); (e.g. in smc_connect_ism())
wheras
smc_listen_work() will take mutex_lock(&smc_server_lgr_pending); and then lock_sock(sk) (in your __smc_conn_abort(,,false))
I am not sure whether this can be called on the same socket, but it looks suspicious to me.
IMHO this two paths can not occur on the same sk.
All callers of smc_conn_abort() without socklock seem to originate from smc_listen_work().
That makes me think whether smc_listen_work() should do lock_sock(sk) on a higher level.
Yes, I also think about this question, I guess it is because the new smc sock will be
accepted by userspace only after smc_listen_work() is completed. Before that, no userspace
operation occurs synchronously with it, so it is not protected by sock lock. But I am not
sure if there are other reasons, so I did not aggressively protect the entire smc_listen_work
with sock lock, but chose a conservative approach.
Do you have an example which function could collide with smc_listen_work()?
i.e. have you found a way to reproduce this?
We discovered this during our fault injection testing where the rdma driver was rmmod/insmod
sporadically during the nginx/wrk 1K connections test.
e.g.
__smc_lgr_terminate | smc_listen_decline
(caused by rmmod mlx5_ib) | (caused by e.g. reg mr fail)
--------------------------------------------------------------
lock_sock |
smc_conn_kill | smc_conn_abort
\- smc_conn_free | \- smc_conn_free
release_sock |
Are you sure that all callers of smc_conn_free(), that are not smc_conn_abort(), do set the socklock?
It seems to me that the path of smc_conn_kill() is not covered by your solution.
smc_conn_free is called in these places:
1. __smc_release (protected by sock lock)
2. smc_conn_abort (partially protected by sock lock)
3. smc_close_active_abort - smc_release(protected by sock lock)
- smc_conn_kill - __smc_lgr_terminate/smc_conn_abort_work(protected by sock lock)
4. smc_close_passive_work (protected by sock lock)
So only smc_conn_abort->smc_conn_free is not well protected by sock lock.
Please excuse, that I am not deeply familiar with this code.
I'm just trying to ask helpful questions.
Thanks! :)