Re: [PATCH net 2/2] net/smc: fix LGR and link use-after-free issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 22.11.24 08:16, Wen Gu wrote:
We encountered a LGR/link use-after-free issue, which manifested as
the LGR/link refcnt reaching 0 early and entering the clear process,
making resource access unsafe.


How did you make sure that the refcount mentioned in the warning are the LGR/link refcnt, not &sk->sk_refcnt?

  refcount_t: addition on 0; use-after-free.
  WARNING: CPU: 14 PID: 107447 at lib/refcount.c:25 refcount_warn_saturate+0x9c/0x140
  Workqueue: events smc_lgr_terminate_work [smc]
  Call trace:
   refcount_warn_saturate+0x9c/0x140
   __smc_lgr_terminate.part.45+0x2a8/0x370 [smc]
   smc_lgr_terminate_work+0x28/0x30 [smc]
   process_one_work+0x1b8/0x420
   worker_thread+0x158/0x510
   kthread+0x114/0x118

or

  refcount_t: underflow; use-after-free.
  WARNING: CPU: 6 PID: 93140 at lib/refcount.c:28 refcount_warn_saturate+0xf0/0x140
  Workqueue: smc_hs_wq smc_listen_work [smc]
  Call trace:
   refcount_warn_saturate+0xf0/0x140
   smcr_link_put+0x1cc/0x1d8 [smc]
   smc_conn_free+0x110/0x1b0 [smc]
   smc_conn_abort+0x50/0x60 [smc]
   smc_listen_find_device+0x75c/0x790 [smc]
   smc_listen_work+0x368/0x8a0 [smc]
   process_one_work+0x1b8/0x420
   worker_thread+0x158/0x510
   kthread+0x114/0x118

It is caused by repeated release of LGR/link refcnt. One suspect is that
smc_conn_free() is called repeatedly because some smc_conn_free() are not
protected by sock lock.

Calls under socklock        | Calls not under socklock
-------------------------------------------------------
lock_sock(sk)               | smc_conn_abort
smc_conn_free               | \- smc_conn_free
\- smcr_link_put            |    \- smcr_link_put (duplicated)
release_sock(sk)

So make sure smc_conn_free() is called under the sock lock.


If I understand correctly, the fix could only solve a part of the problem, i.e. what the second call trace reported, right?

Fixes: 8cf3f3e42374 ("net/smc: use helper smc_conn_abort() in listen processing")
Co-developed-by: Guangguan Wang <guangguan.wang@xxxxxxxxxxxxxxxxx>
Signed-off-by: Guangguan Wang <guangguan.wang@xxxxxxxxxxxxxxxxx>
Co-developed-by: Kai <KaiShen@xxxxxxxxxxxxxxxxx>
Signed-off-by: Kai <KaiShen@xxxxxxxxxxxxxxxxx>
Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
---
  net/smc/af_smc.c | 25 +++++++++++++++++++++----
  1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index ed6d4d520bc7..e0a7a0151b11 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -973,7 +973,8 @@ static int smc_connect_decline_fallback(struct smc_sock *smc, int reason_code,
  	return smc_connect_fallback(smc, reason_code);
  }
-static void smc_conn_abort(struct smc_sock *smc, int local_first)
+static void __smc_conn_abort(struct smc_sock *smc, int local_first,
+			     bool locked)
  {
  	struct smc_connection *conn = &smc->conn;
  	struct smc_link_group *lgr = conn->lgr;
@@ -982,11 +983,27 @@ static void smc_conn_abort(struct smc_sock *smc, int local_first)
  	if (smc_conn_lgr_valid(conn))
  		lgr_valid = true;
- smc_conn_free(conn);
+	if (!locked) {
+		lock_sock(&smc->sk);
+		smc_conn_free(conn);
+		release_sock(&smc->sk);
+	} else {
+		smc_conn_free(conn);
+	}
  	if (local_first && lgr_valid)
  		smc_lgr_cleanup_early(lgr);
  }
+static void smc_conn_abort(struct smc_sock *smc, int local_first)
+{
+	__smc_conn_abort(smc, local_first, false);
+}
+
+static void smc_conn_abort_locked(struct smc_sock *smc, int local_first)
+{
+	__smc_conn_abort(smc, local_first, true);
+}
+
  /* check if there is a rdma device available for this connection. */
  /* called for connect and listen */
  static int smc_find_rdma_device(struct smc_sock *smc, struct smc_init_info *ini)
@@ -1352,7 +1369,7 @@ static int smc_connect_rdma(struct smc_sock *smc,
return 0;
  connect_abort:
-	smc_conn_abort(smc, ini->first_contact_local);
+	smc_conn_abort_locked(smc, ini->first_contact_local);
  	mutex_unlock(&smc_client_lgr_pending);
  	smc->connect_nonblock = 0;
@@ -1454,7 +1471,7 @@ static int smc_connect_ism(struct smc_sock *smc, return 0;
  connect_abort:
-	smc_conn_abort(smc, ini->first_contact_local);
+	smc_conn_abort_locked(smc, ini->first_contact_local);
  	mutex_unlock(&smc_server_lgr_pending);
  	smc->connect_nonblock = 0;

Why is smc_conn_abort_locked() only necessary for the smc_connect_work, not for the smc_listen_work?

Thanks,
Wenjia




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Kernel Development]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Info]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Linux Media]     [Device Mapper]

  Powered by Linux