From: Eric Dumazet <edumazet@xxxxxxxxxx>
Date: Thu, 7 Mar 2024 10:51:15 +0100
> On Thu, Mar 7, 2024 at 12:05 AM Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> wrote:
> >
> > Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
> > inet_twsk_purge()") added changes in inet_twsk_purge() to purge
> > reqsk in per-netns ehash during netns dismantle.
> >
> > inet_csk_reqsk_queue_drop_and_put() will remove reqsk from per-netns
> > ehash, but the iteration uses sk_nulls_for_each_rcu(), which is not
> > safe.
> >
> > After removing reqsk, we need to restart iteration.
> >
> > Note that we need not check net->ns.count here because per-netns
> > ehash does not have reqsk in other live netns.  We will check
> > net->ns.count in the following patch.
> >
> > Fixes: 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()")
> > Reported-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> > ---
> >  net/ipv4/inet_timewait_sock.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
> > index 5befa4de5b24..00cbebaa2c68 100644
> > --- a/net/ipv4/inet_timewait_sock.c
> > +++ b/net/ipv4/inet_timewait_sock.c
> > @@ -287,6 +287,8 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family)
> >  					struct request_sock *req = inet_reqsk(sk);
> >
> >  					inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
> > +
> > +					goto restart;
> >  				}
> >
> >  				continue;
>
> Note how the RCU rules that I followed for TCP_TIME_WAIT made
> me to grab a reference on tw->tw_refcnt, using refcount_inc_not_zero()
>
> I think your code had multiple bugs, because
> inet_csk_reqsk_queue_drop_and_put() could cause UAF
> if the timer already fired and refcount went to zero already.

Ugh.. exactly.

I'll post v4 following the TIME_WAIT path.

---8<---
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 961b1917c3eb..c81f83893fc7 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -278,20 +278,32 @@ void inet_twsk_purge(struct inet_hashinfo *hashinfo, int family)
 restart:
 		sk_nulls_for_each_rcu(sk, node, &head->chain) {
 			if (sk->sk_state != TCP_TIME_WAIT) {
+				struct request_sock *req;
+
+				if (likely(sk->sk_state != TCP_NEW_SYN_RECV))
+					continue;
+
 				/* A kernel listener socket might not hold refcnt for net,
 				 * so reqsk_timer_handler() could be fired after net is
 				 * freed.  Userspace listener and reqsk never exist here.
 				 */
-				if (unlikely(sk->sk_state == TCP_NEW_SYN_RECV &&
-					     !refcount_read(&sock_net(sk)->ns.count))) {
-					struct request_sock *req = inet_reqsk(sk);
 
-					inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
+				if (sk->sk_family != family ||
+				    refcount_read(&sock_net(sk)->ns.count))
+					continue;
+
+				req = inet_reqsk(sk);
+				if (unlikely(!refcount_inc_not_zero(&req->rsk_refcnt)))
+					continue;
 
-					goto restart;
+				if (unlikely(sk->sk_family != family ||
+					     refcount_read(&sock_net(sk)->ns.count))) {
+					reqsk_put(req);
+					continue;
 				}
 
-				continue;
+				inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req);
+				goto restart;
 			}
 
 			tw = inet_twsk(sk);
---8<---

>
> We also could add sk_nulls_for_each_rcu_safe() to avoid these pesky
> "goto restart;"

I'll post this followup for net-next in the next release cycle.

Thanks!
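
For reference, the pattern the diff above follows (take a reference only if the
count is still nonzero, recheck the conditions while holding it, then unlink and
restart the walk) can be sketched in plain userspace C.  This is only an
illustration under stated assumptions, not kernel code: it is single-threaded,
uses C11 atomics instead of refcount_t, has no RCU, and struct obj, the doomed
flag, ref_inc_not_zero(), obj_put(), unlink_and_put() and purge() are all
hypothetical names that do not exist in the tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hashed object such as a reqsk. */
struct obj {
	atomic_int refcnt;
	bool doomed;		/* stand-in for "family matches and netns is dead" */
	struct obj *next;
};

/* Userspace analogue of refcount_inc_not_zero(): succeed only if the
 * count has not already dropped to zero.
 */
static bool ref_inc_not_zero(atomic_int *r)
{
	int old = atomic_load(r);

	while (old != 0) {
		/* On failure, old is reloaded and the zero check is repeated. */
		if (atomic_compare_exchange_weak(r, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);	/* an underflow would mean a use-after-free bug */
}

/* Unlink o from the chain, dropping the chain's reference and then ours. */
static void unlink_and_put(struct obj **head, struct obj *o)
{
	struct obj **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == o) {
			*pp = o->next;
			obj_put(o);	/* reference held by the chain */
			break;
		}
	}
	obj_put(o);		/* reference taken by the caller */
}

/* Walk the chain and purge doomed objects; returns how many were purged. */
static int purge(struct obj **head)
{
	struct obj *o;
	int purged = 0;

restart:
	for (o = *head; o; o = o->next) {
		if (!o->doomed)
			continue;

		/* Count already hit zero: someone else is freeing it. */
		if (!ref_inc_not_zero(&o->refcnt))
			continue;

		/* Recheck now that the object cannot go away under us. */
		if (!o->doomed) {
			obj_put(o);
			continue;
		}

		unlink_and_put(head, o);
		purged++;
		goto restart;	/* the chain changed, so walk it again */
	}
	return purged;
}
```

In this sketch an object whose count is already zero is skipped rather than
dropped a second time, which is the case the UAF comment above is about.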