From: Eric Dumazet <edumazet@xxxxxxxxxx> Date: Mon, 26 Feb 2024 20:52:35 +0100 > On Mon, Feb 26, 2024 at 8:39 PM Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> wrote: > > > > From: Allison Henderson <allison.henderson@xxxxxxxxxx> > > Date: Mon, 26 Feb 2024 19:22:01 +0000 > > > On Mon, 2024-02-26 at 11:14 -0800, Kuniyuki Iwashima wrote: > > > > From: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> > > > > Date: Fri, 23 Feb 2024 10:28:32 -0800 > > > > > From: Eric Dumazet <edumazet@xxxxxxxxxx> > > > > > Date: Fri, 23 Feb 2024 19:09:27 +0100 > > > > > > On Fri, Feb 23, 2024 at 6:26 PM Kuniyuki Iwashima > > > > > > <kuniyu@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > syzkaller reported a warning of netns tracker [0] followed by > > > > > > > KASAN > > > > > > > splat [1] and another ref tracker warning [1]. > > > > > > > > > > > > > > syzkaller could not find a repro, but in the log, the only > > > > > > > suspicious > > > > > > > sequence was as follows: > > > > > > > > > > > > > > 18:26:22 executing program 1: > > > > > > > r0 = socket$inet6_mptcp(0xa, 0x1, 0x106) > > > > > > > ... > > > > > > > connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, > > > > > > > @loopback}, 0x1c) (async) > > > > > > > > > > > > > > The notable thing here is 0x4001 in connect(), which is > > > > > > > RDS_TCP_PORT. > > > > > > > > > > > > > > So, the scenario would be: > > > > > > > > > > > > > > 1. unshare(CLONE_NEWNET) creates a per netns tcp listener in > > > > > > > rds_tcp_listen_init(). > > > > > > > 2. syz-executor connect()s to it and creates a reqsk. > > > > > > > 3. syz-executor exit()s immediately. > > > > > > > 4. netns is dismantled. [0] > > > > > > > 5. reqsk timer is fired, and UAF happens while freeing > > > > > > > reqsk. [1] > > > > > > > 6. listener is freed after RCU grace period. [2] > > > > > > > > > > > > > > Basically, reqsk assumes that the listener guarantees netns > > > > > > > safety > > > > > > > until all reqsk timers are expired by holding the listener's > > > > > > > refcount. > > > > > > > However, this was not the case for kernel sockets. > > > > > > > > > > > > > > Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in > > > > > > > inet_twsk_purge()") fixed this issue only for per-netns ehash, > > > > > > > but > > > > > > > the issue still exists for the global ehash. > > > > > > > > > > > > > > We can apply the same fix, but this issue is specific to RDS. > > > > > > > > > > > > > > Instead of iterating potentially large ehash and purging reqsk > > > > > > > during > > > > > > > netns dismantle, let's hold netns refcount for the kernel TCP > > > > > > > listener. > > > > > > > > > > > > > > > > > > > > > Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx> > > > > > > > Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen > > > > > > > endpoints, one per netns.") > > > > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> > > > > > > > --- > > > > > > > net/rds/tcp_listen.c | 5 +++++ > > > > > > > 1 file changed, 5 insertions(+) > > > > > > > > > > > > > > diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c > > > > > > > index 05008ce5c421..4f7863932df7 100644 > > > > > > > --- a/net/rds/tcp_listen.c > > > > > > > +++ b/net/rds/tcp_listen.c > > > > > > > @@ -282,6 +282,11 @@ struct socket *rds_tcp_listen_init(struct > > > > > > > net *net, bool isv6) > > > > > > > goto out; > > > > > > > } > > > > > > > > > > > > > > + __netns_tracker_free(net, &sock->sk->ns_tracker, > > > > > > > false); > > > > > > > + sock->sk->sk_net_refcnt = 1; > > > > > > > + get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL); > > > > > > > + sock_inuse_add(net, 1); > > > > > > > + > > > > > > > > > > > > Why using sock_create_kern() then later 'convert' this kernel > > > > > > socket > > > > > > to a user one ? > > > > > > > > > > > > Would using __sock_create() avoid this ? > > > > > > > > > > I think yes, but LSM would see kern=0 in pre/post socket() hooks. > > > > > > > > > > Probably we can use __sock_create() in net-next and see if someone > > > > > complains. > > > > > > > > I noticed the patchwork status is Changes Requested. > > > > https://urldefense.com/v3/__https://patchwork.kernel.org/project/netdevbpf/list/?series=829213&state=*__;Kg!!ACWV5N9M2RV99hQ!KHKUQKUDnNCdiEcb4ZK1VBiYSitarEb-CAWeSJvaeK04fgW4cuWePg3Ac2HmIAPUHuqeCwgt466fHEKAAdfa$ > > > > > > > > > > > > Should we use __sock_create() for RDS or add another parameter > > > > to __sock_create(..., kern=true/false, netref=true/false) and > > > > fix other similar uses (MPTCP, SMC, Netlink) altogether ? > > > > > > > > Thanks! > > > > > > Hi all, > > > > > > Thank you for looking at this. I've been doing a little investigation > > > in the area to better understand the issue and this fix. While I > > > understand what this patch is trying to do here, I'd like to do a > > > little more digging as to why 740ea3c4a0b2 didnt work for rds, > > > > 740ea3c4a0b2 works only for netns with its dedicated ehash, which > > is unshare(CLONE_NEWNET)d with net.ipv4.tcp_child_ehash_entries != 0. > > > > With the diff below, we can fix the issue, but as noted in the > > description, this slows down netns dismantle where no reqsk, this > > is true if the netns did not have kernel TCP sockets. > > BTW I note that inet_twsk_purge() probably would need this "goto > restart;" as well.... > > sk_nulls_for_each_rcu(sk, node, &head->chain) is not a _safe variant... Ah exactly! Will you post it or I can include it as the first patch. The following patch will remove it and the end result will be the same though. > > diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c > index 5befa4de5b2416281ad2795713a70d0fd847b0b2..a62031d85bcce00261193136dd72d9f57ffd34fc > 100644 > --- a/net/ipv4/inet_timewait_sock.c > +++ b/net/ipv4/inet_timewait_sock.c > @@ -283,10 +283,11 @@ void inet_twsk_purge(struct inet_hashinfo > *hashinfo, int family) > * freed. Userspace listener and > reqsk never exist here. > */ > if (unlikely(sk->sk_state == TCP_NEW_SYN_RECV && > - hashinfo->pernet)) { > + > !refcount_read(&sock_net(sk)->ns.count))) { > struct request_sock *req = > inet_reqsk(sk); > > > inet_csk_reqsk_queue_drop_and_put(req->rsk_listener, req); > + goto restart; > } > > continue;