On Fri, Feb 23, 2024 at 6:26 PM Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> wrote:
>
> syzkaller reported a warning of netns tracker [0] followed by KASAN
> splat [1] and another ref tracker warning [1].
>
> syzkaller could not find a repro, but in the log, the only suspicious
> sequence was as follows:
>
>   18:26:22 executing program 1:
>   r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
>   ...
>   connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async)
>
> The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT.
>
> So, the scenario would be:
>
>   1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
>      rds_tcp_listen_init().
>   2. syz-executor connect()s to it and creates a reqsk.
>   3. syz-executor exit()s immediately.
>   4. netns is dismantled.  [0]
>   5. reqsk timer is fired, and UAF happens while freeing reqsk.  [1]
>   6. listener is freed after RCU grace period.  [2]
>
> Basically, reqsk assumes that the listener guarantees netns safety
> until all reqsk timers are expired by holding the listener's refcount.
> However, this was not the case for kernel sockets.
>
> Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
> inet_twsk_purge()") fixed this issue only for per-netns ehash, but
> the issue still exists for the global ehash.
>
> We can apply the same fix, but this issue is specific to RDS.
>
> Instead of iterating potentially large ehash and purging reqsk during
> netns dismantle, let's hold netns refcount for the kernel TCP listener.
>
> Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
> Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> ---
>  net/rds/tcp_listen.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> index 05008ce5c421..4f7863932df7 100644
> --- a/net/rds/tcp_listen.c
> +++ b/net/rds/tcp_listen.c
> @@ -282,6 +282,11 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
>                 goto out;
>         }
>
> +       __netns_tracker_free(net, &sock->sk->ns_tracker, false);
> +       sock->sk->sk_net_refcnt = 1;
> +       get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
> +       sock_inuse_add(net, 1);
> +

Why use sock_create_kern() and then later 'convert' this kernel socket
to a user one?

Would using __sock_create() avoid this?
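
To make the question concrete, here is a toy userspace model of the
refcounting at stake. It is only a sketch: every toy_* name is
hypothetical and none of this is kernel code. It assumes that a socket
created with kern != 0 takes no netns reference, that one created with
kern == 0 does, and that the patch's conversion amounts to taking that
reference after the fact. It ignores the ref tracker, the
sock_inuse_add() accounting, and everything else __sock_create() does.

```c
/* Toy userspace model of netns pinning; NOT kernel code.
 * All toy_* names are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_net {
	int refcnt;       /* models the netns reference count */
	bool dismantled;  /* set once the last reference is dropped */
};

struct toy_sock {
	struct toy_net *net;
	bool net_refcnt;  /* models sk->sk_net_refcnt */
};

static void toy_get_net(struct toy_net *net)
{
	net->refcnt++;
}

static void toy_put_net(struct toy_net *net)
{
	assert(net->refcnt > 0);
	if (--net->refcnt == 0)
		net->dismantled = true;
}

/* kern != 0 models a kernel socket: it does not pin the netns. */
static void toy_sock_create(struct toy_sock *sk, struct toy_net *net, int kern)
{
	sk->net = net;
	sk->net_refcnt = !kern;
	if (sk->net_refcnt)
		toy_get_net(net);
}

/* Models the conversion in the patch: start pinning the netns. */
static void toy_sock_hold_net(struct toy_sock *sk)
{
	if (!sk->net_refcnt) {
		sk->net_refcnt = true;
		toy_get_net(sk->net);
	}
}

static void toy_sock_release(struct toy_sock *sk)
{
	if (sk->net_refcnt)
		toy_put_net(sk->net);
	sk->net = NULL;
}

/* Returns true if the netns is still alive after its creator drops
 * its own reference while the listener still exists. */
static bool toy_netns_survives_exit(int kern, bool convert)
{
	struct toy_net net = { .refcnt = 1, .dismantled = false };
	struct toy_sock listener;
	bool alive;

	toy_sock_create(&listener, &net, kern);
	if (convert)
		toy_sock_hold_net(&listener);

	toy_put_net(&net);        /* last process in the netns exits */
	alive = !net.dismantled;

	toy_sock_release(&listener);
	return alive;
}
```

In this model, toy_netns_survives_exit(1, false) returns false (the
unpatched case), while both (1, true) and (0, false) return true, so
creating the socket with kern == 0 and converting it afterwards end up
in the same state as far as the netns reference is concerned.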