From: Eric Dumazet <edumazet@xxxxxxxxxx>
Date: Tue, 27 Feb 2024 13:06:07 +0100
> On Tue, Feb 27, 2024 at 2:12 AM Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> wrote:
> >
> > syzkaller reported a warning of netns tracker [0] followed by KASAN
> > splat [1] and another ref tracker warning [1].
> >
> > syzkaller could not find a repro, but in the log, the only suspicious
> > sequence was as follows:
> >
> >   18:26:22 executing program 1:
> >   r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
> >   ...
> >   connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async)
> >
> > The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT.
> >
> > So, the scenario would be:
> >
> >   1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
> >       rds_tcp_listen_init().
> >   2. syz-executor connect()s to it and creates a reqsk.
> >   3. syz-executor exit()s immediately.
> >   4. netns is dismantled.  [0]
> >   5. reqsk timer is fired, and UAF happens while freeing reqsk.  [1]
> >   6. listener is freed after RCU grace period.  [2]
> >
> > Basically, reqsk assumes that the listener guarantees netns safety
> > until all reqsk timers are expired by holding the listener's refcount.
> > However, this was not the case for kernel sockets.
> >
> > Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
> > inet_twsk_purge()") fixed this issue only for per-netns ehash, but
> > the issue still exists for the global ehash.
> >
> > We can apply the same fix, but this issue is specific to RDS.
> >
> > Instead of iterating ehash and purging reqsk during netns dismantle,
> > let's hold netns refcount for the kernel listener.
> >
> > Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
> > Suggested-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> > ---
> >  net/rds/tcp_listen.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> > index 05008ce5c421..2d40e523322c 100644
> > --- a/net/rds/tcp_listen.c
> > +++ b/net/rds/tcp_listen.c
> > @@ -274,8 +274,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
> >         int addr_len;
> >         int ret;
> >
> > -       ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
> > -                              IPPROTO_TCP, &sock);
> > +       ret = __sock_create(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
> > +                           IPPROTO_TCP, &sock, SOCKET_KERN_NET_REF);
> >         if (ret < 0) {
> >                 rdsdebug("could not create %s listener socket: %d\n",
> >                          isv6 ? "IPv6" : "IPv4", ret);
>
> If RDS module keeps a listener alive, not attached to a user process,
> netns dismantle will never occur.
>
> I think we have to cleanup SYN_RECV sockets in inet_twsk_purge()

Ah.. yes, __init_net ops hook must not take net ref..
I'll go that way in v3.

>
> Yes, it removes one optimization you did.
>
> Perhaps add a counter of all kernel sockets that were ever attached to
> a netns in order to decide to apply the optimization.
> (keeping a precise count of SYN_RECV would be too expensive)

I'll work on the follow-up for net-next after the right fix is merged.

Thanks!
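
For anyone following along, the circular dependency pointed out above
(a listener created from the per-netns init hook takes a netns
reference, but is only closed from netns dismantle) can be sketched in
userspace C. This is a toy model under simplifying assumptions, not
kernel code: toy_netns, toy_listener, toy_net_put() and
toy_netns_dismantled_after_exit() are hypothetical names invented for
the illustration, and RCU, reqsk timers and ehash are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-netns kernel listener. */
struct toy_listener {
	bool open;
	bool holds_net_ref;	/* models the v2 patch taking a netns ref */
};

/* Hypothetical stand-in for struct net: a refcount plus its listener. */
struct toy_netns {
	int refcnt;
	bool dismantled;
	struct toy_listener *listener;
};

/*
 * Drop one reference. Only the last put dismantles the netns, and
 * dismantle is the only place the listener gets closed.
 */
static void toy_net_put(struct toy_netns *net)
{
	assert(net->refcnt > 0);
	if (--net->refcnt > 0)
		return;

	net->dismantled = true;
	if (net->listener)
		net->listener->open = false;
}

/*
 * Create a netns owned by one process, create its kernel listener,
 * then let the process exit. Returns whether the netns was dismantled.
 */
static bool toy_netns_dismantled_after_exit(bool listener_holds_net_ref)
{
	struct toy_listener l = {
		.open = true,
		.holds_net_ref = listener_holds_net_ref,
	};
	struct toy_netns net = { .refcnt = 1, .listener = &l };

	/* Init hook: with the v2 approach the listener pins the netns. */
	if (l.holds_net_ref)
		net.refcnt++;

	/* The owning process exits and drops its reference. */
	toy_net_put(&net);

	return net.dismantled;
}
```

In this model, toy_netns_dismantled_after_exit(false) reaches dismantle
and closes the listener, while toy_netns_dismantled_after_exit(true)
leaves the refcount at 1 forever, because the only code that would drop
the listener's reference runs from dismantle itself.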