Re: [PATCH v1 net 2/2] rds: tcp: Fix use-after-free of net in reqsk_timer_handler().

Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> · Fri, 23 Feb 2024 10:28:32 -0800

From: Eric Dumazet <edumazet@xxxxxxxxxx>
Date: Fri, 23 Feb 2024 19:09:27 +0100
> On Fri, Feb 23, 2024 at 6:26 PM Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> wrote:
> >
> > syzkaller reported a warning of netns tracker [0] followed by KASAN
> > splat [1] and another ref tracker warning [1].
> >
> > syzkaller could not find a repro, but in the log, the only suspicious
> > sequence was as follows:
> >
> >   18:26:22 executing program 1:
> >   r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
> >   ...
> >   connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async)
> >
> > The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT.
> >
> > So, the scenario would be:
> >
> >   1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
> >       rds_tcp_listen_init().
> >   2. syz-executor connect()s to it and creates a reqsk.
> >   3. syz-executor exit()s immediately.
> >   4. netns is dismantled.  [0]
> >   5. reqsk timer is fired, and UAF happens while freeing reqsk.  [1]
> >   6. listener is freed after RCU grace period.  [2]
> >
> > Basically, reqsk assumes that the listener guarantees netns safety
> > until all reqsk timers are expired by holding the listener's refcount.
> > However, this was not the case for kernel sockets.
> >
> > Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
> > inet_twsk_purge()") fixed this issue only for per-netns ehash, but
> > the issue still exists for the global ehash.
> >
> > We can apply the same fix, but this issue is specific to RDS.
> >
> > Instead of iterating potentially large ehash and purging reqsk during
> > netns dismantle, let's hold netns refcount for the kernel TCP listener.
> >
> >
> > Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
> > Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> > ---
> >  net/rds/tcp_listen.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> > index 05008ce5c421..4f7863932df7 100644
> > --- a/net/rds/tcp_listen.c
> > +++ b/net/rds/tcp_listen.c
> > @@ -282,6 +282,11 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
> >                 goto out;
> >         }
> >
> > +       __netns_tracker_free(net, &sock->sk->ns_tracker, false);
> > +       sock->sk->sk_net_refcnt = 1;
> > +       get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
> > +       sock_inuse_add(net, 1);
> > +
> 
> Why using sock_create_kern() then later 'convert' this kernel socket
> to a user one ?
> 
> Would using __sock_create() avoid this ?

I think yes, but LSM would see kern=0 in pre/post socket() hooks.

Probably we can use __sock_create() in net-next and see if someone
complains.