Re: [PATCH v2 net 4/5] rds: tcp: Fix use-after-free of net in reqsk_timer_handler().

Eric Dumazet <edumazet@xxxxxxxxxx> · Tue, 27 Feb 2024 13:06:07 +0100

On Tue, Feb 27, 2024 at 2:12 AM Kuniyuki Iwashima <kuniyu@xxxxxxxxxx> wrote:
>
> syzkaller reported a warning of netns tracker [0] followed by KASAN
> splat [1] and another ref tracker warning [2].
>
> syzkaller could not find a repro, but in the log, the only suspicious
> sequence was as follows:
>
>   18:26:22 executing program 1:
>   r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
>   ...
>   connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async)
>
> The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT.
>
> So, the scenario would be:
>
>   1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
>       rds_tcp_listen_init().
>   2. syz-executor connect()s to it and creates a reqsk.
>   3. syz-executor exit()s immediately.
>   4. netns is dismantled.  [0]
>   5. reqsk timer is fired, and UAF happens while freeing reqsk.  [1]
>   6. listener is freed after RCU grace period.  [2]
>
> Basically, reqsk assumes that the listener guarantees netns safety
> until all reqsk timers are expired by holding the listener's refcount.
> However, this was not the case for kernel sockets.
>
> Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
> inet_twsk_purge()") fixed this issue only for per-netns ehash, but
> the issue still exists for the global ehash.
>
> We can apply the same fix, but this issue is specific to RDS.
>
> Instead of iterating ehash and purging reqsk during netns dismantle,
> let's hold netns refcount for the kernel listener.
>
>

> Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
> Suggested-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> ---
>  net/rds/tcp_listen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> index 05008ce5c421..2d40e523322c 100644
> --- a/net/rds/tcp_listen.c
> +++ b/net/rds/tcp_listen.c
> @@ -274,8 +274,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6)
>         int addr_len;
>         int ret;
>
> -       ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
> -                              IPPROTO_TCP, &sock);
> +       ret = __sock_create(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM,
> +                           IPPROTO_TCP, &sock, SOCKET_KERN_NET_REF);
>         if (ret < 0) {
>                 rdsdebug("could not create %s listener socket: %d\n",
>                          isv6 ? "IPv6" : "IPv4", ret);

If the RDS module keeps a listener alive that is not attached to a user
process, and that listener holds a netns reference, netns dismantle will
never occur.
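To make the circular dependency concrete, here is a toy userspace model, not kernel code. Every name in it (toy_net, toy_listener_create(), toy_put_net()) is hypothetical and only mimics the shape of the problem: the listener is closed from the netns exit path, and the exit path only runs once the last netns reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_net {
	int refcnt;         /* references keeping the netns alive */
	bool dismantled;    /* set once the exit path has run */
	bool has_listener;  /* module-owned kernel listener */
};

/* Create the per-netns listener, optionally pinning the netns. */
static void toy_listener_create(struct toy_net *net, bool hold_net_ref)
{
	net->has_listener = true;
	if (hold_net_ref)
		net->refcnt++;
}

/* Exit path: the only place where the listener gets closed. */
static void toy_net_exit(struct toy_net *net)
{
	net->has_listener = false;
	net->dismantled = true;
}

/* Drop one reference; dismantle the netns when the last one goes. */
static void toy_put_net(struct toy_net *net)
{
	assert(net->refcnt > 0);
	if (--net->refcnt == 0)
		toy_net_exit(net);
}
```

In this model, a netns that starts with one reference (the user process) is dismantled by a single toy_put_net() when the listener holds no reference. If the listener was created with hold_net_ref set, the same toy_put_net() leaves refcnt at 1 and the exit path that would close the listener is never reached.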

I think we have to clean up SYN_RECV sockets in inet_twsk_purge().

Yes, this removes one optimization you added.

Perhaps add a counter of all kernel sockets that were ever attached to
a netns in order to decide to apply the optimization.
(keeping a precise count of SYN_RECV would be too expensive)
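A rough userspace sketch of the counter idea, under my own assumptions about how it could look. The field kern_socks_ever and all toy_* names are hypothetical, and the flat array stands in for the ehash scan. The counter is only ever incremented, so it is cheap to maintain, and the scan is skipped only for a netns that never had a kernel socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_net {
	unsigned long kern_socks_ever; /* never decremented */
};

enum toy_state { TOY_ESTABLISHED, TOY_SYN_RECV, TOY_FREED };

struct toy_sock {
	struct toy_net *net;
	enum toy_state state;
};

/* Called whenever a kernel socket is attached to the netns. */
static void toy_kern_sock_created(struct toy_net *net)
{
	net->kern_socks_ever++;
}

/* A netns that never had a kernel socket cannot own stale reqsks. */
static bool toy_purge_needed(const struct toy_net *net)
{
	return net->kern_socks_ever != 0;
}

/* Scan the table and free SYN_RECV entries of the dying netns.
 * Returns the number of purged entries. */
static size_t toy_purge(struct toy_sock *tbl, size_t n,
			const struct toy_net *dying)
{
	size_t i, purged = 0;

	assert(tbl != NULL || n == 0);
	if (!toy_purge_needed(dying))
		return 0; /* optimization kept: no scan at all */

	for (i = 0; i < n; i++) {
		if (tbl[i].net == dying && tbl[i].state == TOY_SYN_RECV) {
			tbl[i].state = TOY_FREED;
			purged++;
		}
	}
	return purged;
}
```

The trade-off in this sketch is that the counter over-approximates: a netns whose kernel sockets are long gone still pays for the scan, but no per-reqsk accounting is needed.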