Re: [PATCH v1 net 2/2] rds: tcp: Fix use-after-free of net in reqsk_timer_handler().

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Allison Henderson <allison.henderson@xxxxxxxxxx>
Date: Mon, 26 Feb 2024 19:22:01 +0000
> On Mon, 2024-02-26 at 11:14 -0800, Kuniyuki Iwashima wrote:
> > From: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> > Date: Fri, 23 Feb 2024 10:28:32 -0800
> > > From: Eric Dumazet <edumazet@xxxxxxxxxx>
> > > Date: Fri, 23 Feb 2024 19:09:27 +0100
> > > > On Fri, Feb 23, 2024 at 6:26 PM Kuniyuki Iwashima
> > > > <kuniyu@xxxxxxxxxx> wrote:
> > > > > 
> > > > > syzkaller reported a warning of netns tracker [0] followed by
> > > > > KASAN
> > > > > splat [1] and another ref tracker warning [1].
> > > > > 
> > > > > syzkaller could not find a repro, but in the log, the only
> > > > > suspicious
> > > > > sequence was as follows:
> > > > > 
> > > > >   18:26:22 executing program 1:
> > > > >   r0 = socket$inet6_mptcp(0xa, 0x1, 0x106)
> > > > >   ...
> > > > >   connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0,
> > > > > @loopback}, 0x1c) (async)
> > > > > 
> > > > > The notable thing here is 0x4001 in connect(), which is
> > > > > RDS_TCP_PORT.
> > > > > 
> > > > > So, the scenario would be:
> > > > > 
> > > > >   1. unshare(CLONE_NEWNET) creates a per netns tcp listener in
> > > > >       rds_tcp_listen_init().
> > > > >   2. syz-executor connect()s to it and creates a reqsk.
> > > > >   3. syz-executor exit()s immediately.
> > > > >   4. netns is dismantled.  [0]
> > > > >   5. reqsk timer is fired, and UAF happens while freeing
> > > > > reqsk.  [1]
> > > > >   6. listener is freed after RCU grace period.  [2]
> > > > > 
> > > > > Basically, reqsk assumes that the listener guarantees netns
> > > > > safety
> > > > > until all reqsk timers are expired by holding the listener's
> > > > > refcount.
> > > > > However, this was not the case for kernel sockets.
> > > > > 
> > > > > Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in
> > > > > inet_twsk_purge()") fixed this issue only for per-netns ehash,
> > > > > but
> > > > > the issue still exists for the global ehash.
> > > > > 
> > > > > We can apply the same fix, but this issue is specific to RDS.
> > > > > 
> > > > > Instead of iterating potentially large ehash and purging reqsk
> > > > > during
> > > > > netns dismantle, let's hold netns refcount for the kernel TCP
> > > > > listener.
> > > > > 
> > > > > 
> > > > > Reported-by: syzkaller <syzkaller@xxxxxxxxxxxxxxxx>
> > > > > Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen
> > > > > endpoints, one per netns.")
> > > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@xxxxxxxxxx>
> > > > > ---
> > > > >  net/rds/tcp_listen.c | 5 +++++
> > > > >  1 file changed, 5 insertions(+)
> > > > > 
> > > > > diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
> > > > > index 05008ce5c421..4f7863932df7 100644
> > > > > --- a/net/rds/tcp_listen.c
> > > > > +++ b/net/rds/tcp_listen.c
> > > > > @@ -282,6 +282,11 @@ struct socket *rds_tcp_listen_init(struct
> > > > > net *net, bool isv6)
> > > > >                 goto out;
> > > > >         }
> > > > > 
> > > > > +       __netns_tracker_free(net, &sock->sk->ns_tracker,
> > > > > false);
> > > > > +       sock->sk->sk_net_refcnt = 1;
> > > > > +       get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
> > > > > +       sock_inuse_add(net, 1);
> > > > > +
> > > > 
> > > > Why using sock_create_kern() then later 'convert' this kernel
> > > > socket
> > > > to a user one ?
> > > > 
> > > > Would using __sock_create() avoid this ?
> > > 
> > > I think yes, but LSM would see kern=0 in pre/post socket() hooks.
> > > 
> > > Probably we can use __sock_create() in net-next and see if someone
> > > complains.
> > 
> > I noticed the patchwork status is Changes Requested.
> > https://urldefense.com/v3/__https://patchwork.kernel.org/project/netdevbpf/list/?series=829213&state=*__;Kg!!ACWV5N9M2RV99hQ!KHKUQKUDnNCdiEcb4ZK1VBiYSitarEb-CAWeSJvaeK04fgW4cuWePg3Ac2HmIAPUHuqeCwgt466fHEKAAdfa$
> >  
> > 
> > Should we use __sock_create() for RDS or add another parameter
> > to __sock_create(..., kern=true/false, netref=true/false) and
> > fix other similar uses (MPTCP, SMC, Netlink) altogether ?
> > 
> > Thanks!
> 
> Hi all,
> 
> Thank you for looking at this.  I've been doing a little investigation
> in the area to better understand the issue and this fix.  While I
> understand what this patch is trying to do here, I'd like to do a
> little more digging as to why 740ea3c4a0b2 didnt work for rds,

740ea3c4a0b2 works only for netns with its dedicated ehash, which
is unshare(CLONE_NEWNET)d with net.ipv4.tcp_child_ehash_entries != 0.

With the diff below, we can fix the issue, but as noted in the
description, this slows down netns dismantle where no reqsk, this
is true if the netns did not have kernel TCP sockets.

---8<---
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 9e85f2a0bddd..0ecc7311dc6c 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -398,10 +398,6 @@ void tcp_twsk_purge(struct list_head *net_exit_list, int family)
 			/* Even if tw_refcount == 1, we must clean up kernel reqsk */
 			inet_twsk_purge(net->ipv4.tcp_death_row.hashinfo, family);
 		} else if (!purged_once) {
-			/* The last refcount is decremented in tcp_sk_exit_batch() */
-			if (refcount_read(&net->ipv4.tcp_death_row.tw_refcount) == 1)
-				continue;
-
 			inet_twsk_purge(&tcp_hashinfo, family);
 			purged_once = true;
 		}
---8<---


> or what
> else rds may not be doing correctly that the other sockets are.

RDS has to clean up all sockets before netns are destroyed, but
it doesn't.  RDS per-netns TCP listener could have reqsk tied to
it, and reqsk timer could be fired after the netns and the listener
are freed.

Here we have 2 options to fix the issue.

  1) iterate ehash and purge reqsk during netns dismantle
  2) defer netns dismantle until reqsk timers are all fired

and 2) is preferred as the issue is specific to RDS.


> I'm
> not quite sure about setting the kern parameter to 0 for socket_create.
> While it seems like it would have a similar effect, this looks
> incorrect since this is not a user space socket.  

Probably kern parameter could be enum.

  SOCKET_USER = 0,
  SOCKET_KERN,
  SOCKET_KERN_NET_REFCNT,

If the enum is > 0, we can invoke LSM and mask it with
SOCKET_KERN_NET_REFCNT and pass down to it.




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux