On Thu, Feb 02, 2023 at 09:10:23PM +0100, Eric Dumazet wrote: > On Thu, Feb 2, 2023 at 8:55 PM Alok Tiagi <aloktiagi@xxxxxxxxx> wrote: > > > > On Thu, Feb 02, 2023 at 09:48:10AM +0800, Hillf Danton wrote: > > > On Wed, 1 Feb 2023 19:22:57 +0000 aloktiagi <aloktiagi@xxxxxxxxx> > > > > @@ -1535,6 +1535,52 @@ int sk_setsockopt(struct sock *sk, int level, int optname, > > > > WRITE_ONCE(sk->sk_txrehash, (u8)val); > > > > break; > > > > > > > > + case SO_SETNETNS: > > > > + { > > > > + struct net *other_ns, *my_ns; > > > > + > > > > + if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) { > > > > + ret = -EOPNOTSUPP; > > > > + break; > > > > + } > > > > + > > > > + if (sk->sk_type != SOCK_STREAM && sk->sk_type != SOCK_DGRAM) { > > > > + ret = -EOPNOTSUPP; > > > > + break; > > > > + } > > > > + > > > > + other_ns = get_net_ns_by_fd(val); > > > > + if (IS_ERR(other_ns)) { > > > > + ret = PTR_ERR(other_ns); > > > > + break; > > > > + } > > > > + > > > > + if (!ns_capable(other_ns->user_ns, CAP_NET_ADMIN)) { > > > > + ret = -EPERM; > > > > + goto out_err; > > > > + } > > > > + > > > > + /* check that the socket has never been connected or recently disconnected */ > > > > + if (sk->sk_state != TCP_CLOSE || sk->sk_shutdown & SHUTDOWN_MASK) { > > > > + ret = -EOPNOTSUPP; > > > > + goto out_err; > > > > + } > > > > + > > > > + /* check that the socket is not bound to an interface*/ > > > > + if (sk->sk_bound_dev_if != 0) { > > > > + ret = -EOPNOTSUPP; > > > > + goto out_err; > > > > + } > > > > + > > > > + my_ns = sock_net(sk); > > > > + sock_net_set(sk, other_ns); > > > > + put_net(my_ns); > > > > + break; > > > > > > cpu 0 cpu 2 > > > --- --- > > > ns = sock_net(sk); > > > my_ns = sock_net(sk); > > > sock_net_set(sk, other_ns); > > > put_net(my_ns); > > > ns is invalid ? > > > > That is the reason we want the socket to be in an un-connected state. That > > should help us avoid this situation. > > This is not enough.... > > Another thread might look at sock_net(sk), for example from inet_diag > or tcp timers > (which can be fired even in un-connected state) > > Even UDP sockets can receive packets while being un-connected, > and they need to deref the net pointer. > > Currently there is no protection about sock_net(sk) being changed on the fly, > and the struct net could disappear and be freed. > > There are ~1500 uses of sock_net(sk) in the kernel, I do not think > you/we want to audit all > of them to check what could go wrong... I agree, auditing all the uses of sock_net(sk) is not a feasible option. From my exploration of the usage of sock_net(sk) it appeared that it might be safe to swap a sockets net ns if it had never been connected but I looked at only a subset of such uses. Introducing a ref counting logic to every access of sock_net(sk) may help get around this but invovles a bigger change to increment and decrement the count at every use of sock_net(). Any suggestions if this could be achieved in another way much close to the socket creation time or any comments on our workaround for injecting sockets using seccomp addfd?