Re: [RFC] net: add new socket option SO_SETNETNS

On Thu, Feb 02, 2023 at 09:10:23PM +0100, Eric Dumazet wrote:
> On Thu, Feb 2, 2023 at 8:55 PM Alok Tiagi <aloktiagi@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 02, 2023 at 09:48:10AM +0800, Hillf Danton wrote:
> > > On Wed, 1 Feb 2023 19:22:57 +0000 aloktiagi <aloktiagi@xxxxxxxxx>
> > > > @@ -1535,6 +1535,52 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > > >             WRITE_ONCE(sk->sk_txrehash, (u8)val);
> > > >             break;
> > > >
> > > > +   case SO_SETNETNS:
> > > > +   {
> > > > +           struct net *other_ns, *my_ns;
> > > > +
> > > > +           if (sk->sk_family != AF_INET && sk->sk_family != AF_INET6) {
> > > > +                   ret = -EOPNOTSUPP;
> > > > +                   break;
> > > > +           }
> > > > +
> > > > +           if (sk->sk_type != SOCK_STREAM && sk->sk_type != SOCK_DGRAM) {
> > > > +                   ret = -EOPNOTSUPP;
> > > > +                   break;
> > > > +           }
> > > > +
> > > > +           other_ns = get_net_ns_by_fd(val);
> > > > +           if (IS_ERR(other_ns)) {
> > > > +                   ret = PTR_ERR(other_ns);
> > > > +                   break;
> > > > +           }
> > > > +
> > > > +           if (!ns_capable(other_ns->user_ns, CAP_NET_ADMIN)) {
> > > > +                   ret = -EPERM;
> > > > +                   goto out_err;
> > > > +           }
> > > > +
> > > > +           /* check that the socket has never been connected or recently disconnected */
> > > > +           if (sk->sk_state != TCP_CLOSE || sk->sk_shutdown & SHUTDOWN_MASK) {
> > > > +                   ret = -EOPNOTSUPP;
> > > > +                   goto out_err;
> > > > +           }
> > > > +
> > > > +           /* check that the socket is not bound to an interface*/
> > > > +           if (sk->sk_bound_dev_if != 0) {
> > > > +                   ret = -EOPNOTSUPP;
> > > > +                   goto out_err;
> > > > +           }
> > > > +
> > > > +           my_ns = sock_net(sk);
> > > > +           sock_net_set(sk, other_ns);
> > > > +           put_net(my_ns);
> > > > +           break;
> > >
> > >               cpu 0                           cpu 2
> > >               ---                             ---
> > >                                               ns = sock_net(sk);
> > >               my_ns = sock_net(sk);
> > >               sock_net_set(sk, other_ns);
> > >               put_net(my_ns);
> > >                                               ns is invalid ?
> >
> > That is the reason we want the socket to be in an un-connected state. That
> > should help us avoid this situation.
> 
> This is not enough....
> 
> Another thread might look at sock_net(sk), for example from inet_diag
> or tcp timers
> (which can be fired even in un-connected state)
> 
> Even UDP sockets can receive packets while being un-connected,
> and they need to deref the net pointer.
> 
> Currently there is no protection about sock_net(sk) being changed on the fly,
> and the struct net could disappear and be freed.
> 
> There are ~1500 uses of sock_net(sk) in the kernel, I do not think
> you/we want to audit all
> of them to check what could go wrong...

I agree, auditing all the uses of sock_net(sk) is not a feasible option. From my
exploration of the usage of sock_net(sk), it appeared that it might be safe to
swap a socket's net ns if it had never been connected, but I looked at only a
subset of such uses.

Introducing reference counting on every access of sock_net(sk) may help get
around this, but it involves a bigger change: incrementing and decrementing the
count at every use of sock_net().
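
To make the cost of that idea concrete, here is a minimal userspace sketch of
the get/put pattern every reader would have to follow. It is not kernel code
and not a proposed patch: struct ns, struct sk, ns_new(), ns_put(), sk_ns_get()
and sk_ns_swap() are hypothetical stand-ins for struct net, struct sock and
their helpers, and a pthread mutex is assumed in place of whatever
synchronization the kernel would actually use.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net: an id plus a reference count. */
struct ns {
	atomic_int refs;
	int id;
};

/* Hypothetical stand-in for struct sock: it owns one reference on ns. */
struct sk {
	pthread_mutex_t lock;	/* guards the ns pointer */
	struct ns *ns;
};

/* Allocate a namespace holding one reference, owned by the caller. */
static struct ns *ns_new(int id)
{
	struct ns *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	atomic_init(&n->refs, 1);
	n->id = id;
	return n;
}

/* Drop one reference; free the namespace when the last one goes away. */
static void ns_put(struct ns *n)
{
	if (atomic_fetch_sub(&n->refs, 1) == 1)
		free(n);
}

/*
 * Reader side: take a reference while holding the lock, so the namespace
 * cannot be freed between loading the pointer and bumping the count.
 * Every caller must pair this with ns_put().
 */
static struct ns *sk_ns_get(struct sk *s)
{
	struct ns *n;

	pthread_mutex_lock(&s->lock);
	n = s->ns;
	atomic_fetch_add(&n->refs, 1);
	pthread_mutex_unlock(&s->lock);
	return n;
}

/*
 * Writer side: install new_ns (consuming the caller's reference on it)
 * and drop the socket's reference on the old namespace. Readers that
 * already hold a reference keep the old namespace alive.
 */
static void sk_ns_swap(struct sk *s, struct ns *new_ns)
{
	struct ns *old;

	pthread_mutex_lock(&s->lock);
	old = s->ns;
	s->ns = new_ns;
	pthread_mutex_unlock(&s->lock);
	ns_put(old);
}
```

In this sketch a reader that called sk_ns_get() before the swap can keep using
the old namespace until its own ns_put(). The price is a lock/get/put sequence
around each access, which is what would have to be repeated at every
sock_net() call site.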

Any suggestions on whether this could be achieved in another way, much closer to
socket creation time? Or any comments on our workaround of injecting sockets
using seccomp addfd?



