On Mon, Dec 5, 2022 at 9:58 PM Paolo Abeni <pabeni@xxxxxxxxxx> wrote: > > On Fri, 2022-12-02 at 15:16 -0500, Paul Moore wrote: [...] > > What if we added a new LSM call in mptcp_subflow_create_socket(), just > > after the sock_create_kern() call? > > That should work, I think. I would like to propose a (last) attempt > that will not need an additional selinux hook - to try to minimize the > required changes and avoid unnecessary addional work for current and > future LSM mainteniance and creation. > > I tested the following patch and passes the reproducer (and mptcp self- > tests). Basically it introduces and uses a sock_create_nosec variant, > to allow mptcp_subflow_create_socket() calling > security_socket_post_create() with the corrct arguments. WDYT? This seems like a step in the right direction, but I wonder if we shouldn't solve the current overloading of the "kern" flag more explicitly - i.e. split it into two flags: one to indicate that the socket will only be used internally by the kernel ("internal") and another one to indicate if it should be labeled according to the current task or as a kernel-created socket ("kern"?). Technically, each combination could have a valid use case: - !internal && !kern -> a regular userspace-created socket, - !internal && kern -> a socket that is exposed to userspace, but created by the kernel outside of a syscall (e.g. some global socket created during initcall phase and later returned to userspace via an ioctl or something), - internal && !kern -> our MPTCP case, where the socket itself is internal, but the label is still important so it can be passed onto its accept-offspring (which may no longer be internal), - internal && kern -> a completely kernel-internal socket. Another concern I have about this approach is whether it is possible (in some more advanced scenario) for mptcp_subflow_create_socket() to be called in the context of a different task than the one creating/handling the main socket. Because then a potential socket accepted from the new subflow socket would end up with an unexpected (and probably semantically wrong) label. Glancing over the call tree, it seems it can be called via some netlink commands - presumably intended to be used by mptcpd? > > --- > include/linux/net.h | 2 ++ > net/mptcp/subflow.c | 18 ++++++++++++-- > net/socket.c | 60 ++++++++++++++++++++++++++++++--------------- > 3 files changed, 58 insertions(+), 22 deletions(-) > > diff --git a/include/linux/net.h b/include/linux/net.h > index b73ad8e3c212..91713012504d 100644 > --- a/include/linux/net.h > +++ b/include/linux/net.h > @@ -251,6 +251,8 @@ int sock_wake_async(struct socket_wq *sk_wq, int how, int band); > int sock_register(const struct net_proto_family *fam); > void sock_unregister(int family); > bool sock_is_registered(int family); > +int __sock_create_nosec(struct net *net, int family, int type, int proto, > + struct socket **res, int kern); > int __sock_create(struct net *net, int family, int type, int proto, > struct socket **res, int kern); > int sock_create(int family, int type, int proto, struct socket **res); > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c > index 29904303f5c2..9341f9313154 100644 > --- a/net/mptcp/subflow.c > +++ b/net/mptcp/subflow.c > @@ -1646,11 +1646,25 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock) > if (unlikely(!sk->sk_socket)) > return -EINVAL; > > - err = sock_create_kern(net, sk->sk_family, SOCK_STREAM, IPPROTO_TCP, > - &sf); > + /* the subflow is created by the kernel, and we need kernel annotation > + * for lockdep's sake... > + */ > + err = __sock_create_nosec(net, sk->sk_family, SOCK_STREAM, IPPROTO_TCP, > + &sf, 1); > if (err) > return err; > > + /* ... but the first subflow will be indirectly exposed to the > + * user-space via accept(). Let's attach the current user security > + * label > + */ > + err = security_socket_post_create(sf, sk->sk_family, SOCK_STREAM, > + IPPROTO_TCP, 0); > + if (err) { > + sock_release(sf); > + return err; > + } > + > lock_sock(sf->sk); > > /* the newly created socket has to be in the same cgroup as its parent */ > diff --git a/net/socket.c b/net/socket.c > index 55c5d536e5f6..d5d51e4e26ae 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -1426,23 +1426,11 @@ int sock_wake_async(struct socket_wq *wq, int how, int band) > } > EXPORT_SYMBOL(sock_wake_async); > > -/** > - * __sock_create - creates a socket > - * @net: net namespace > - * @family: protocol family (AF_INET, ...) > - * @type: communication type (SOCK_STREAM, ...) > - * @protocol: protocol (0, ...) > - * @res: new socket > - * @kern: boolean for kernel space sockets > - * > - * Creates a new socket and assigns it to @res, passing through LSM. > - * Returns 0 or an error. On failure @res is set to %NULL. @kern must > - * be set to true if the socket resides in kernel space. > - * This function internally uses GFP_KERNEL. > - */ > > -int __sock_create(struct net *net, int family, int type, int protocol, > - struct socket **res, int kern) > + > +/* Creates a socket leaving LSM post-creation checks to the caller */ > +int __sock_create_nosec(struct net *net, int family, int type, int protocol, > + struct socket **res, int kern) > { > int err; > struct socket *sock; > @@ -1528,11 +1516,8 @@ int __sock_create(struct net *net, int family, int type, int protocol, > * module can have its refcnt decremented > */ > module_put(pf->owner); > - err = security_socket_post_create(sock, family, type, protocol, kern); > - if (err) > - goto out_sock_release; > - *res = sock; > > + *res = sock; > return 0; > > out_module_busy: > @@ -1548,6 +1533,41 @@ int __sock_create(struct net *net, int family, int type, int protocol, > rcu_read_unlock(); > goto out_sock_release; > } > + > +/** > + * __sock_create - creates a socket > + * @net: net namespace > + * @family: protocol family (AF_INET, ...) > + * @type: communication type (SOCK_STREAM, ...) > + * @protocol: protocol (0, ...) > + * @res: new socket > + * @kern: boolean for kernel space sockets > + * > + * Creates a new socket and assigns it to @res, passing through LSM. > + * Returns 0 or an error. On failure @res is set to %NULL. @kern must > + * be set to true if the socket resides in kernel space. > + * This function internally uses GFP_KERNEL. > + */ > + > +int __sock_create(struct net *net, int family, int type, int protocol, > + struct socket **res, int kern) > +{ > + struct socket *sock; > + int err; > + > + err = __sock_create_nosec(net, family, type, protocol, &sock, kern); > + if (err) > + return err; > + > + err = security_socket_post_create(sock, family, type, protocol, kern); > + if (err) { > + sock_release(sock); > + return err; > + } > + > + *res = sock; > + return 0; > +} > EXPORT_SYMBOL(__sock_create); > > /** > -- Ondrej Mosnacek Senior Software Engineer, Linux Security - SELinux kernel Red Hat, Inc.