On Tue, Jul 4, 2023 at 1:53 AM Ilya Maximets <i.maximets@xxxxxxx> wrote: > > Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. > A privileged process might create the socket and pass it to a > non-privileged process for later use. However, that process will be > able to bind the socket to any network interface. Even though it will > not be able to receive any traffic without modification of the BPF map, > the situation is not ideal. > > Sockets already have a mechanism that can be used to restrict what > interface they can be attached to. That is SO_BINDTODEVICE. > > To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW. > > Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer > workflow when non-privileged process is using AF_XDP. > > The intended workflow is following: > > 1. First process creates a bare socket with socket(AF_XDP, ...). > 2. First process loads the XSK program to the interface. > 3. First process adds the socket fd to a BPF map. > 4. First process ties socket fd to a particular interface using > SO_BINDTODEVICE. > 5. First process sends socket fd to a second process. > 6. Second process allocates UMEM. > 7. Second process binds socket to the interface with bind(...). > 8. Second process sends/receives the traffic. > > All the steps above are possible today if the first process is > privileged and the second one has sufficient RLIMIT_MEMLOCK and no > capabilities. However, the second process will be able to bind the > socket to any interface it wants on step 7 and send traffic from it. > With the proposed change, the second process will be able to bind > the socket only to a specific interface chosen by the first process > at step 4. > > Acked-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx> > Signed-off-by: Ilya Maximets <i.maximets@xxxxxxx> Acked-by: Jason Wang <jasowang@xxxxxxxxxx> Is this a stable material or not? Thanks > --- > > RFC --> PATCH: > * Better explained intended workflow in a commit message. > * Added ACK from Magnus. > > Documentation/networking/af_xdp.rst | 9 +++++++++ > net/xdp/xsk.c | 6 ++++++ > 2 files changed, 15 insertions(+) > > diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst > index 247c6c4127e9..1cc35de336a4 100644 > --- a/Documentation/networking/af_xdp.rst > +++ b/Documentation/networking/af_xdp.rst > @@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the > application to use. The final option is the flags field, but it will > be dealt with in separate sections for each UMEM flag. > > +SO_BINDTODEVICE setsockopt > +-------------------------- > + > +This is a generic SOL_SOCKET option that can be used to tie AF_XDP > +socket to a particular network interface. It is useful when a socket > +is created by a privileged process and passed to a non-privileged one. > +Once the option is set, kernel will refuse attempts to bind that socket > +to a different interface. Updating the value requires CAP_NET_RAW. > + > XDP_STATISTICS getsockopt > ------------------------- > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 5a8c0dd250af..386ff641db0f 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) > struct sock *sk = sock->sk; > struct xdp_sock *xs = xdp_sk(sk); > struct net_device *dev; > + int bound_dev_if; > u32 flags, qid; > int err = 0; > > @@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) > XDP_USE_NEED_WAKEUP)) > return -EINVAL; > > + bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); > + > + if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex) > + return -EINVAL; > + > rtnl_lock(); > mutex_lock(&xs->mutex); > if (xs->state != XSK_READY) { > -- > 2.40.1 >