Re: [RFC bpf-next] xsk: honor SO_BINDTODEVICE on bind

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/3/23 12:06, Ilya Maximets wrote:
> On 7/3/23 11:48, Magnus Karlsson wrote:
>> On Fri, 30 Jun 2023 at 16:58, Ilya Maximets <i.maximets@xxxxxxx> wrote:
>>>
>>> Initial creation of an AF_XDP socket requires CAP_NET_RAW capability.
>>> A privileged process might create the socket and pass it to a
>>> non-privileged process for later use.  However, that process will be
>>> able to bind the socket to any network interface.  Even though it will
>>> not be able to receive any traffic without modification of the BPF map,
>>> the situation is not ideal.
>>>
>>> Sockets already have a mechanism that can be used to restrict what
>>> interface they can be attached to.  That is SO_BINDTODEVICE.
>>>
>>> To change the binding the process will need CAP_NET_RAW.
>>>
>>> Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer
>>> workflow when non-privileged process is using AF_XDP.
>>
>> Rebinding an AF_XDP socket is not allowed today. Any such attempt will
>> return an error from bind. So if I understand the purpose of
>> SO_BINDTODEVICE correctly, you could say that this option is always
>> set for an AF_XDP socket and it is not possible to toggle it. The only
>> way to "rebind" an AF_XDP socket is to close it and open a new one.
>> This was a conscious design decision from day one as it would be very
>> hard to support this, especially in zero-copy mode.
> 
> Hi, Magnus.
> 
> The purpose of this patch is not to allow re-binding.  The use case is
> following:
> 
> 1. First process creates a bare socket with socket(AF_XDP, ...).
> 2. First process loads the XSK program to the interface.
> 3. First process adds the socket fd to a BPF map.
> 4. First process sends socket fd to a second process.
> 5. Second process allocates UMEM.
> 6. Second process binds socket to the interface.

7. Second process sends/receives the traffic. :)

> 
> The idea is that the first process will call SO_BINDTODEVICE before
> sending socket fd to a second process, so the second process is limited
> in to which interface it can bind the socket.
> 
> Does that make sense?
> 
> This workflow allows the second process to have no capabilities
> as long as it has sufficient RLIMIT_MEMLOCK.

Note that steps 1-7 are working just fine today.  i.e. the umem
registration, bind, ring mapping and traffic send/receive do not
require any extra capabilities.

We may restrict the bind() call to require CAP_NET_RAW and then
allow it for sockets that had SO_BINDTODEVICE as an alternative.
But restriction will break the current uAPI.

> 
> Best regards, Ilya Maximets.
> 
>>
>>> Signed-off-by: Ilya Maximets <i.maximets@xxxxxxx>
>>> ---
>>>
>>> Posting as an RFC for now to probably get some feedback.
>>> Will re-post once the tree is open.
>>>
>>>  Documentation/networking/af_xdp.rst | 9 +++++++++
>>>  net/xdp/xsk.c                       | 6 ++++++
>>>  2 files changed, 15 insertions(+)
>>>
>>> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
>>> index 247c6c4127e9..1cc35de336a4 100644
>>> --- a/Documentation/networking/af_xdp.rst
>>> +++ b/Documentation/networking/af_xdp.rst
>>> @@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the
>>>  application to use. The final option is the flags field, but it will
>>>  be dealt with in separate sections for each UMEM flag.
>>>
>>> +SO_BINDTODEVICE setsockopt
>>> +--------------------------
>>> +
>>> +This is a generic SOL_SOCKET option that can be used to tie AF_XDP
>>> +socket to a particular network interface.  It is useful when a socket
>>> +is created by a privileged process and passed to a non-privileged one.
>>> +Once the option is set, kernel will refuse attempts to bind that socket
>>> +to a different interface.  Updating the value requires CAP_NET_RAW.
>>> +
>>>  XDP_STATISTICS getsockopt
>>>  -------------------------
>>>
>>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>>> index 5a8c0dd250af..386ff641db0f 100644
>>> --- a/net/xdp/xsk.c
>>> +++ b/net/xdp/xsk.c
>>> @@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>>>         struct sock *sk = sock->sk;
>>>         struct xdp_sock *xs = xdp_sk(sk);
>>>         struct net_device *dev;
>>> +       int bound_dev_if;
>>>         u32 flags, qid;
>>>         int err = 0;
>>>
>>> @@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
>>>                       XDP_USE_NEED_WAKEUP))
>>>                 return -EINVAL;
>>>
>>> +       bound_dev_if = READ_ONCE(sk->sk_bound_dev_if);
>>> +
>>> +       if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex)
>>> +               return -EINVAL;
>>> +
>>>         rtnl_lock();
>>>         mutex_lock(&xs->mutex);
>>>         if (xs->state != XSK_READY) {
>>> --
>>> 2.40.1
>>>
>>>
> 





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux