Re: [PATCH net-next v19 03/13] netdev: support binding dma-buf to netdevice

Taehee Yoo <ap420073@xxxxxxxxx> · Thu, 22 Aug 2024 02:04:17 +0900

On Thu, Aug 22, 2024 at 12:36 AM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
>
> On Wed, Aug 21, 2024 at 5:15 AM Taehee Yoo <ap420073@xxxxxxxxx> wrote:
> >
> > On Tue, Aug 20, 2024 at 1:01 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Aug 19, 2024 at 6:53 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, 19 Aug 2024 00:44:27 +0900 Taehee Yoo wrote:
> > > > > > @@ -9537,6 +9540,10 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
> > > > > >                         NL_SET_ERR_MSG(extack, "Native and generic XDP can't be active at the same time");
> > > > > >                         return -EEXIST;
> > > > > >                 }
> > > > > > +               if (dev_get_max_mp_channel(dev) != -1) {
> > > > > > +                       NL_SET_ERR_MSG(extack, "XDP can't be installed on a netdev using memory providers");
> > > > > > +                       return -EINVAL;
> > > > > > +               }
> > > > >
> > > > > Should we consider virtual interfaces like bonding, bridge, etc?
> > > > > Virtual interfaces as an upper interface of physical interfaces can
> > > > > still install XDP prog.
> > > > >
> > > > > # ip link add bond0 type bond
> > > > > # ip link set eth0 master bond0
> > > > > # ip link set bond0 xdp pin /sys/fs/bpf/x/y
> > > > > and
> > > > > # ip link set bond0 xdpgeneric pin /sys/fs/bpf/x/y
> > > > >
> > > > > All virtual interfaces can install generic XDP prog.
> > > > > The bonding interface can install native XDP prog.
> > > >
> > > > Good point. We may need some common helpers to place the checks for XDP.
> > > > They are spread all over the place now.
> > >
> > > Took a bit of a look here. Forgive me, I'm not that familiar with XDP
> > > and virtual interfaces, so I'm a bit unsure what to do here.
> > >
> > > For veth, it seems, the device behind the veth is stored in
> > > veth_priv->peer, so it seems maybe a dev_get_max_mp_channel() check on
> > > veth_priv->peer is the way to go to disable this for veth? I think we
> > > need to do this check on creation of the veth and on the ndo_bpf of
> > > veth.
> > >
> > > For bonding, it seems we need to add mp channel check in bond_xdp_set,
> > > and bond_enslave?
> > >
> > > There are a few other drivers that define ndo_add_slave, seems a check
> > > in br_add_slave is needed as well.
> > >
> > > This seems like a potentially deep rabbit hole with a few checks to
> > > add all of the place. Is this blocking the series? AFAICT if XDP fails
> > > with mp-bound queues with a benign error, that seems fine to me; I
> > > don't have a use case for memory providers + xdp yet. This should only
> > > be blocking if someone can repro a very serious error (kernel crash)
> > > or something with this combination.
> > >
> > > I can try to add these checks locally and propose as a follow up
> > > series. Let me know if I'm on the right track with figuring out how to
> > > implement this, and, if you feel like it's blocking.
> > >
> > > --
> > > Thanks,
> > > Mina
> >
> > I agree with the current approach, which uses the
> > dev_get_min_mp_channel_count() in the dev_xdp_attach().
> > The only problem that I am concerned about is the
> > dev_get_min_mp_channel_count() can't check lower interfaces.
> > So, how about just making the current code to be able to check lower
> > interfaces?
>
> Thank you for the code snippet! It's very useful! I have been
> wondering how to walk lower/upper devices!
>
> To be honest, I think maybe Jakub's suggestion to refactor all the
> ->ndo_bpf calls needs to happen anyway. The reason is that there are
> ->ndo_bpf calls in the core net stack, like net/xdp/xsk_buff_pool.c
> and kernel/bpf/offload.c. AFAICT we need to add checks in these places
> as well, so refactoring them into one place is nice?
>
> Note I sent the refactor for review. Sorry, I forgot to CC Taehee:
> https://patchwork.kernel.org/project/netdevbpf/patch/20240821045629.2856641-1-almasrymina@xxxxxxxxxx/
>

I agree that it requires refactoring.
The dev_xdp_propagate() will be useful.

> Additionally I'm wondering if we should disable adding mp-bound
> devices as slaves completely, regardless of xdp. My concern is that if
> the lower device is using unreadable memory, then the upper device may
> see unreadable memory in its code paths, and will not be expecting
> that, so it may break. From the look at the code, it looks like
> net/batman-adv calls ndo_add_slave, and a bunch of code that touches
> skb_frags:
>
> $ ackc -i ndo_add_slave
> soft-interface.c
> 889:    .ndo_add_slave = batadv_softif_slave_add,
>
> $ ackc -i skb_frag
> fragmentation.c
> 403:    struct sk_buff *skb_fragment;
> 407:    skb_fragment = dev_alloc_skb(ll_reserved + mtu + tailroom);
> 408:    if (!skb_fragment)
> 411:    skb_fragment->priority = skb->priority;
> 414:    skb_reserve(skb_fragment, ll_reserved + header_size);
> 415:    skb_split(skb, skb_fragment, skb->len - fragment_size);
> 418:    skb_push(skb_fragment, header_size);
> 419:    memcpy(skb_fragment->data, frag_head, header_size);
> 422:    return skb_fragment;
> 441:    struct sk_buff *skb_fragment;
> 513:            skb_fragment = batadv_frag_create(net_dev, skb, &frag_header,
> 515:            if (!skb_fragment) {
> 522:                               skb_fragment->len + ETH_HLEN);
> 523:            ret = batadv_send_unicast_skb(skb_fragment, neigh_node);
>
> If we disable ndo_add_slave on mp devices, then we don't need to walk
> lower or upper devices. What do you think? If we don't disable mp
> lower devices entirely, then yes, we can make
> dev_get_min_mp_channel_count() do a recursive check.
>
> Note that we can add support for mp bound devices as slaves in the
> future if we have a use case for it, and it's well tested to be safe
> with selftests added.
>
> If we disable adding mp devices as lower devices, then during the mp
> binding we should also check if the device has upper devices.

I truly agree with this idea!
Almost all virtual interfaces as an upper interface of mp_bound devices
especially tunneling interfaces will not work.
As you already know there are several reasons.
1. HDS wouldn't work due to tunneling header.
2. RSS wouldn't work due to tunneling header.
So, I agree that we disable setting up virtual interfaces as an
upper interface of mp_bound devices.
Then as you said, we can allow only confirmed interface types
in the future.

The IPsec is also not working with mp_bound devices due to the same
reason. It would be a more complex issue, unfortunately, I don't know
how to deal with it.