Re: Redirect to AF_XDP socket not working with bond interface in native mode

On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
<magnus.karlsson@xxxxxxxxx> wrote:
>
> On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> >
> > Thanks for your response. My comments inline.
> >
> > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I am new to XDP and exploring how it works with the different interface
> > > > types supported in Linux. One of my use cases is to be able to receive
> > > > packets from the bond interface.
> > > > I used the xdpsock sample program, specifying the bond interface as the
> > > > input interface. However, the packets received on the bond interface
> > > > are not handed over to the socket by the kernel if the socket is bound
> > > > in native mode. Nor are the packets passed on to the kernel stack.
> > > > Note that the socket creation does succeed.
> > > > In skb mode this works and I am able to receive packets in
> > > > userspace. But in skb mode, as expected, the performance is not that
> > > > great.
> > > >
> > > > Are AF_XDP sockets on a bond not supported in native mode? Or, since the
> > > > packet has to be handed over to the bond driver after reception on
> > > > the phy port, is an skb allocation and a copy into it indeed a must?
> > >
> > > I have never tried a bonding interface with AF_XDP, so it might not
> > > work. Can you trace the packet to see where it is being dropped in
> > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > you using?
> > >
> > I will trace the packet and get back.
> > The bond is over 2 physical ports of an Intel NIC. Those are:
> > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> >
> > Bonding algo is 802.3ad
> >
> > CPU is Intel Xeon Gold 3.40GHz
> >
> > NIC Driver
> > # ethtool -i ens1f0
> > driver: ixgbe
> > version: 5.14.0-362.13.1.el9_3
>
> Could you please try with the latest kernel 6.7? 5.14 is quite old and
> a lot of things have happened since then.
>
I tried with kernel 6.6.8-1.el9.elrepo.x86_64 and still see the same issue.

> > Features
> > # xdp-loader features ens1f0
> > NETDEV_XDP_ACT_BASIC:           yes
> > NETDEV_XDP_ACT_REDIRECT:        yes
> > NETDEV_XDP_ACT_NDO_XMIT:        no
> > NETDEV_XDP_ACT_XSK_ZEROCOPY:    yes
> > NETDEV_XDP_ACT_HW_OFFLOAD:      no
> > NETDEV_XDP_ACT_RX_SG:           no
> > NETDEV_XDP_ACT_NDO_XMIT_SG:     no
> >
> > CPU is
> >
> > An interesting thing is that bond0 does advertise both native and ZC
> > mode. That's because the features are copied from the slave devices,
> > which explains why there is no error while binding the socket in
> > native/zero-copy mode.
>
> It is probably the intention that if both the bonded devices support a
> feature, then the bonding device will too. I just saw that the bonding
> device does not implement xsk_wakeup, which is used by zero-copy, so
> zero-copy is not really supported and that support should not be
> advertised. The code in AF_XDP tests for zero-copy support this way:
>
> if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
>     err = -EOPNOTSUPP;
>     goto err_unreg_pool;
> }
>
> So there are some things needed in the bonding driver to make
> zero-copy work. Might not be much though. But your problem is with
> XDP_DRV and copy mode, so let us start there.
>
> > void bond_xdp_set_features(struct net_device *bond_dev)
> > {
> > ..
> >     bond_for_each_slave(bond, slave, iter)
> >         val &= slave->dev->xdp_features;
> >     xdp_set_features_flag(bond_dev, val);
> > }
> >
> > # ../xdp-loader/xdp-loader features bond0
> > NETDEV_XDP_ACT_BASIC:           yes
> > NETDEV_XDP_ACT_REDIRECT:        yes
> > NETDEV_XDP_ACT_NDO_XMIT:        no
> > NETDEV_XDP_ACT_XSK_ZEROCOPY:    yes
> > NETDEV_XDP_ACT_HW_OFFLOAD:      no
> > NETDEV_XDP_ACT_RX_SG:           no
> > NETDEV_XDP_ACT_NDO_XMIT_SG:     no
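
A minimal user-space sketch of the two snippets quoted above: the per-slave AND that bond_xdp_set_features() performs, followed by the AF_XDP zero-copy test. The flag names mirror the xdp-loader output; the numeric values, the composition of NETDEV_XDP_ACT_ZC, and the helper names bond_features() and passes_zc_check() are assumptions made for illustration and are not copied from the kernel tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's XDP feature bits (values assumed). */
#define NETDEV_XDP_ACT_BASIC        (1u << 0)
#define NETDEV_XDP_ACT_REDIRECT     (1u << 1)
#define NETDEV_XDP_ACT_NDO_XMIT     (1u << 2)
#define NETDEV_XDP_ACT_XSK_ZEROCOPY (1u << 3)

/* Assumed composition: zero-copy needs all three of these bits. */
#define NETDEV_XDP_ACT_ZC (NETDEV_XDP_ACT_BASIC | \
                           NETDEV_XDP_ACT_REDIRECT | \
                           NETDEV_XDP_ACT_XSK_ZEROCOPY)

/* Hypothetical helper: intersect the feature masks of all slaves,
 * as the loop in bond_xdp_set_features() does with "val &= ...". */
static uint32_t bond_features(const uint32_t *slave_features, size_t n)
{
    uint32_t val = UINT32_MAX; /* start with every bit set */

    for (size_t i = 0; i < n; i++)
        val &= slave_features[i];

    return n ? val : 0; /* no slaves: advertise nothing */
}

/* Hypothetical helper: the same mask comparison as the quoted AF_XDP check. */
static bool passes_zc_check(uint32_t xdp_features)
{
    return (xdp_features & NETDEV_XDP_ACT_ZC) == NETDEV_XDP_ACT_ZC;
}
```

With two slaves that both report BASIC, REDIRECT and XSK_ZEROCOPY, the intersection keeps all three bits, so the check passes for bond0 even though the bonding driver has no xsk_wakeup. That matches the observation that binding in zero-copy mode returns no error.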
> >
> > > > Another thing I notice is that other XDP programs attached to the bond
> > > > interface, with actions like DROP or REDIRECT to another interface, work
> > > > and perform better than the AF_XDP (skb mode) setup. Does this mean that
> > > > these are not allocating an skb?
> > >
> > > I am not surprised that AF_XDP in copy mode is slower than XDP_REDIRECT.
> > > The packet has to be copied out to user-space then copied into the
> > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > If you were using zero-copy, on the other hand, it would be faster
> > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > that is not an option.
> > >
> >
> > Here are the pps numbers for the above-mentioned single port in
> > different modes, with a comparison to the bond interface.
> > The test uses pktgen pumping 64-byte packets on a single flow.
> >
> > Single AF_XDP sock on a single NIC queue:
> >   AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> >  ═══════════════════════════════════════════════════════════════════════════════════════
> >   ZC              14M    65%       35%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> >   XDP_DRV/COPY    10M    100%      23%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> >   SKB_MODE        2.2M   100%      62%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > * CPU receiving the packet
> > In the above tests, when using ZC and XDP_DRV/COPY, is this softirq (SI)
> > usage as expected, especially in ZC mode? Is it mainly because the BPF
> > program is running in non-HW-offloaded mode? I don't have a NIC that can
> > run BPF in offloaded mode, so I cannot compare.
>
> I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> that sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> do the redirect in the CPU for it to work. If you want to know where
> time is spent, use "perf top". The biggest chunk of time is spent in
> the XDP_REDIRECT operation, but there are many other time thieves too.
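
A rough check of the "maxing out your 10G card" remark: the sketch below computes the theoretical packet rate of an Ethernet link for a given frame size. It assumes the 64-byte figure is the full frame including the FCS, and adds 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap). The function name line_rate_pps() is hypothetical.

```c
#include <stdint.h>

/* Preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12). */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Hypothetical helper: theoretical packets per second on an Ethernet link.
 * link_bps is the link speed in bits per second, frame_bytes the frame
 * size including the FCS (an assumption about how the size is counted). */
static double line_rate_pps(double link_bps, uint32_t frame_bytes)
{
    double bits_per_frame = 8.0 * (frame_bytes + ETH_WIRE_OVERHEAD_BYTES);

    return link_bps / bits_per_frame;
}
```

For a 10 Gbit/s link and 64-byte frames this gives roughly 14.88 Mpps, in line with the 14M reported for ZC and for the xdp-bench drop tests on ens1f0.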
>
> > The XDP_DROP target using the xdp-bench tool (from xdp-tools) on the same NIC port:
> >   xdp-bench         PPS    CPU-SI*   Command
> >  ═══════════════════════════════════════════════════════════════════════════
> >   drop, no-touch    14M    41%       ./xdp-bench drop -p no-touch ens1f0 -e
> >   drop, read-data   14M    55%       ./xdp-bench drop -p read-data ens1f0 -e
> >   drop, parse-ip    14M    58%       ./xdp-bench drop -p parse-ip ens1f0 -e
> > * CPU receiving the packet
> >
> > Similar tests on the bond interface (the above-mentioned 2 ports bonded):
> >   AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> >  ═════════════════════════════════════════════════════════════════════════════════════
> >   ZC              X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> >   XDP_DRV/COPY    X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> >   SKB_MODE        2M     100%      55%           ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > * CPU receiving the packet
> >
> >   xdp-bench         PPS      CPU-SI*   Command
> >  ═══════════════════════════════════════════════════════════════════════════
> >   drop, no-touch    10.9M    33%       ./xdp-bench drop -p no-touch bond0 -e
> >   drop, read-data   10.9M    44%       ./xdp-bench drop -p read-data bond0 -e
> >   drop, parse-ip    10.9M    47%       ./xdp-bench drop -p parse-ip bond0 -e
> > * CPU receiving the packet
> >
> >
> > > > Kindly share your thoughts and advice.
> > > >
> > > > Thanks,
> > > > Prashant
> > > >




