Re: Redirect to AF_XDP socket not working with bond interface in native mode

Yes, I found the place where the packet is getting dropped. The check
for a device match between xs and xdp->rxq fails in xsk_rcv_check().
The device in xs is the bond device, whereas the one in xdp->rxq is the
slave device on which the packet was received and from which the XDP
program is being invoked.

static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
{
    /* ... */
    if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
        return -EINVAL;
    /* ... */
}
Here is the perf backtrace for the xdp:xdp_redirect_err tracepoint (err=-22 matches the -EINVAL above).
ksoftirqd/0    14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
        ffffffff873dcbf4 xdp_do_redirect+0x3b4
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffff873dcbf4 xdp_do_redirect+0x3b4
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffffc05d0f0f ixgbe_run_xdp+0x10f
(/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
        ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
(/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
        ffffffffc05d2da0 ixgbe_poll+0xf0
(/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
        ffffffff873afad7 __napi_poll+0x27
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffff873affd3 net_rx_action+0x233
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffff8762ae27 __do_softirq+0xc7
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffff86b04cfe run_ksoftirqd+0x1e
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffff86b33d83 smpboot_thread_fn+0xd3
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffff86b2956d kthread+0xdd
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
        ffffffff86a02289 ret_from_fork+0x29
(/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)

I am curious: why is the XDP program invoked from the ixgbe driver
(running for the slave device) when the program is actually attached to
the bond device? Is this by design? My current reading of the bonding
code is sketched after the status output below.
# xdp-loader status bond0
CURRENT XDP PROGRAM STATUS:
Interface        Prio  Program name      Mode     ID   Tag               Chain actions
--------------------------------------------------------------------------------------
bond0                  xdp_dispatcher    native   64   90f686eb86991928
 =>              20     xsk_def_prog              73   8f9c40757cb0a6a2  XDP_PASS

# xdp-loader status ens1f0
CURRENT XDP PROGRAM STATUS:
Interface        Prio  Program name      Mode     ID   Tag               Chain actions
--------------------------------------------------------------------------------------
ens1f0                 <No XDP program loaded!>

# xdp-loader status ens1f1
CURRENT XDP PROGRAM STATUS:
Interface        Prio  Program name      Mode     ID   Tag               Chain actions
--------------------------------------------------------------------------------------
ens1f1                 <No XDP program loaded!>
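
For what it's worth, my (possibly wrong) reading of how the bonding
driver handles an XDP attach, paraphrased very loosely from
drivers/net/bonding/bond_main.c, is that the program attached to bond0
is handed to every slave's own driver, which would explain why ixgbe
ends up running it even though xdp-loader shows nothing on the slaves.
This is only a paraphrase to check my understanding, not actual kernel
code:

/* paraphrase of my understanding of bond_xdp_set(); error handling,
 * refcounting and feature checks omitted */
static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
			struct netlink_ext_ack *extack)
{
	struct bonding *bond = netdev_priv(dev);
	struct list_head *iter;
	struct slave *slave;

	bond_for_each_slave(bond, slave, iter) {
		struct netdev_bpf xdp = {
			.command = XDP_SETUP_PROG,
			.prog    = prog,
			.extack  = extack,
		};

		/* hand the same program to the slave's native driver */
		slave->dev->netdev_ops->ndo_bpf(slave->dev, &xdp);
	}

	return 0;
}

If that is roughly right, then the program running in ixgbe is by
design, and the surprise is only that xsk_rcv_check() later compares
against the slave's rxq->dev.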

Now, if I skip the device check in xsk_rcv_check(), I can see packets
being received on the AF_XDP socket in driver (XDP_DRV/copy) mode.
 # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
 sock0@bond0:5 rxdrop xdp-drv poll()
                   pps            pkts           1.00
rx                10,126,924     1,984,092,501
tx                 0              0

I am sure we would not want to skip the device check in general,
especially for non-bonded devices. Please guide me on how to take this
further and get the issue fixed in mainline.
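
One rough, untested direction that comes to mind, rather than removing
the check altogether, is to also accept the case where the socket is
bound to the bond master of the receiving slave. netif_is_bond_slave()
and netdev_master_upper_dev_get_rcu() are existing helpers, but whether
this is a sane direction, and how a bond queue_id should map to the
slave queue_index, I honestly do not know:

static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
{
	struct net_device *rx_dev = xdp->rxq->dev;

	/* accept the slave's rxq if the socket is bound to its bond master */
	if (xs->dev != rx_dev &&
	    (!netif_is_bond_slave(rx_dev) ||
	     netdev_master_upper_dev_get_rcu(rx_dev) != xs->dev))
		return -EINVAL;

	if (xs->queue_id != xdp->rxq->queue_index)
		return -EINVAL;

	/* remaining checks from the original function unchanged */
}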

ZC mode does not work, most likely because of the problem you pointed
out before (the bonding driver does not implement xsk_wakeup).
# ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
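
If it helps the discussion, here is a very rough sketch of the kind of
ndo_xsk_wakeup forwarding I imagine the bonding driver would need for
zero-copy (wired up as .ndo_xsk_wakeup in bond_netdev_ops). The bond
internals used here (curr_active_slave etc.) are from a quick read of
include/net/bonding.h and may well pick the wrong slave; the real
problem of mapping a bond queue_id to a slave queue is not addressed at
all:

/* hypothetical; not an existing bonding driver function */
static int bond_xsk_wakeup(struct net_device *bond_dev, u32 queue_id,
			   u32 flags)
{
	struct bonding *bond = netdev_priv(bond_dev);
	struct slave *slave;
	int ret = -ENXIO;

	rcu_read_lock();
	slave = rcu_dereference(bond->curr_active_slave);
	if (slave && slave->dev->netdev_ops->ndo_xsk_wakeup)
		ret = slave->dev->netdev_ops->ndo_xsk_wakeup(slave->dev,
							     queue_id, flags);
	rcu_read_unlock();

	return ret;
}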


On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
<magnus.karlsson@xxxxxxxxx> wrote:
>
> On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> >
> > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > >
> > > > Thanks for your response. My comments inline.
> > > >
> > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am new to XDP and exploring how it works with the different interface
> > > > > > types supported in Linux. One of my use cases is to be able to receive
> > > > > > packets from a bond interface.
> > > > > > I used the xdpsock sample program, specifying the bond interface as the
> > > > > > input interface. However, the packets received on the bond interface
> > > > > > are not handed over to the socket by the kernel when the socket is bound
> > > > > > in native mode, nor are they passed up the kernel stack.
> > > > > > Note that the socket creation does succeed.
> > > > > > In skb mode this works and I am able to receive packets in
> > > > > > userspace, but, as expected, the performance in skb mode is not that
> > > > > > great.
> > > > > >
> > > > > > Are AF_XDP sockets on a bond not supported in native mode? Or, since the
> > > > > > packet has to be handed over to the bond driver after reception on
> > > > > > the phy port, is an skb allocation and copy into it indeed a must?
> > > > >
> > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > you using?
> > > > >
> > > > I will trace the packet and get back.
> > > > The bond is over 2 physical ports that are part of the Intel NIC. Those are-
> > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > SFI/SFP+ Network Connection (rev 01)
> > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > SFI/SFP+ Network Connection (rev 01)
> > > >
> > > > Bonding algo is 802.3ad
> > > >
> > > > CPU is Intel Xeon Gold 3.40GHz
> > > >
> > > > NIC Driver
> > > > # ethtool -i ens1f0
> > > > driver: ixgbe
> > > > version: 5.14.0-362.13.1.el9_3
> > >
> > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > a lot of things have happened since then.
> > >
> > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
>
> OK, good to know. Have you managed to trace where the packet is lost?
>
> > > > Features
> > > > # xdp-loader features ens1f0
> > > > NETDEV_XDP_ACT_BASIC:           yes
> > > > NETDEV_XDP_ACT_REDIRECT:        yes
> > > > NETDEV_XDP_ACT_NDO_XMIT:        no
> > > > NETDEV_XDP_ACT_XSK_ZEROCOPY:    yes
> > > > NETDEV_XDP_ACT_HW_OFFLOAD:      no
> > > > NETDEV_XDP_ACT_RX_SG:           no
> > > > NETDEV_XDP_ACT_NDO_XMIT_SG:     no
> > > >
> > > > The interesting thing is that bond0 does advertise both native and ZC
> > > > mode, because the features are copied from the slave devices. That
> > > > explains why there is no error while binding the socket in
> > > > native/zero-copy mode.
> > >
> > > It is probably the intention that if both of the bonded devices support a
> > > feature, then the bonding device will too. I just saw that the bonding
> > > device does not implement xsk_wakeup, which is used by zero-copy, so
> > > zero-copy is not really supported and that support should not be
> > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > >
> > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > >     err = -EOPNOTSUPP;
> > >     goto err_unreg_pool;
> > > }
> > >
> > > So there are some things needed in the bonding driver to make
> > > zero-copy work. Might not be much though. But your problem is with
> > > XDP_DRV and copy mode, so let us start there.
> > >
> > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > {
> > > > ..
> > > >     bond_for_each_slave(bond, slave, iter)
> > > >         val &= slave->dev->xdp_features;
> > > >     xdp_set_features_flag(bond_dev, val);
> > > > }
> > > >
> > > > # ../xdp-loader/xdp-loader features bond0
> > > > NETDEV_XDP_ACT_BASIC:           yes
> > > > NETDEV_XDP_ACT_REDIRECT:        yes
> > > > NETDEV_XDP_ACT_NDO_XMIT:        no
> > > > NETDEV_XDP_ACT_XSK_ZEROCOPY:    yes
> > > > NETDEV_XDP_ACT_HW_OFFLOAD:      no
> > > > NETDEV_XDP_ACT_RX_SG:           no
> > > > NETDEV_XDP_ACT_NDO_XMIT_SG:     no
> > > >
> > > > > > Another thing I notice is that other XDP programs attached to the bond
> > > > > > interface, with actions like DROP or REDIRECT to another interface, work
> > > > > > and perform better than the AF_XDP (skb) based one. Does this mean that
> > > > > > these do not allocate an skb?
> > > > >
> > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > The packet has to be copied out to user-space then copied into the
> > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > not an option.
> > > > >
> > > >
> > > > Just to put forth the pps numbers with the above mentioned single port
> > > > in different modes and a comparison to the bond interface.
> > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > >
> > > > Single AF_XDP sock on a single NIC queue-
> > > >   AF_XDP rxdrop    PPS    CPU-SI*   CPU-xdpsock   Command
> > > >  ══════════════════════════════════════════════════════════
> > > >   ZC               14M    65%       35%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > >   XDP_DRV/COPY     10M    100%      23%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > >   SKB_MODE         2.2M   100%      62%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > > * CPU receiving the packet
> > > > In the above tests, when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > expected, especially in ZC mode? Is it mainly because the BPF program
> > > > runs in non-HW-offloaded mode? I don't have a NIC that can run BPF in
> > > > offloaded mode, so I cannot compare.
> > >
> > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > do the redirect in the CPU for it to work. If you want to know where
> > > time is spent, use "perf top". The biggest chunk of time is spent in
> > > the XDP_REDIRECT operation, but there are many other time thieves too.
> > >
> > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > >   xdp-bench          PPS    CPU-SI*   Command
> > > >  ═══════════════════════════════════════════════
> > > >   drop, no-touch     14M    41%       ./xdp-bench drop -p no-touch ens1f0 -e
> > > >   drop, read-data    14M    55%       ./xdp-bench drop -p read-data ens1f0 -e
> > > >   drop, parse-ip     14M    58%       ./xdp-bench drop -p parse-ip ens1f0 -e
> > > > * CPU receiving the packet
> > > >
> > > > Similar tests on the bond interface (the 2 ports mentioned above, bonded)-
> > > >   AF_XDP rxdrop    PPS   CPU-SI*   CPU-xdpsock   Command
> > > >  ══════════════════════════════════════════════════════════
> > > >   ZC               X     X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > >   XDP_DRV/COPY     X     X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > >   SKB_MODE         2M    100%      55%           ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > > > * CPU receiving the packet
> > > >
> > > >   xdp-bench          PPS     CPU-SI*   Command
> > > >  ═══════════════════════════════════════════════
> > > >   drop, no-touch     10.9M   33%       ./xdp-bench drop -p no-touch bond0 -e
> > > >   drop, read-data    10.9M   44%       ./xdp-bench drop -p read-data bond0 -e
> > > >   drop, parse-ip     10.9M   47%       ./xdp-bench drop -p parse-ip bond0 -e
> > > > * CPU receiving the packet
> > > >
> > > >
> > > > > > Kindly share your thoughts and advice.
> > > > > >
> > > > > > Thanks,
> > > > > > Prashant
> > > > > >




