Re: Redirect to AF_XDP socket not working with bond interface in native mode

On Tue, Jan 16, 2024 at 6:29 PM Magnus Karlsson
<magnus.karlsson@xxxxxxxxx> wrote:
>
> On Tue, 16 Jan 2024 at 13:48, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> >
> > On Mon, Jan 15, 2024 at 2:52 PM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > >
> > > > > > > Yes, I found the place where the packet is getting dropped. The
> > > > > > > device match between xs and xdp->rxq fails in xsk_rcv_check().
> > > > > > > The device in xs is the bond device, whereas the one in xdp->rxq
> > > > > > > is the slave device on which the packet is received and from
> > > > > > > which the XDP program is invoked.
> > > > > >
> > > > > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > > > > {
> > > > > > --
> > > > > >     if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > > > >         return -EINVAL;
> > > > > > --
> > > > > > }
> > > > >
> > > > > I am now back from the holidays.
> > > > >
> > > > > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > > > > and we will get back to you with a solution proposal.
> > > > >
> > > > Thanks, will wait for your solution.
> > >
> > > FYI, I do not have a good solution for this yet. The one I have is too
> > > complicated for my taste. I might have to take this to the list to get
> > > some new ideas on how to tackle it. So this will take longer than
> > > anticipated.
> > >
> > Just to add that the AF_XDP TX in native mode is also not performing
> > well. I am getting around 2Mpps in native mode.
>
> That is expected though. There are only two modes for Tx: SKB mode and
> zero-copy mode, and since there is no zero-copy support for the
> bonding driver, it will revert to skb mode. I would expect around 3
> Mpps for Tx in skb mode, so 2 Mpps seems reasonable as the bonding
> driver adds overhead.
>
> For Rx there are 3 modes: skb, XDP_DRV (which is the one you are
> getting with the -N switch) and zero-copy (that is not supported by
> the bonding driver).
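
The mode rules described above can be written down as a small sketch. The
struct and function names are hypothetical and only encode what is stated in
this thread: Tx has two modes (skb and zero-copy), Rx has three (skb, XDP_DRV
copy and zero-copy). The functions return the best mode a device could offer;
the mode actually used also depends on the flags given at bind time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability summary for a network device. */
struct dev_caps {
    bool native_xdp; /* driver can run XDP programs itself (XDP_DRV) */
    bool zero_copy;  /* driver implements AF_XDP zero-copy */
};

enum xsk_mode {
    MODE_SKB,
    MODE_XDP_DRV_COPY,
    MODE_ZERO_COPY
};

/* Rx: zero-copy if available, else XDP_DRV with a copy, else skb mode. */
static enum xsk_mode best_rx_mode(struct dev_caps caps)
{
    if (caps.zero_copy)
        return MODE_ZERO_COPY;
    if (caps.native_xdp)
        return MODE_XDP_DRV_COPY;
    return MODE_SKB;
}

/* Tx: there is no XDP_DRV copy mode, so anything without zero-copy
 * falls back to skb mode. */
static enum xsk_mode best_tx_mode(struct dev_caps caps)
{
    return caps.zero_copy ? MODE_ZERO_COPY : MODE_SKB;
}
```

For a bond (native XDP but no zero-copy) this gives XDP_DRV copy for Rx and
skb mode for Tx, which matches the Rx/Tx gap discussed here.
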
>
Thanks for the quick info. So, when you provide the fix for the bond
driver, can we expect the bond driver to support ZC for Tx (and Rx),
or will Tx remain in SKB mode? At 2 Mpps for Tx there is a big gap
between Rx and Tx, which in practice leaves XDP not very useful with
bond devices.

I also see a gap between Rx and Tx with the veth driver.
In the topology below, AF_XDP Tx to a veth device (veth1) does not go
beyond 1.2 Mpps. The XDP program on veth2 redirects packets to the
physical device ens1f0. Based on your explanation above, I assume that
this too is running in SKB mode, hence the lower performance.
veth1 (AF_XDP Tx) -> veth2 (xdp) -> ens1f0

However, in the reverse direction shown below, I can receive close to
5 Mpps on the AF_XDP socket.
ens1f0 (xdp) ->veth2 -> veth1 (AF_XDP Rx)

Looking at the results here:
https://patchwork.ozlabs.org/project/netdev/cover/1533283098-2397-1-git-send-email-makita.toshiaki@xxxxxxxxxxxxx/
I cannot find a benchmark that would validate my AF_XDP Rx and Tx
results with veth devices. The XDP_DROP results there do match my
tests, though.

> >  # ./xdpsock -t -i bond0 -N -G 0c:c4:7a:bd:13:b2 -H 0c:c4:7a:b7:5f:6c
> >  sock0@bond0:0 txonly xdp-drv
> >
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 2,520,587      2,521,152
> >
> >  sock0@bond0:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 2,362,740      4,884,352
> >
> >  sock0@bond0:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 1,814,437      6,698,944
> >
> >  sock0@bond0:0 txonly xdp-drv
> >                    pps            pkts           1.00
> > rx                 0              0
> > tx                 1,817,913      8,517,120
> >
> > # xdp-loader status bond0
> > CURRENT XDP PROGRAM STATUS:
> >
> > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > --------------------------------------------------------------------------------------
> > bond0                  xdp_dispatcher    native   671  90f686eb86991928
> >  =>              20     xsk_def_prog              680  8f9c40757cb0a6a2  XDP_PASS
> >
> > > > > > Here is the perf backtrace for the xdp_redirect_err event.
> > > > > > > ksoftirqd/0    14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69 action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > > > > >         ffffffff873dcbf4 xdp_do_redirect+0x3b4 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffff873dcbf4 xdp_do_redirect+0x3b4 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffffc05d0f0f ixgbe_run_xdp+0x10f (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > >         ffffffffc05d297a ixgbe_clean_rx_irq+0x51a (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > >         ffffffffc05d2da0 ixgbe_poll+0xf0 (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > >         ffffffff873afad7 __napi_poll+0x27 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffff873affd3 net_rx_action+0x233 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffff8762ae27 __do_softirq+0xc7 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffff86b04cfe run_ksoftirqd+0x1e (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffff86b33d83 smpboot_thread_fn+0xd3 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffff86b2956d kthread+0xdd (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >         ffffffff86a02289 ret_from_fork+0x29 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > >
> > > > > > I am curious why the xdp program is invoked from the ixgbe driver
> > > > > > (running for slave device) when the xdp program is actually attached
> > > > > > to the bond device? Is this by design?
> > > > > > > # xdp-loader status bond0
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > bond0                  xdp_dispatcher    native   64   90f686eb86991928
> > > > > > >  =>              20     xsk_def_prog              73   8f9c40757cb0a6a2  XDP_PASS
> > > > > > >
> > > > > > > # xdp-loader status ens1f0
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > ens1f0                 <No XDP program loaded!>
> > > > > > >
> > > > > > > # xdp-loader status ens1f1
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > ens1f1                 <No XDP program loaded!>
> > > > > >
> > > > > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > > > > packets being received in the AF_XDP socket in the driver mode.
> > > > > >  # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > > > >  sock0@bond0:5 rxdrop xdp-drv poll()
> > > > > >                    pps            pkts           1.00
> > > > > > rx                10,126,924     1,984,092,501
> > > > > > tx                 0              0
> > > > > >
> > > > > > I am sure we would not want to skip the device check generally
> > > > > > especially for non-bonded devices, etc. Please guide on how to take
> > > > > > this further and get the issue fixed in the mainline.
> > > > > >
> > > > > > The ZC mode doesn't work. Mostly because of the problem you had
> > > > > > pointed out before.
> > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > > > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks for your response. My comments inline.
> > > > > > > > > >
> > > > > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I am new to XDP and am exploring how it works with the different
> > > > > > > > > > > > interface types supported in Linux. One of my use cases is to be
> > > > > > > > > > > > able to receive packets from a bond interface.
> > > > > > > > > > > > I used the xdpsock sample program, specifying the bond interface as
> > > > > > > > > > > > the input interface. However, packets received on the bond interface
> > > > > > > > > > > > are not handed over to the socket by the kernel if the socket is
> > > > > > > > > > > > bound in native mode. Nor are the packets passed on to the kernel.
> > > > > > > > > > > > Note that socket creation does succeed.
> > > > > > > > > > > > In skb mode this works and I am able to receive packets in
> > > > > > > > > > > > userspace, but as expected the performance in skb mode is not
> > > > > > > > > > > > great.
> > > > > > > > > > > >
> > > > > > > > > > > > Are AF_XDP sockets on a bond not supported in native mode? Or, since
> > > > > > > > > > > > the packet has to be handed over to the bond driver after reception
> > > > > > > > > > > > on the phy port, is an skb allocation and copy indeed a must?
> > > > > > > > > > >
> > > > > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > > > > you using?
> > > > > > > > > > >
> > > > > > > > > > I will trace the packet and get back.
> > > > > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > >
> > > > > > > > > > Bonding algo is 802.3ad
> > > > > > > > > >
> > > > > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > > > > >
> > > > > > > > > > NIC Driver
> > > > > > > > > > # ethtool -i ens1f0
> > > > > > > > > > driver: ixgbe
> > > > > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > > > > >
> > > > > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > > > > a lot of things have happened since then.
> > > > > > > > >
> > > > > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > > > > >
> > > > > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > > > > >
> > > > > > > > > > Features
> > > > > > > > > > # xdp-loader features ens1f0
> > > > > > > > > > NETDEV_XDP_ACT_BASIC:           yes
> > > > > > > > > > NETDEV_XDP_ACT_REDIRECT:        yes
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT:        no
> > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY:    yes
> > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD:      no
> > > > > > > > > > NETDEV_XDP_ACT_RX_SG:           no
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG:     no
> > > > > > > > > >
> > > > > > > > > > CPU is
> > > > > > > > > >
> > > > > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > > > > native/zero-copy mode.
> > > > > > > > >
> > > > > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > > > > zero-copy is not really supported so that support should not be
> > > > > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > > > > >
> > > > > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > > > >     err = -EOPNOTSUPP;
> > > > > > > > >     goto err_unreg_pool;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > So there are some things needed in the bonding driver to make
> > > > > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > > > > >
> > > > > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > > > > {
> > > > > > > > > > ..
> > > > > > > > > >     bond_for_each_slave(bond, slave, iter)
> > > > > > > > > >         val &= slave->dev->xdp_features;
> > > > > > > > > >     xdp_set_features_flag(bond_dev, val);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > > > > NETDEV_XDP_ACT_BASIC:           yes
> > > > > > > > > > NETDEV_XDP_ACT_REDIRECT:        yes
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT:        no
> > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY:    yes
> > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD:      no
> > > > > > > > > > NETDEV_XDP_ACT_RX_SG:           no
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG:     no
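
A small sketch of why the bond ends up advertising zero-copy even though it
cannot provide it. The ACT_* constants are local stand-ins that are assumed to
mirror the NETDEV_XDP_ACT_* bits; the initial value of the intersection and
the composition of ACT_ZC are assumptions as well, since the quoted kernel
snippets elide them. Only the AND over the slaves and the
`(features & ZC) != ZC` test are taken from the code quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins, assumed to mirror the NETDEV_XDP_ACT_* feature bits. */
enum {
    ACT_BASIC        = 1 << 0,
    ACT_REDIRECT     = 1 << 1,
    ACT_NDO_XMIT     = 1 << 2,
    ACT_XSK_ZEROCOPY = 1 << 3,
    ACT_HW_OFFLOAD   = 1 << 4,
    ACT_RX_SG        = 1 << 5,
    ACT_NDO_XMIT_SG  = 1 << 6,
    ACT_MASK         = (1 << 7) - 1,
    /* Assumed composition of the zero-copy feature set. */
    ACT_ZC           = ACT_BASIC | ACT_REDIRECT | ACT_XSK_ZEROCOPY
};

/* Models the loop in bond_xdp_set_features(): the bond keeps only the
 * features that every slave has. Starting from ACT_MASK is an assumption. */
static uint32_t intersect_features(const uint32_t *slaves, size_t n)
{
    uint32_t val = ACT_MASK;
    for (size_t i = 0; i < n; i++)
        val &= slaves[i];
    return val;
}

/* Models the AF_XDP bind-time test quoted above. */
static bool advertises_zc(uint32_t features)
{
    return (features & ACT_ZC) == ACT_ZC;
}

/* Zero-copy additionally needs a wakeup callback in the driver, which
 * the bonding driver does not implement according to this thread. */
static bool zc_usable(uint32_t features, bool has_xsk_wakeup)
{
    return advertises_zc(features) && has_xsk_wakeup;
}
```

With two slaves that both report BASIC, REDIRECT and XSK_ZEROCOPY, the
intersection still passes advertises_zc(), so the feature test alone does not
catch the missing wakeup callback.
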
> > > > > > > > > >
> > > > > > > > > > > > Another thing I notice is that other XDP programs attached to the
> > > > > > > > > > > > bond interface, with actions such as DROP or REDIRECT to another
> > > > > > > > > > > > interface, work and perform better than AF_XDP in skb mode. Does
> > > > > > > > > > > > this mean that these do not allocate an skb?
> > > > > > > > > > >
> > > > > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > > > > not an option.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > > > > >
> > > > > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > > > >   AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > > >  ══════════════════════════════════════════════════════════
> > > > > > > > > >   ZC              14M    65%       35%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > > > >   XDP_DRV/COPY    10M    100%      23%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > > > >   SKB_MODE        2.2M   100%      62%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > > > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > > > > > > run BPF in offloaded mode so I cannot compare it.
> > > > > > > > >
> > > > > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > > > > the XDP_REDIRECT operation, but there are many other time thieves too.
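
For reference, the "maxing out your 10G card at 14M" remark can be checked
with a short calculation. This sketch assumes that the 64-byte packet size
includes the Ethernet FCS and that each frame costs an extra 20 bytes on the
wire (7 bytes preamble, 1 byte start-of-frame delimiter, 12 bytes minimum
inter-frame gap). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Preamble (7) + start-of-frame delimiter (1) + minimum inter-frame gap (12). */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Theoretical maximum packets per second on an Ethernet link for a
 * given frame size (frame_bytes is assumed to include the FCS). */
static uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    uint64_t bits_on_wire =
        (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;
    return link_bps / bits_on_wire;
}
```

For a 10 Gbit/s link and 64-byte frames, line_rate_pps(10000000000ULL, 64)
gives 14,880,952 pps, so the 14M figures in the tables here are close to the
wire limit of a single 10G port rather than a CPU limit.
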
> > > > > > > > >
> > > > > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > > > >   xdp-bench          PPS    CPU-SI*   Command
> > > > > > > > > >  ═══════════════════════════════════════════════
> > > > > > > > > >   drop, no-touch     14M    41%       ./xdp-bench drop -p no-touch ens1f0 -e
> > > > > > > > > >   drop, read-data    14M    55%       ./xdp-bench drop -p read-data ens1f0 -e
> > > > > > > > > >   drop, parse-ip     14M    58%       ./xdp-bench drop -p parse-ip ens1f0 -e
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > > > >   AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > > >  ══════════════════════════════════════════════════════════
> > > > > > > > > >   ZC              X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > > > >   XDP_DRV/COPY    X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > > > >   SKB_MODE        2M     100%      55%           ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > >   xdp-bench          PPS     CPU-SI*   Command
> > > > > > > > > >  ═══════════════════════════════════════════════
> > > > > > > > > >   drop, no-touch     10.9M   33%       ./xdp-bench drop -p no-touch bond0 -e
> > > > > > > > > >   drop, read-data    10.9M   44%       ./xdp-bench drop -p read-data bond0 -e
> > > > > > > > > >   drop, parse-ip     10.9M   47%       ./xdp-bench drop -p parse-ip bond0 -e
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prashant
> > > > > > > > > > > >




