On Tue, Jan 16, 2024 at 6:29 PM Magnus Karlsson <magnus.karlsson@xxxxxxxxx> wrote:
>
> On Tue, 16 Jan 2024 at 13:48, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> >
> > On Mon, Jan 15, 2024 at 2:52 PM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Yes, I found the place where the packet is getting dropped. The check
> > > > > > for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> > > > > > The device in xs is the bond device whereas the one in xdp->rxq is the
> > > > > > slave device on which the packet is received and the xdp program is
> > > > > > being invoked from.
> > > > > >
> > > > > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > > > > {
> > > > > > --
> > > > > >         if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > > > >                 return -EINVAL;
> > > > > > --
> > > > > > }
> > > > >
> > > > > I am now back from the holidays.
> > > > >
> > > > > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > > > > and we will get back to you with a solution proposal.
> > > > >
> > > > Thanks, will wait for your solution.
> > >
> > > FYI, I do not have a good solution for this yet. The one I have is too
> > > complicated for my taste. I might have to take this to the list to get
> > > some new ideas on how to tackle it. So this will take longer than
> > > anticipated.
> > >
> > Just to add that the AF_XDP TX in native mode is also not performing
> > well. I am getting around 2Mpps in native mode.
>
> That is expected though. There are only two modes for Tx: SKB mode and
> zero-copy mode, and since there is no zero-copy support for the
> bonding driver, it will revert to skb mode. I would expect around 3
> Mpps for Tx in skb mode, so 2 Mpps seems reasonable as the bonding
> driver adds overhead.
>
> For Rx there are 3 modes: skb, XDP_DRV (which is the one you are
> getting with the -N switch) and zero-copy (that is not supported by
> the bonding driver).
>

Thanks for the quick info. So, when you provide the fix for the bond
driver, can we expect the bond driver to support ZC for Tx (and Rx),
or will Tx remain in SKB mode? At 2 Mpps there is a big gap between Rx
and Tx, which in practice makes XDP not very useful with bond devices.

I also see a gap between Rx and Tx with the veth driver. In the
topology below, AF_XDP Tx to a veth device (veth1) does not go beyond
1.2 Mpps. The XDP program on veth2 redirects packets to the physical
device ens1f0. Based on your explanation above, I assume this too is
running in SKB mode, hence the lower performance.

    veth1 (AF_XDP Tx) -> veth2 (xdp) -> ens1f0

However, in the reverse direction shown below, I can receive close to
5 Mpps on the AF_XDP socket.

    ens1f0 (xdp) -> veth2 -> veth1 (AF_XDP Rx)

Looking at the results here:
https://patchwork.ozlabs.org/project/netdev/cover/1533283098-2397-1-git-send-email-makita.toshiaki@xxxxxxxxxxxxx/
I cannot find a benchmark that would validate my AF_XDP Rx and Tx
results with veth devices. The XDP drop test results do match my tests,
though.
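
For reference, here is a small user-space model of the device check
quoted above, as I understand it. This is only a sketch: the *_model
structs and the helper functions are hypothetical stand-ins that carry
just the fields the check reads, and only the comparison itself is
taken from xsk_rcv_check(). It shows why a socket bound to bond0
rejects a frame whose rxq belongs to the slave.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not kernel API). */
struct net_device_model { const char *name; };
struct rxq_model { const struct net_device_model *dev; unsigned int queue_index; };
struct xdp_buff_model { const struct rxq_model *rxq; };
struct xdp_sock_model { const struct net_device_model *dev; unsigned int queue_id; };

/* Same comparison as the snippet quoted from xsk_rcv_check(). */
int rcv_check_model(const struct xdp_sock_model *xs,
		    const struct xdp_buff_model *xdp)
{
	assert(xs != NULL && xdp != NULL && xdp->rxq != NULL);
	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
		return -EINVAL;
	return 0;
}

/* Socket bound to bond0, frame received on slave ens1f0, same queue. */
int bond_socket_slave_frame(unsigned int queue)
{
	struct net_device_model bond0 = { "bond0" }, ens1f0 = { "ens1f0" };
	struct rxq_model rxq = { &ens1f0, queue };
	struct xdp_buff_model xdp = { &rxq };
	struct xdp_sock_model xs = { &bond0, queue };

	return rcv_check_model(&xs, &xdp);
}

/* Socket bound directly to ens1f0: passes only if the queues match. */
int phys_socket_frame(unsigned int sock_queue, unsigned int rx_queue)
{
	struct net_device_model ens1f0 = { "ens1f0" };
	struct rxq_model rxq = { &ens1f0, rx_queue };
	struct xdp_buff_model xdp = { &rxq };
	struct xdp_sock_model xs = { &ens1f0, sock_queue };

	return rcv_check_model(&xs, &xdp);
}
```

In this model bond_socket_slave_frame(5) returns -EINVAL, which on
Linux is -22 and matches the err=-22 in the xdp_redirect_err trace
further down, while phys_socket_frame(5, 5) returns 0.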
> > # ./xdpsock -t -i bond0 -N -G 0c:c4:7a:bd:13:b2 -H 0c:c4:7a:b7:5f:6c
> > sock0@bond0:0 txonly xdp-drv
> >
> >                pps            pkts           1.00
> > rx             0              0
> > tx             2,520,587      2,521,152
> >
> > sock0@bond0:0 txonly xdp-drv
> >                pps            pkts           1.00
> > rx             0              0
> > tx             2,362,740      4,884,352
> >
> > sock0@bond0:0 txonly xdp-drv
> >                pps            pkts           1.00
> > rx             0              0
> > tx             1,814,437      6,698,944
> >
> > sock0@bond0:0 txonly xdp-drv
> >                pps            pkts           1.00
> > rx             0              0
> > tx             1,817,913      8,517,120
> >
> > # xdp-loader status bond0
> > CURRENT XDP PROGRAM STATUS:
> >
> > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > --------------------------------------------------------------------------------------
> > bond0                  xdp_dispatcher    native   671  90f686eb86991928
> >  =>              20     xsk_def_prog              680  8f9c40757cb0a6a2  XDP_PASS
> >
> > > > > > Here is the perf backtrace for the xdp_redirect_err event.
> > > > > > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > > > > > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > ffffffffc05d2da0 ixgbe_poll+0xf0
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > ffffffff873afad7 __napi_poll+0x27
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff873affd3 net_rx_action+0x233
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff8762ae27 __do_softirq+0xc7
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86b04cfe run_ksoftirqd+0x1e
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86b2956d kthread+0xdd
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86a02289 ret_from_fork+0x29
> > > > > >     (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > >
> > > > > > I am curious why the xdp program is invoked from the ixgbe driver
> > > > > > (running for slave device) when the xdp program is actually attached
> > > > > > to the bond device? Is this by design?
> > > > > > # xdp-loader status bond0
> > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > --------------------------------------------------------------------------------------
> > > > > > bond0                  xdp_dispatcher    native   64   90f686eb86991928
> > > > > >  =>              20     xsk_def_prog              73   8f9c40757cb0a6a2  XDP_PASS
> > > > > >
> > > > > > # xdp-loader status ens1f0
> > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > --------------------------------------------------------------------------------------
> > > > > > ens1f0                 <No XDP program loaded!>
> > > > > >
> > > > > > # xdp-loader status ens1f1
> > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > --------------------------------------------------------------------------------------
> > > > > > ens1f1                 <No XDP program loaded!>
> > > > > >
> > > > > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > > > > packets being received in the AF_XDP socket in the driver mode.
> > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > > > > sock0@bond0:5 rxdrop xdp-drv poll()
> > > > > >                pps            pkts           1.00
> > > > > > rx             10,126,924     1,984,092,501
> > > > > > tx             0              0
> > > > > >
> > > > > > I am sure we would not want to skip the device check generally
> > > > > > especially for non-bonded devices, etc. Please guide on how to take
> > > > > > this further and get the issue fixed in the mainline.
> > > > > >
> > > > > > The ZC mode doesn't work. Mostly because of the problem you had
> > > > > > pointed out before.
> > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > > > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks for your response. My comments inline.
> > > > > > > > > >
> > > > > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@xxxxxxxxx> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > > > > > packets from the bond interface.
> > > > > > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > > > > > > > Note that the socket creation does succeed.
> > > > > > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > > > > > great.
> > > > > > > > > > > >
> > > > > > > > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > > > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > > > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > > > > > > > >
> > > > > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > > > > you using?
> > > > > > > > > > >
> > > > > > > > > > I will trace the packet and get back.
> > > > > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > >
> > > > > > > > > > Bonding algo is 802.3ad
> > > > > > > > > >
> > > > > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > > > > >
> > > > > > > > > > NIC Driver
> > > > > > > > > > # ethtool -i ens1f0
> > > > > > > > > > driver: ixgbe
> > > > > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > > > > >
> > > > > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > > > > a lot of things have happened since then.
> > > > > > > > >
> > > > > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > > > > >
> > > > > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > > > > >
> > > > > > > > > > Features
> > > > > > > > > > # xdp-loader features ens1f0
> > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > >
> > > > > > > > > > CPU is
> > > > > > > > > >
> > > > > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > > > > native/zero-copy mode.
> > > > > > > > >
> > > > > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > > > > zero-copy is not really supported so that support should not be
> > > > > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > > > > >
> > > > > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > > > >         err = -EOPNOTSUPP;
> > > > > > > > >         goto err_unreg_pool;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > So there are some things needed in the bonding driver to make
> > > > > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > > > > >
> > > > > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > > > > {
> > > > > > > > > > ..
> > > > > > > > > >         bond_for_each_slave(bond, slave, iter)
> > > > > > > > > >                 val &= slave->dev->xdp_features;
> > > > > > > > > >         xdp_set_features_flag(bond_dev, val);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > > > > > are not allocating skb?
> > > > > > > > > > >
> > > > > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > > > > not an option.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > > > > >
> > > > > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > > > > AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > ZC              14M    65%       35%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > > > > XDP_DRV/COPY    10M    100%      23%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > > > > SKB_MODE        2.2M   100%      62%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > > > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > > > > > > run BPF in offloaded mode so I cannot compare it.
> > > > > > > > >
> > > > > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > > > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > > > > > > > >
> > > > > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > > > > xdp-bench         PPS    CPU-SI*   Command
> > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > drop, no-touch    14M    41%       ./xdp-bench drop -p no-touch ens1f0 -e
> > > > > > > > > > drop, read-data   14M    55%       ./xdp-bench drop -p read-data ens1f0 -e
> > > > > > > > > > drop, parse-ip    14M    58%       ./xdp-bench drop -p parse-ip ens1f0 -e
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > > > > AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > ZC              X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > > > > XDP_DRV/COPY    X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > > > > SKB_MODE        2M     100%      55%           ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > > xdp-bench         PPS     CPU-SI*   Command
> > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > drop, no-touch    10.9M   33%       ./xdp-bench drop -p no-touch bond0 -e
> > > > > > > > > > drop, read-data   10.9M   44%       ./xdp-bench drop -p read-data bond0 -e
> > > > > > > > > > drop, parse-ip    10.9M   47%       ./xdp-bench drop -p parse-ip bond0 -e
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prashant
> > > > > > > > > > > >
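
A note on the feature advertisement discussed earlier in this thread:
the loop quoted from bond_xdp_set_features() ANDs the slaves' feature
bits together. Below is a rough user-space sketch of that behaviour.
The FEAT_* names and their bit positions are illustrative placeholders
chosen for this example (they are not the kernel's NETDEV_XDP_ACT_*
definitions), and returning 0 for a bond without slaves is my
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative feature bits; names and positions are placeholders. */
enum {
	FEAT_BASIC        = 1u << 0,
	FEAT_REDIRECT     = 1u << 1,
	FEAT_NDO_XMIT     = 1u << 2,
	FEAT_XSK_ZEROCOPY = 1u << 3,
	FEAT_ALL          = (1u << 4) - 1
};

/* AND all slave feature masks, like the quoted bond_for_each_slave() loop. */
unsigned int bond_features_model(const unsigned int *slave_feats, size_t n)
{
	unsigned int val = FEAT_ALL;
	size_t i;

	assert(slave_feats != NULL || n == 0);
	if (n == 0)
		return 0; /* assumption: no slaves, nothing advertised */
	for (i = 0; i < n; i++)
		val &= slave_feats[i];
	return val;
}

/* Convenience wrapper for the two-port bond used in this thread. */
unsigned int two_port_bond_model(unsigned int port_a, unsigned int port_b)
{
	unsigned int feats[2];

	feats[0] = port_a;
	feats[1] = port_b;
	return bond_features_model(feats, 2);
}
```

With both ports reporting basic, redirect and zero-copy, the model
keeps the zero-copy bit on the bond, which is consistent with the
"xdp-loader features bond0" output quoted above even though the
bonding driver has no xsk_wakeup. If one port lacked a feature, the
bond would lose it as well.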