>On 14/12/2021 12.25, Maciej Fijalkowski wrote:
>> On Tue, Dec 14, 2021 at 10:40:05AM +0000, Karlsson, Magnus wrote:
>>> Adding Ederson and Maciej.
>>>
>>>> On 14/12/2021 09.07, Karlsson, Magnus wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx>
>>>>>> Sent: Monday, December 13, 2021 10:04 PM
>>>>>> To: Karlsson, Magnus <magnus.karlsson@xxxxxxxxx>; Björn Töpel <bjorn@xxxxxxxxxx>
>>>>>> Cc: Brouer, Jesper <brouer@xxxxxxxxxx>; Xdp <xdp-newbies@xxxxxxxxxxxxxxx>;
>>>>>> Ong, Boon Leong <boon.leong.ong@xxxxxxxxx>; Joao Pedro Barros Silva
>>>>>> <jopbs@xxxxxxxxxx>; Diogo Alexandre Da Silva Lima <dioli@xxxxxxxxxx>
>>>>>> Subject: AF_XDP not transmitting frames immediately
>>>>>>
>>>>>> Hi Magnus and Bjørn,
>>>>>>
>>>>>> I'm coding on an AF_XDP program[1] that needs to send (a bulk of
>>>>>> packets) in a short time-window (related to Time-Triggered Ethernet).
>>>>>>
>>>>>> My observations are that AF_XDP doesn't send the frames immediately.
>>>>>> And yes, I do call sendto() to trigger a TX kick. In zero-copy mode
>>>>>> this is particularly bad. My program wants to send 4 packets in a
>>>>>> burst, but I'm observing 8 packets grouped together on the receiving
>>>>>> host.
>>>>>>
>>>>>> Is this a known property of AF_XDP?
>>>>>
>>>>> Nope! It is supposed to be able to send one packet at a time, though I
>>>>> have several times seen bugs in the drivers where the batching
>>>>> behavior shines through like this, and once a bug in the core code.
>>>>> There is even a test these days for just sending a single packet,
>>>>
>>>> Where is that test in the kernel tree?
>>>
>>> In tools/testing/selftests/bpf/xdpxceiver.c. It is the
>>> RUN_TO_COMPLETION_SINGLE_PKT test. But the framework currently only
>>> operates on veth.
>>
>> I'd say it's the driver's fault. Magnus fixed something similar for i40e:
>> https://lore.kernel.org/netdev/20210401172107.1191618-3-anthony.l.nguyen@xxxxxxxxx/
>
>Thanks for that hint.
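A minimal sketch of how the "4 packets sent, 8 grouped together" observation can be quantified on the receiving side, assuming you have already extracted nanosecond hardware RX timestamps (for example from a capture taken with tcpdump's `-j adapter_unsynced --time-stamp-precision=nano`) into a sorted array. Packets are split into bursts wherever the inter-arrival gap exceeds a threshold. The helper name `group_bursts()` and the threshold are hypothetical, not part of any tool mentioned in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analysis helper: split sorted RX timestamps (ns) into
 * bursts. A new burst starts whenever the gap to the previous packet
 * is larger than gap_ns. Up to max_bursts burst sizes are written to
 * sizes[]; the return value is the total number of bursts found. */
static size_t group_bursts(const uint64_t *ts_ns, size_t n, uint64_t gap_ns,
                           size_t *sizes, size_t max_bursts)
{
    size_t bursts = 0; /* completed bursts so far */
    size_t cur = 0;    /* packets in the burst being built */

    for (size_t i = 0; i < n; i++) {
        /* Gap too large: close the current burst, start a new one. */
        if (i > 0 && ts_ns[i] - ts_ns[i - 1] > gap_ns) {
            if (bursts < max_bursts)
                sizes[bursts] = cur;
            bursts++;
            cur = 0;
        }
        cur++;
    }
    /* Close the final burst, if any packets were seen. */
    if (cur > 0) {
        if (bursts < max_bursts)
            sizes[bursts] = cur;
        bursts++;
    }
    return bursts;
}
```

On 1G, back-to-back minimum-size frames arrive about 672 ns apart (84 bytes on the wire including preamble and inter-frame gap), so a threshold of a few microseconds separates frames that left the NIC together from bursts the application intended to space out. If two intended 4-packet bursts show up as a single burst of 8, the TX side coalesced them.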
>
>>
>> We don't currently have igc HW on our side to dig into this :<
>
>I suspected Boon Leong (cc) would have this hardware.

Unfortunately, my current lab setup does not have an I225 hooked up, and
I am working remotely because access to the Intel facility is
controlled. Perhaps Ederson has a system ready to test?

For ZC mode, the igc driver (and the same holds for stmmac) depends on
the XSK wakeup to trigger the NAPI poll (igc_poll), which first cleans up
the Tx ring and eventually calls igc_xdp_xmit_zc() to start submitting Tx
frames to the DMA engine. We have used busy-poll to get lower Tx frame
latency/jitter.

There was another issue in stmmac, recently patched [1], where the driver
performed a MAC reset whenever an XDP program was added; with the fix,
Tx after XDP socket creation no longer takes an extra 2-3 s due to the
link going down and up. Jesper, are you seeing something similar in your
app? If so, it is likely because the mainline igc driver calls
igc_down(), which is a little too aggressive in that it resets the MAC
completely.

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20211111143949.2806049-1-boon.leong.ong@xxxxxxxxx/

>
>>>
>>>>> since we have had issues with this in the past. That test does pass in
>>>>> bpf-next, but it is only run with the veth driver that does not
>>>>> support zero-copy, so it could still be an issue. What driver are you
>>>>> using in zero-copy mode and what kernel version are you on?
>>>>
>>>> Driver: igc with Intel chip i225
>>>
>>> Have never tried this one personally. Do not know if I have one in the
>>> lab, but let me check.
>>>
>>> Ederson, do you have any experience with this card and if so, have you
>>> seen something similar?
>>>
>>>> Kernel version: 5.15.0-net-next+ #618 SMP PREEMPT
>>>> - Devel branch at commit 6d3b1b069946 (v5.15-12802-g6d3b1b069946)
>>>>
>>>>>> How can I get AF_XDP to "flush" TX packets when calling sendto()?
>>>>>> Should we add a flag other than the current MSG_DONTWAIT?
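A minimal sketch of the TX kick discussed above, assuming the AF_XDP socket is already created and bound (for example with libxdp) and that descriptors have already been submitted to the TX ring. As far as I can tell from net/xdp/xsk.c, the AF_XDP sendmsg path only supports non-blocking calls and returns EOPNOTSUPP when MSG_DONTWAIT is missing, so the flag is a requirement rather than a tuning option. The set of tolerated errno values is modelled on the kick_tx() helper in the kernel's samples/bpf/xdpsock_user.c; treat it as an assumption rather than a complete list.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Sketch of an AF_XDP TX kick: a zero-length, non-blocking sendto() on
 * the XSK file descriptor. In zero-copy softirq mode this ends up in
 * the driver's xsk_wakeup ndo, which schedules NAPI; the driver then
 * drains the TX ring on its own.
 * Returns 0 if the kick was issued or hit a condition treated as
 * transient, otherwise -errno. */
static int kick_tx(int xsk_fd)
{
    if (sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0) >= 0)
        return 0;

    /* Treated as non-fatal (as in the xdpsock sample): the caller
     * can simply kick again on a later iteration. */
    if (errno == ENOBUFS || errno == EAGAIN || errno == EBUSY ||
        errno == ENETDOWN)
        return 0;

    return -errno;
}
```

Note that a successful kick only schedules the driver's NAPI poll; when the frames actually reach the wire is decided by the driver, which is why batching like the one reported here points at the driver rather than at the sendto() call.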
>>>>>
>>>>> In zero-copy mode with softirq driver processing (not busy poll), a
>>>>> sendto will just trigger the xsk_wakeup ndo that schedules napi unless
>>>>> it is already executing. It is up to the driver to then get packets
>>>>> from the Tx ring and put them on the HW and make sure they are sent.
>>>>> Barring any HW quirks, sending one packet should be perfectly fine.
>>>>
>>>> I will investigate driver level issues.
>>>>
>>>> I have other (100G) NICs in my testlab, but I'm using these 1G NICs because
>>>> they support hardware timestamping, which allows me to investigate these
>>>> timing issues.
>>>> I'll find a way to see if other drivers behave differently.
>>>
>>> Would be great if you could check if the problem also exists on e.g. ice.
>>>
>
>Having issues getting my ICE hardware to link up.
>
>I tested that the i40e driver works as expected. Thus, this is likely an
>issue with the igc driver. I will dig some more.
>
>
>>>>>> Hint, I'm using tcpdump hardware timestamping on the receiving host via
>>>>>> cmdline:
>>>>>>
>>>>>> tcpdump -vv -s0 -ni eth1 -j adapter_unsynced
>>>>>> --time-stamp-precision=nano -w af_xdp_tx_cyclic.dump42
>>>>>>
>>>>>> Notice [1] is on a specific branch:
>
>[1]
>https://github.com/xdp-project/bpf-examples/tree/vestas03_AF_XDP_example/AF_XDP-interaction
>
>In [1] I tried to play with SO_PREFER_BUSY_POLL, but it didn't make a
>difference.
>
>[2] https://github.com/xdp-project/bpf-examples/commit/3685d5ea93fced
>
>--Jesper
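A minimal sketch of the wakeup decision described in the softirq-versus-busy-poll explanation above, assuming the socket is bound with XDP_USE_NEED_WAKEUP and that `ring_flags` points at the TX ring's mmap'ed flags word. When XDP_RING_NEED_WAKEUP is set, the application must kick (sendto() or poll()); when it is clear, the driver is expected to keep draining the TX ring on its own. libxdp's xsk_ring_prod__needs_wakeup() performs the same flag test. The busy-poll branch (always kick, because with SO_PREFER_BUSY_POLL the syscall itself drives the driver's NAPI poll) is an assumption modelled on the kernel's xdpsock sample, and the helper name tx_kick_required() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the uapi value in <linux/if_xdp.h>; defined here so the
 * sketch builds without kernel headers installed. */
#ifndef XDP_RING_NEED_WAKEUP
#define XDP_RING_NEED_WAKEUP (1 << 0)
#endif

/* Hypothetical helper: must the application issue a TX kick after
 * submitting descriptors?
 *   ring_flags       - TX ring flags word (shared with the kernel)
 *   need_wakeup_mode - socket was bound with XDP_USE_NEED_WAKEUP
 *   busy_poll        - application uses (preferred) busy polling */
static bool tx_kick_required(const uint32_t *ring_flags,
                             bool need_wakeup_mode, bool busy_poll)
{
    /* Busy polling: the syscall is what runs NAPI (assumption,
     * following the xdpsock sample), so always kick. */
    if (busy_poll)
        return true;

    /* Without XDP_USE_NEED_WAKEUP the flag is never maintained,
     * so the only safe choice is to kick every time. */
    if (!need_wakeup_mode)
        return true;

    /* Softirq mode: kick only when the kernel asks for it. */
    return (*ring_flags & XDP_RING_NEED_WAKEUP) != 0;
}
```

Skipping unnecessary kicks saves syscalls, but it cannot fix batching inside the driver: once the flag is clear, how quickly frames reach the hardware is again up to the driver's NAPI poll.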