On Sun, May 9, 2021 at 9:24 PM Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> wrote:
>
> On Fri, May 07, 2021 at 08:39:04PM +0530, Srivats P wrote:
> > Here's an update -
> >
> > On Fri, May 7, 2021 at 8:17 PM Srivats P <pstavirs@xxxxxxxxx> wrote:
> > >
> > > On Mon, May 3, 2021 at 1:54 PM Magnus Karlsson
> > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Apr 29, 2021 at 5:47 PM Srivats P <pstavirs@xxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Apr 27, 2021 at 12:58 PM Magnus Karlsson
> > > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Apr 23, 2021 at 5:44 PM Srivats P <pstavirs@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm using sendto() to kick tx in my AF_XDP program after I submit
> > > > > > > descriptors to the tx ring -
> > > > > > >
> > > > > > > ret = sendto(xsk_socket__fd(xsk_), NULL, 0, MSG_DONTWAIT, NULL, 0);
> > > > > > >
> > > > > > > However, I'm receiving EPERM as the return value every time. AFAIK
> > > > > > > this is not an expected return value. Since this is with i40e, I
> > > > > > > checked i40e_xsk_wakeup() - but that also doesn't return EPERM. I am
> > > > > > > running as root and I don't see any problems with creating the xsk,
> > > > > > > configuring umem etc.
> > > > > > >
> > > > > > > Also, no packets seem to go out either.
> > > > > > >
> > > > > > > # uname -a
> > > > > > > Linux Ostinato-1 5.11.15-1-default #1 SMP Fri Apr 16 16:47:34 UTC 2021
> > > > > > > (64fb5bf) x86_64 x86_64 x86_64 GNU/Linux
> > > > > > >
> > > > > > > I don't see the problem on another machine with i40e but older kernel 5.4 series
> > > > > > >
> > > > > > > Any suggestions on what to look for or how to proceed?
> > > > > >
> > > > > > Weird. Have not seen this before. What is your command line for
> > > > > > xdpsock? Is it unmodified?
> > > > >
> > > > > This is not xdpsock, but my own AF_XDP program.
> > > > >
> > > > > >
> > > > > > Using bpftrace, we can get the call stack of xsk_sendmsg. Somewhere in
> > > > > > this stack there must be an EPERM. You can run the same command on
> > > > > > your system, but use ftrace to see what a sendto call hits. Then see
> > > > > > where the code terminates.
> > > > > >
> > > > > > mkarlsso@kurt:~/src/dna-linux$ sudo bpftrace -e 'kprobe:xsk_sendmsg {
> > > > > > @[kstack()] = count(); }'
> > > > > > Attaching 1 probe...
> > > > > > ^C
> > > > > >
> > > > > > @[
> > > > > > xsk_sendmsg+1
> > > > > > sock_sendmsg+94
> > > > > > __sys_sendto+238
> > > > > > __x64_sys_sendto+37
> > > > > > do_syscall_64+51
> > > > > > entry_SYSCALL_64_after_hwframe+68
> > > > > > ]: 2244805
> > > > >
> > > > > Ostinato-1:~ # bpftrace -e 'kprobe:xsk_sendmsg {
> > > > > @[kstack()] = count(); }'
> > > > > Attaching 1 probe...^C@[
> > > > > xsk_sendmsg+1
> > > > > sock_sendmsg+94
> > > > > __sys_sendto+238
> > > > > __x64_sys_sendto+37
> > > > > do_syscall_64+51
> > > > > entry_SYSCALL_64_after_hwframe+68
> > > > > ]: 1253307
> > > > >
> > > > > Which doesn't seem to suggest any error - I've looked at the source
> > > > > code for all these functions, but don't see any reference to EPERM.
> > > >
> > > > It must be in there somewhere :-). Could you please use ftrace
> > > > (through perf for example) and trace all functions that a sendto hits
> > > > in your case? Then we might see what it hits.
> > > >
> > > > Are you running in SKB mode or in zero-copy mode? Guess it is
> > > > zero-copy from your mail, but just want to verify. Does Rx work as
> > > > expected?
> > > >
> > > > Could you share your AF_XDP program?
>
> +1, that would help us probably :)

The code is proprietary, but if required I can extract relevant bits
into a sample program or modify the sample xdpsock_user.c suitably.

> > > >
> > > After some experimentation and a lot of head-scratching, I found part
> > > of the problem last night.
> > > The sendto() was not returning EPERM (-1),
> > > but ENXIO (-6) - I was mistakenly printing the return value of the
> > > sendto() call (which always returns -1 in case of failure), instead of
> > > errno (duh!).
> > >
> > > Looking at the code, I see ENXIO is returned if the xsk is unbound.
> > > I'm still investigating this and will post an update soon. The problem
> > > is happening at a customer end and there's some delay and follow up
> > > required to get the logs.
> >
> > sendto() was returning ENXIO because the interface MTU was set to 9000
> > which I know is not supported with AF_XDP. But shouldn't
> > xsk_socket__create() fail in this case? Note the actual packet being
> > transmitted was 64 bytes.
>
> It depends. You said that you have your own AF_XDP app, so if you're
> setting the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag then libbpf wouldn't
> be loading the built-in AF_XDP eBPF prog on interface and that's where the
> failure should happen.

I used AF_XDP for TX only with my own eBPF program for RX. For this
reason, I was using INHIBIT_PROG_LOAD while opening the xsk. That's why
I didn't see an error while creating the xsk.

>
> >
> > Not sure if it has a role in the above sendto() failure, but before
> > xsk socket create, my call to bpf_set_link_xdp_fd() was failing
> > because of the MTU problem (the newly added error message for this
> > case was very helpful!). Once MTU was reduced to 1500 both the RX eBPF
> > program link to the interface failure and the TX sendto() returning
> > ENXIO always went away. Kernel version 5.12
> >
> > Can someone tell me what is expected to happen for a Tx AF_XDP socket
> > in case of MTU > 4K?
>
> See the last paragraph.
>
> >
> > I also found a second case of sendto() returning ENXIO. In this
> > scenario, I was removing my RX eBPF program by calling
> >
> > bpf_set_link_xdp_fd(ifIndex, -1, 0)
> >
> > while AF_XDP transmit (and associated sendto() wakeup) was still going
> > on.
> > In this case, sendto starts failing with ENETDOWN for some time
> > followed by ENXIO subsequently. This case was on Kernel version 5.4.0
>
> I think that we addressed the ENETDOWN Tx issue with the following set:
> https://lore.kernel.org/netdev/20200205045834.56795-1-maciej.fijalkowski@xxxxxxxxx/
>
> I see that it has been merged in 5.6. But it was related to being unable
> to spawn multiple AF_XDP Tx-only instances. With what you're saying it
> feels to me that you have multiple instances of your AF_XDP progs and you
> terminate one of them? Previously, every instance would die due to the
> fact that the underlying XDP prog would be unloaded from interface, but
> right now we have bpf_link support for AF_XDP which would handle that
> properly. Note that it was developed for the built-in prog.

I think my case is different. I have only one AF_XDP Tx-only instance,
but I'm not using the built-in AF_XDP eBPF program. So when I remove my
eBPF program the AF_XDP Tx also gets affected. I solved my problem by
cleaning up the AF_XDP Tx first before removing my custom eBPF Rx
program.

>
> >
> > Does removing an XDP program cause the interface to go down (ENETDOWN)
> > leading to XDP socket unbind (ENXIO)? Should removing (or replacing)
> > an RX eBPF program, affect AF_XDP TX?
>
> Removing XDP prog causes the interface to undergo the reset or some other
> mechanism as it needs to remove the XDP Tx resources and change the Rx
> memory model. For Intel drivers, the AF_XDP Tx resources are configured
> during the load of Rx eBPF prog. We would have to develop some mechanism
> that detaches the creation of XDP Tx resources from loading Rx eBPF prog.
> There have been discussions around feature detection but I think it was
> about the opposite - don't configure Tx rings if your prog will not be
> doing XDP_TX action.

I guess I was sort of implicitly assuming that XDP Tx and Rx paths are
independent. Which is not the case. This is good to keep in mind while
coding.
I think it might be a worthwhile goal to allow the eBPF program to be
removed/replaced without affecting Tx - not sure how feasible that is
though.

Thanks for all the help!

>
> >
> > Srivats
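
For reference, the errno mix-up described earlier in the thread (printing
the -1 returned by sendto() instead of errno) can be avoided with a small
wrapper around the Tx kick. This is only a minimal sketch under stated
assumptions: kick_tx() and kick_tx_strerror() are hypothetical helper
names, not part of libbpf or the xdpsock sample, and the fd is assumed
to be the one obtained from xsk_socket__fd().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: kick Tx on an AF_XDP socket fd.
 * On failure sendto() only ever returns -1; the actual reason is in
 * errno, so read it immediately and hand it back as a negative errno
 * value. Returns 0 on success.
 */
static int kick_tx(int fd)
{
	if (sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0) >= 0)
		return 0;

	return -errno;
}

/* Hypothetical helper: human-readable text for a kick_tx() failure. */
static const char *kick_tx_strerror(int err)
{
	assert(err < 0);	/* only meaningful for failures */
	return strerror(-err);
}
```

A caller would log kick_tx_strerror(ret) whenever ret < 0, so that a
failure such as ENXIO or ENETDOWN is reported by name rather than as a
bare -1.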