Re: AF_XDP sendto kick returning EPERM

Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> · Sun, 9 May 2021 17:41:36 +0200

On Fri, May 07, 2021 at 08:39:04PM +0530, Srivats P wrote:
> Here's an update -
> 
> On Fri, May 7, 2021 at 8:17 PM Srivats P <pstavirs@xxxxxxxxx> wrote:
> >
> > On Mon, May 3, 2021 at 1:54 PM Magnus Karlsson
> > <magnus.karlsson@xxxxxxxxx> wrote:
> > >
> > > On Thu, Apr 29, 2021 at 5:47 PM Srivats P <pstavirs@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 12:58 PM Magnus Karlsson
> > > > <magnus.karlsson@xxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Apr 23, 2021 at 5:44 PM Srivats P <pstavirs@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm using sendto() to kick tx in my AF_XDP program after I submit
> > > > > > descriptors to the tx ring -
> > > > > >
> > > > > > ret = sendto(xsk_socket__fd(xsk_), NULL, 0, MSG_DONTWAIT, NULL, 0);
> > > > > >
> > > > > > However, I'm receiving EPERM as the return value every time. AFAIK
> > > > > > this is not an expected return value. Since this is with i40e, I
> > > > > > checked i40e_xsk_wakeup() - but that also doesn't return EPERM. I am
> > > > > > running as root and I don't see any problems with creating the xsk,
> > > > > > configuring umem etc.
> > > > > >
> > > > > > Also, no packets seem to go out either.
> > > > > >
> > > > > > # uname -a
> > > > > > Linux Ostinato-1 5.11.15-1-default #1 SMP Fri Apr 16 16:47:34 UTC 2021
> > > > > > (64fb5bf) x86_64 x86_64 x86_64 GNU/Linux
> > > > > >
> > > > > > I don't see the problem on another machine with i40e but older kernel 5.4 series
> > > > > >
> > > > > > Any suggestions on what to look for or how to proceed?
> > > > >
> > > > > Weird. Have not seen this before. What is your command line for
> > > > > xdpsock? Is it unmodified?
> > > >
> > > > This is not xdpsock, but my own AF_XDP program.
> > > >
> > > > >
> > > > > Using bpftrace, we can get the call stack of xsk_sendmsg. Somewhere in
> > > > > this stack there must be an EPERM. You can run the same command on
> > > > > your system, but use ftrace to see what a sendto call hits. Then see
> > > > > where the code terminates.
> > > > >
> > > > > mkarlsso@kurt:~/src/dna-linux$ sudo bpftrace -e 'kprobe:xsk_sendmsg {
> > > > > @[kstack()] = count(); }'
> > > > > Attaching 1 probe...
> > > > > ^C
> > > > >
> > > > > @[
> > > > >     xsk_sendmsg+1
> > > > >     sock_sendmsg+94
> > > > >     __sys_sendto+238
> > > > >     __x64_sys_sendto+37
> > > > >     do_syscall_64+51
> > > > >     entry_SYSCALL_64_after_hwframe+68
> > > > > ]: 2244805
> > > >
> > > > Ostinato-1:~ # bpftrace -e 'kprobe:xsk_sendmsg {
> > > > @[kstack()] = count(); }'
> > > > Attaching 1 probe...^C@[
> > > >     xsk_sendmsg+1
> > > >     sock_sendmsg+94
> > > >     __sys_sendto+238
> > > >     __x64_sys_sendto+37
> > > >     do_syscall_64+51
> > > >     entry_SYSCALL_64_after_hwframe+68
> > > > ]: 1253307
> > > >
> > > > Which doesn't seem to suggest any error - I've looked at the source
> > > > code for all these functions, but don't see any reference to EPERM.
> > >
> > > It must be in there somewhere :-). Could you plesae use ftrace
> > > (through perf for example) and trace all functions that a sendto hits
> > > in your case? Then we might see what it hits.
> > >
> > > Are you running in SKB mode or in zero-copy mode? Guess it is
> > > zero-copy from your mail, but just want to verify. Does Rx work as
> > > expected?
> > >
> > > Could you share your AF_XDP program?

+1, that would help us probably :)

> >
> > After some experimentation and a lot of head-scratching, I found part
> > of the problem last night. The sendto() was not returning EPERM (-1),
> > but ENXIO (-6) - I was mistakenly printing the return value of the
> > sento() call (which always returns -1 in case of failure), instead of
> > errno (duh!).
> >
> > Looking at the code, I see ENXIO is returned if the xsk is unbound.
> > I'm still investigating this and will post an update soon. The problem
> > is happening at a customer end and there's some delay and follow up
> > required to get the logs.
> 
> sendto() was returning ENXIO because the interface MTU was set to 9000
> which I know is not supported with AF_XDP. But shouldn't
> xsk_socket__create() fail in this case? Note the actual packet being
> transmitted was 64 bytes.

It depends. You said that you have your own AF_XDP app, so if you're
setting the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD flag then libbpf wouldn't
be loading the built-in AF_XDP eBPF prog on interface and that's where the
failure should happen.

> 
> Not sure if it has a role in the above sendto() failure, but before
> xsk socket create, my call to bpf_set_link_xdp_fd() was failing
> because of the MTU problem (the newly added error message for this
> case was very helpful!). Once MTU was reduced to 1500 both the RX eBPF
> program link to the interface failure and the TX sendto() returning
> ENXIO always went away. Kernel version 5.12
> 
> Can someone tell me what is expected to happen for a Tx AF_XDP socket
> in case of MTU > 4K?

See the last paragraph.

> 
> I also found a second case of sendto() returning ENXIO. In this
> scenario, I was removing my RX eBPF program by calling
> 
>     bpf_set_link_xdp_fd(ifIndex, -1, 0)
> 
> while AF_XDP transmit (and associated sento() wakeup) was still going
> on. In this case, sendto starts failing with ENETDOWN for some time
> followed by ENXIO subsequently. This case was on Kernel version 5.4.0

I think that we addressed the ENETDOWN Tx issue with the following set:
https://lore.kernel.org/netdev/20200205045834.56795-1-maciej.fijalkowski@xxxxxxxxx/

I see that it has been merged in 5.6. But it was related to being unable
to spawn multiple AF_XDP Tx-only instances. With what you're saying it
feels to me that you have multiple instances of your AF_XDP progs and you
terminate one of them? Previously, every instance would die due to the
fact that the underlying XDP prog would be unloaded from interface, but
right now we have bpf_link support for AF_XDP which would handle that
properly. Note that it was developed for the built-in prog.

> 
> Does removing a XDP program cause the interface to go down (ENETDOWN)
> leading to XDP socket unbind (ENXIO)? Should removing (or replacing)
> an RX eBPF program, affect AF_XDP TX?

Removing XDP prog causes the interface to undergo the reset or some other
mechanism as it needs to remove the XDP Tx resources and change the Rx
memory model. For Intel drivers, the AF_XDP Tx resources are configured
during the load of Rx eBPF prog. We would have to develop some mechanism
that detaches the creation of XDP Tx resources from loading Rx eBPF prog.
There have been discussions around feature detection but I think it was
about the opposite - don't configure Tx rings if your prog will not be
doing XDP_TX action.

> 
> Srivats