On Tue, 6 Apr 2021 at 02:50, Dan Siemon <dan@xxxxxxxxxxxxx> wrote:
>
> On Mon, 2021-04-05 at 10:26 +0200, Björn Töpel wrote:
> >
> > > 3) The patch linked below mentions adding recvmsg and sendmsg
> > > support for busy polling. The xdpsock example uses recvfrom().
> > > What is the set of syscalls that can drive the busy polling? Is
> > > there a recommendation for which one(s) should be used?
> > >
> >
> > recvmsg/sendmsg and poll (which means read/recvfrom/recvmsg, and
> > corresponding on the write side). Use recvfrom for rx queues, and
> > sendto for tx queues. Poll works as well, but the overhead for poll
> > is larger than send/recv.
>
> To clarify, does this mean:
> * When a descriptor is added to the fill ring or tx ring, call sendmsg.
> * When looking for descriptors in the completion ring or rx ring,
>   first call recvmsg()
>
> ?

Not quite; Tx (completion/Tx ring) sendmsg, Rx (fill/Rx ring) recvmsg.

> Or are the fq and cq different vs. tx and rx?
>
> It might be useful to outline an idealized xsk loop. The loop I have
> looks something like:
>
> for each socket:
> 1) Process completion queue (read from cq)
> 2) Try to receive descriptors (read from rx queue)
> 3) Send any pending packets (write to tx queue)
> 4) Add descriptors to fq [based on a deficit counter condition]
>    (write to fq)
>
> [My use case is packet forwarding between sockets]
>
> Ideally there wouldn't be a syscall in each of those four steps.
>
> Is it acceptable to call recvmsg once at the top of the loop and only
> call sendmsg() if one of steps 3 or 4 wrote to a queue (fq, tx)?
>

Yes, and moreover on the Tx side, you can write multiple packets and
then call one sendmsg() (but then the latency will be worse).

> In my use case, packet forwarding with dedicated cores, if one syscall
> at the top of the loop did 'send' and 'receive' that might be more
> efficient, as the next iteration can process the descriptors written
> in the previous iteration.
>
> > > 5)
> > >
> > > "If the application stops performing busy-polling via a system
> > > call, the watchdog timer defined by gro_flush_timeout will
> > > timeout, and regular softirq handling will resume."
> > >
> > > Does this imply that if the application fails to poll within the
> > > watchdog time that it needs to take action to get back into
> > > prefer busy polling mode?
> > >
> >
> > Yes. If the application fails to poll within the specified timeout,
> > the kernel will do driver polling at a pace of the timeout, and if
> > there are no packets after "defer count" times, interrupts will be
> > enabled. This is to ensure that the driver is not starved by
> > userland. Have a look at Eric's commit above for details on the
> > defer/timeout logic.
>
> I need to dig a bit to understand this more. How does the application
> determine that interrupts have been re-enabled so it can disable them
> again?
>

The application doesn't need to care about that. It's really just an
implementation detail; the only thing the application needs to do is
set the timeout/defer count, and make sure to do syscalls. Depending
on the kind of flows, the timeout/defer count can be tweaked for
better latency. The fact that interrupts get re-enabled is just to
make sure that the driver isn't starved *if* the application is badly
behaved.

Does that make sense?


Cheers,
Björn

> Thanks for your help.
>
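
[Editor's sketch] To tie the answers above together, below is a rough,
non-authoritative sketch of the forwarding loop Dan describes: one
recvfrom() at the top of each iteration to drive busy polling on the
fill/rx side, and one sendto() only if step 3 or 4 actually produced
descriptors. The fwd_sock type and the sock_fd()/process_cq()/
process_rx()/process_tx()/refill_fq() helpers are hypothetical
stand-ins for the ring manipulation normally done with the
libbpf/libxdp xsk_ring_prod/xsk_ring_cons accessors; the busy-poll
option values are examples only, and the fallback #defines assume
kernel 5.11-era numbering.

/* Minimal sketch of the per-socket busy-poll forwarding loop.
 * Not a complete program: the fwd_sock helpers are assumed to exist
 * elsewhere and to return the number of descriptors they handled. */
#include <stdbool.h>
#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL        /* added in kernel 5.11; older headers */
#define SO_PREFER_BUSY_POLL 69     /* may lack these defines (assumption) */
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

struct fwd_sock;                                /* hypothetical per-socket state */
extern int sock_fd(struct fwd_sock *s);         /* underlying AF_XDP socket fd   */
extern unsigned process_cq(struct fwd_sock *s); /* step 1: drain completion ring */
extern unsigned process_rx(struct fwd_sock *s); /* step 2: drain rx ring         */
extern unsigned process_tx(struct fwd_sock *s); /* step 3: produce to tx ring    */
extern unsigned refill_fq(struct fwd_sock *s);  /* step 4: produce to fill ring  */

static void enable_prefer_busy_poll(int fd)
{
	int one = 1, usecs = 20, budget = 64;   /* example values only */

	/* Opt in to preferred busy polling on the AF_XDP socket. */
	setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one));
	setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
	setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget));
}

static void fwd_loop(struct fwd_sock **socks, int nsocks)
{
	for (;;) {
		for (int i = 0; i < nsocks; i++) {
			struct fwd_sock *s = socks[i];
			int fd = sock_fd(s);
			bool wrote = false;

			/* One receive-side syscall at the top of the
			 * iteration drives busy polling for the fill
			 * and rx rings (same call the xdpsock sample
			 * uses). */
			recvfrom(fd, NULL, 0, MSG_DONTWAIT, NULL, NULL);

			process_cq(s);                 /* 1: completion ring */
			process_rx(s);                 /* 2: rx ring         */
			wrote |= process_tx(s) > 0;    /* 3: tx ring         */
			wrote |= refill_fq(s) > 0;     /* 4: fill ring       */

			/* Kick the tx/completion side only if steps 3
			 * or 4 actually wrote descriptors, as discussed
			 * above. */
			if (wrote)
				sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
		}
	}
}

For the watchdog/defer behaviour discussed in point 5, the per-device
knobs live in sysfs, e.g. /sys/class/net/<iface>/gro_flush_timeout and
/sys/class/net/<iface>/napi_defer_hard_irqs; the concrete values to
write there depend on the workload and are not prescribed by this
thread.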