Re: AF_XDP new prefer busy poll

Hey Dan, sorry for the late reply. I'm in Easter mode. :-)

On Thu, 1 Apr 2021 at 22:09, Dan Siemon <dan@xxxxxxxxxxxxx> wrote:
>
> I've started working on adding SO_PREFER_BUSY_POLL [1] to afxdp-rs [2].
> I have a few questions that I haven't been able to answer definitively
> from docs or commits.
>
> 1) To confirm, configuration like the below is required?
>
> echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
> echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
>

Yes, but the defer count and timeout are really up to you. The
"prefer busy-polling" mechanism is built on top of commit 6f8b12d661d0
("net: napi: add hard irqs deferral feature").

> 2) It's not clear to me what polling operations are required. It looks
> like the xdpdock example was modified to call recvfrom() and sendto()
> in every situation where previously the condition was that the
> need_wakeup flag was set on one of the queues. It looks like this
> structure may do extra syscalls?
>

Yes. More below.

> It it sufficient to 'poll' (I don't mean the syscall here) the socket
> once with one syscall operation or do we need the equivalent of a send
> and recv operation (like the example) in each loop iteration?
>

The idea with busy-polling from a kernel perspective is that the
driver code is entered via the syscall (read or write). For the
receive side: syscall() -> enter the napi poll implementation of the
netdev, and pass the packets (if any) to the XDP socket ring.
With busy-polling enabled there are no interrupts or softirq
mechanisms that execute the driver code. IOW, it's up to userland to
call the driver via a syscall. Busy-polling will typically require
more syscalls than need_wakeup mode (as you noted above).

Again, when executing in busy-polling mode, the userland application
has to do a syscall to run the driver code. Does that mean that
userland can starve the driver? No, and this is where the
timeout/defer count comes in. I'll expand on this in 5 (below).

> 3) The patch linked below mentions adding recvmsg and sendmsg support
> for busy polling. The xdpsock example uses recvfrom(). What is the set
> of syscalls that can drive the busy polling? Is there a recommendation
> for which one(s) should be used?
>

recvmsg/sendmsg and poll (which means read/recvfrom/recvmsg, and the
corresponding calls on the write side). Use recvfrom for rx queues,
and sendto for tx queues. Poll works as well, but the overhead for
poll is larger than send/recv.

> 4) In situations where there are multiple sockets, will it work to do
> one poll syscall with multiple fds to reduce the number of syscalls? Is
> that a good idea?
>

The current implementation is really a one socket/syscall thing.
Magnus and I had some ideas on extending busy-polling for a set of
sockets, but haven't had a use-case for it yet.

> 5)
>
> "If the application stops performing busy-polling via a system call,
> the watchdog timer defined by gro_flush_timeout will timeout, and
> regular softirq handling will resume."
>
> Does this imply that if the application fails to poll within the
> watchdog time that it needs to take action to get back into prefer busy
> polling mode?
>

Yes. If the application fails to poll within the specified timeout,
the kernel will poll the driver itself at the pace of the timeout, and
if there are no packets after "defer count" rounds, interrupts will be
re-enabled. This ensures that the driver is not starved by userland.
Have a look at Eric's commit above for details on the defer/timeout
logic.

> On the plus side, the initial performance numbers look good but there
> are a lot of drops as traffic ramps up that I haven't figured out the
> cause of yet. There are no drops once it's running in a steady state.
>

Interesting! Please let me know your results, and if you run into weirdness!


Also, please let me know if you need more input!

Cheers!
Björn


> Thanks for any help or insight.
>
> [1] - https://lwn.net/Articles/837010/
> [2] - https://github.com/aterlo/afxdp-rs
>



