Hey Dan, sorry for the late reply. I'm in Easter mode. :-)

On Thu, 1 Apr 2021 at 22:09, Dan Siemon <dan@xxxxxxxxxxxxx> wrote:
>
> I've started working on adding SO_PREFER_BUSY_POLL [1] to afxdp-rs [2].
> I have a few questions that I haven't been able to answer definitively
> from docs or commits.
>
> 1) To confirm, configuration like the below is required?
>
> echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs
> echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout
>

Yes, but the defer count and timeout are really up to you. The
"prefer busy-polling" mode is built on top of commit 6f8b12d661d0
("net: napi: add hard irqs deferral feature").

> 2) It's not clear to me what polling operations are required. It looks
> like the xdpsock example was modified to call recvfrom() and sendto()
> in every situation where previously the condition was that the
> need_wakeup flag was set on one of the queues. It looks like this
> structure may do extra syscalls?
>

Yes. More below.

> Is it sufficient to 'poll' (I don't mean the syscall here) the socket
> once with one syscall operation, or do we need the equivalent of a
> send and a recv operation (like the example) in each loop iteration?
>

The idea with busy-polling, from a kernel perspective, is that the
driver code is entered via the syscall (read or write). For the
receive side: syscall() -> enter the napi poll implementation of the
netdev, and pass the packets (if any) to the XDP socket ring.

With busy-polling enabled there are no interrupt or softirq
mechanisms that execute the driver code. IOW, it's up to userland to
call the driver via a syscall. Busy-polling will typically require
more syscalls than need_wakeup mode (as you noted above).

Again, when you are executing in busy-polling mode, the userland
application has to do a syscall to run the driver code. Does that
mean that userland can starve the driver? No, and this is where the
timeout/defer count comes in. I'll expand on this in 5 (below).

> 3) The patch linked below mentions adding recvmsg and sendmsg support
> for busy polling. The xdpsock example uses recvfrom(). What is the
> set of syscalls that can drive the busy polling? Is there a
> recommendation for which one(s) should be used?
>

recvmsg/sendmsg and poll (which means read/recvfrom/recvmsg, and the
corresponding calls on the write side). Use recvfrom for rx queues,
and sendto for tx queues. Poll works as well, but the overhead of
poll is larger than send/recv. (I've put a small sketch further down
in this mail that ties the setup and the recv/send loop together.)

> 4) In situations where there are multiple sockets, will it work to do
> one poll syscall with multiple fds to reduce the number of syscalls?
> Is that a good idea?
>

The current implementation is really a one socket/syscall thing.
Magnus and I had some ideas on extending busy-polling to a set of
sockets, but haven't had a use-case for it yet.

> 5)
>
> "If the application stops performing busy-polling via a system call,
> the watchdog timer defined by gro_flush_timeout will timeout, and
> regular softirq handling will resume."
>
> Does this imply that if the application fails to poll within the
> watchdog time that it needs to take action to get back into prefer
> busy polling mode?
>

Yes. If the application fails to poll within the specified timeout,
the kernel will poll the driver at the pace of the timeout, and if
no packets show up for "defer count" rounds, interrupts will be
re-enabled. This is to ensure that the driver is not starved by
userland. Have a look at Eric's commit above for the details of the
defer/timeout logic.
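FWIW, here's a rough C sketch of how the pieces tie together: the
setsockopt() knobs from the patch set, plus a recvfrom()/sendto()
driven loop, roughly what the xdpsock sample does. Treat it as a
sketch only; xsk_fd, the 20 us timeout, and the 64 packet budget are
made-up placeholders that you'd tune for your setup.

  /* Sketch, not a drop-in implementation. Assumes xsk_fd is an
   * already-bound AF_XDP socket, and that napi_defer_hard_irqs and
   * gro_flush_timeout are set on the netdev as above.
   */
  #include <sys/socket.h>
  #include <errno.h>

  #ifndef SO_PREFER_BUSY_POLL
  #define SO_PREFER_BUSY_POLL 69  /* added in Linux 5.11 */
  #endif
  #ifndef SO_BUSY_POLL_BUDGET
  #define SO_BUSY_POLL_BUDGET 70  /* added in Linux 5.11 */
  #endif

  static int setup_prefer_busy_poll(int xsk_fd)
  {
          int on = 1;
          int timeout_us = 20; /* placeholder busy-poll interval */
          int budget = 64;     /* placeholder per-syscall packet budget */

          if (setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
                         &on, sizeof(on)))
                  return -errno;
          if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL,
                         &timeout_us, sizeof(timeout_us)))
                  return -errno;
          if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
                         &budget, sizeof(budget)))
                  return -errno;
          return 0;
  }

  /* Each iteration kicks the driver: recvfrom() runs the napi poll
   * for rx, sendto() for tx. With prefer busy-polling there is no
   * irq/softirq driving the driver, so these syscalls are what keep
   * packets flowing.
   */
  static void busy_poll_loop(int xsk_fd)
  {
          for (;;) {
                  recvfrom(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, NULL);
                  /* ... drain the rx ring, refill the fill ring,
                   * produce tx descriptors ...
                   */
                  sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
          }
  }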
> On the plus side, the initial performance numbers look good, but
> there are a lot of drops as traffic ramps up that I haven't figured
> out the cause of yet. There are no drops once it's running in a
> steady state.
>

Interesting! Please let me know your results, and if you run into
weirdness! Also, please let me know if you need more input!

Cheers!
Björn

> Thanks for any help or insight.
>
> [1] - https://lwn.net/Articles/837010/
> [2] - https://github.com/aterlo/afxdp-rs
>