On Tue, 6 Apr 2021 at 02:50, Dan Siemon <dan@xxxxxxxxxxxxx> wrote:
>
> On Mon, 2021-04-05 at 10:26 +0200, Björn Töpel wrote:
> >
> > > 3) The patch linked below mentions adding recvmsg and sendmsg
> > > support for busy polling. The xdpsock example uses recvfrom().
> > > What is the set of syscalls that can drive the busy polling? Is
> > > there a recommendation for which one(s) should be used?
> > >
> >
> > recvmsg/sendmsg and poll (which means read/recvfrom/recvmsg, and
> > corresponding on the write side). Use recvfrom for rx queues, and
> > sendto for tx queues. Poll works as well, but the overhead for poll
> > is larger than send/recv.
>
> To clarify, does this mean:
> * When a descriptor is added to the fill ring or tx ring, call sendmsg.
> * When looking for descriptors in the completion ring or rx ring,
>   first call recvmsg()
>
> ?

Not quite; Tx (completion/Tx ring) sendmsg, Rx (fill/Rx ring) recvmsg.

> Or are the fq and cq different vs. tx and rx?
>
> It might be useful to outline an idealized xsk loop. The loop I have
> looks something like:
>
> for each socket:
> 1) Process completion queue (read from cq)
> 2) Try to receive descriptors (read from rx queue)
> 3) Send any pending packets (write to tx queue)
> 4) Add descriptors to fq [based on a deficit counter condition]
>    (write to fq)
>
> [My use case is packet forwarding between sockets]
>
> Ideally there wouldn't be a syscall in each of those four steps.
>
> Is it acceptable to call recvmsg once at the top of the loop and only
> call sendmsg() if one of steps 3 or 4 wrote to a queue (fq, tx)?
>

Yes, and moreover on the Tx side, you can write multiple packets and
then call one sendmsg() (but then the latency will be worse).

> In my use case, packet forwarding with dedicated cores, if one syscall
> at the top of the loop did 'send' and 'receive' that might be more
> efficient, as the next iteration can process the descriptors written
> in the previous iteration.
>
> > > 5)
> > >
> > > "If the application stops performing busy-polling via a system
> > > call, the watchdog timer defined by gro_flush_timeout will
> > > timeout, and regular softirq handling will resume."
> > >
> > > Does this imply that if the application fails to poll within the
> > > watchdog time that it needs to take action to get back into
> > > prefer busy polling mode?
> > >
> >
> > Yes. If the application fails to poll within the specified timeout,
> > the kernel will do driver polling at a pace of the timeout, and if
> > there are no packets after "defer count" times, interrupts will be
> > enabled. This is to ensure that the driver is not starved by
> > userland. Have a look at Eric's commit above for details on the
> > defer/timeout logic.
>
> I need to dig a bit to understand this more. How does the application
> determine that interrupts have been re-enabled so it can disable them
> again?
>

The application doesn't need to care about that. It's really just an
implementation detail; the only thing the application needs to do is
set the timeout/defer count, and make sure to do syscalls. Depending
on the kind of flows, the timeout/defer count can be tweaked for
better latency. The fact that interrupts get re-enabled is just to
make sure that the driver isn't starved *if* the application is badly
behaved.

Does that make sense?


Cheers,
Björn

> Thanks for your help.
>
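
[Editor's sketch] To tie the answers above together, below is a rough,
non-authoritative sketch of the forwarding loop Dan describes: one
recvfrom() at the top of each iteration to drive busy polling on the
fill/rx side, and one sendto() only if step 3 or 4 actually produced
descriptors. The fwd_sock type and the sock_fd()/process_cq()/
process_rx()/process_tx()/refill_fq() helpers are hypothetical
stand-ins for the ring manipulation normally done with the
libbpf/libxdp xsk_ring_prod/xsk_ring_cons accessors; the busy-poll
option values are examples only, and the fallback #defines assume
kernel 5.11-era numbering.

/* Minimal sketch of the per-socket busy-poll forwarding loop.
 * Not a complete program: the fwd_sock helpers are assumed to exist
 * elsewhere and to return the number of descriptors they handled. */
#include <stdbool.h>
#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL        /* added in kernel 5.11; older headers */
#define SO_PREFER_BUSY_POLL 69     /* may lack these defines (assumption) */
#endif
#ifndef SO_BUSY_POLL_BUDGET
#define SO_BUSY_POLL_BUDGET 70
#endif

struct fwd_sock;                                /* hypothetical per-socket state */
extern int sock_fd(struct fwd_sock *s);         /* underlying AF_XDP socket fd   */
extern unsigned process_cq(struct fwd_sock *s); /* step 1: drain completion ring */
extern unsigned process_rx(struct fwd_sock *s); /* step 2: drain rx ring         */
extern unsigned process_tx(struct fwd_sock *s); /* step 3: produce to tx ring    */
extern unsigned refill_fq(struct fwd_sock *s);  /* step 4: produce to fill ring  */

static void enable_prefer_busy_poll(int fd)
{
	int one = 1, usecs = 20, budget = 64;   /* example values only */

	/* Opt in to preferred busy polling on the AF_XDP socket. */
	setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one));
	setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
	setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &budget, sizeof(budget));
}

static void fwd_loop(struct fwd_sock **socks, int nsocks)
{
	for (;;) {
		for (int i = 0; i < nsocks; i++) {
			struct fwd_sock *s = socks[i];
			int fd = sock_fd(s);
			bool wrote = false;

			/* One receive-side syscall at the top of the
			 * iteration drives busy polling for the fill
			 * and rx rings (same call the xdpsock sample
			 * uses). */
			recvfrom(fd, NULL, 0, MSG_DONTWAIT, NULL, NULL);

			process_cq(s);                 /* 1: completion ring */
			process_rx(s);                 /* 2: rx ring         */
			wrote |= process_tx(s) > 0;    /* 3: tx ring         */
			wrote |= refill_fq(s) > 0;     /* 4: fill ring       */

			/* Kick the tx/completion side only if steps 3
			 * or 4 actually wrote descriptors, as discussed
			 * above. */
			if (wrote)
				sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
		}
	}
}

For the watchdog/defer behaviour discussed in point 5, the per-device
knobs live in sysfs, e.g. /sys/class/net/<iface>/gro_flush_timeout and
/sys/class/net/<iface>/napi_defer_hard_irqs; the concrete values to
write there depend on the workload and are not prescribed by this
thread.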