On Thu, May 02, 2019 at 10:39:16AM +0200, Magnus Karlsson wrote:
> This RFC proposes to add busy-poll support to AF_XDP sockets. With
> busy-poll, the driver is executed in process context by calling the
> poll() syscall. The main advantage of this is that all processing
> occurs on a single core. This eliminates the core-to-core cache
> transfers between the application and the softirq processing on
> another core that occur without busy-poll. From a systems point of
> view, it also provides the advantage that we do not have to provision
> extra cores in the system to handle ksoftirqd/softirq processing, as
> all processing is done on the single core that executes the
> application. The drawback of busy-poll is that the max throughput
> seen from a single application will be lower (due to the syscall),
> but on a per-core basis it will often be higher, as the normal mode
> runs on two cores and busy-poll on a single one.
>
> The semantics of busy-poll from the application point of view are the
> following:
>
> * The application is required to call poll() to drive rx and tx
>   processing. There is no guarantee that softirq and interrupts will
>   do this for you. This is in contrast with the current
>   implementations of busy-poll, which are opportunistic in the sense
>   that packets might be received/transmitted by busy-poll or by
>   softirq/ksoftirqd. (In this patch set, softirq/ksoftirqd will kick
>   in at high loads just as in the current opportunistic
>   implementations, but I would like to get to a point where this is
>   not the case for busy-poll enabled XDP sockets, as this slows down
>   performance considerably and starts to use one more core for the
>   softirq processing. The end goal is for only poll() to drive the
>   napi loop when busy-poll is enabled on an AF_XDP socket. More about
>   this later.)
>
> * It should be enabled on a per-socket basis. There is no global
>   enablement, i.e. XDP socket busy-poll will not care about the
>   current /proc/sys/net/core/busy_poll and busy_read global
>   enablement mechanisms.
>
> * The batch size (how many packets are processed every time the napi
>   function in the driver is called, i.e. the weight parameter) should
>   be configurable. Currently, the busy-poll batch size of AF_INET
>   sockets is set to 8, but for AF_XDP sockets this is too small, as
>   the amount of processing per packet is much smaller with AF_XDP.
>   This should be configurable on a per-socket basis.
>
> * If you put multiple AF_XDP busy-poll enabled sockets into a poll()
>   call, the napi contexts of all of them should be executed. This is
>   in contrast to AF_INET busy-poll, which quits after the first one
>   that finds any packets. We need all napi contexts to be executed
>   due to the first requirement in this list. The behaviour we want is
>   much more like regular sockets, in that all of them are checked in
>   the poll call.
>
> * It should be possible to mix AF_XDP busy-poll sockets with any
>   other sockets, including busy-poll AF_INET ones, in a single poll()
>   call without any change to the semantics or behaviour of any of
>   those socket types.
>
> * As suggested by Maxim Mikityanskiy, poll() will in busy-poll mode
>   return POLLERR if the fill ring is empty or the completion queue is
>   full.
>
> Busy-poll support is enabled by calling a new setsockopt called
> XDP_BUSY_POLL_BATCH_SIZE that takes the batch size as an argument. A
> value between 1 and NAPI_WEIGHT (64) will turn it on, 0 will turn it
> off, and any other value will return an error.
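For illustration, enabling the proposed option from user space might look
roughly like the sketch below. The option name and the 0 / 1..NAPI_WEIGHT
semantics come from the text above; the SOL_XDP level, the int optval and
the error handling are assumptions in the style of the existing XDP
setsockopts, not something this RFC spells out.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>	/* would provide XDP_BUSY_POLL_BATCH_SIZE (patch 4) */

/* Enable busy-poll on an AF_XDP socket fd. Per the RFC text, a batch
 * size of 1..NAPI_WEIGHT (64) turns busy-poll on, 0 turns it off and
 * any other value is rejected with an error.
 */
static int xsk_set_busy_poll(int xsk_fd, int batch_size)
{
	if (setsockopt(xsk_fd, SOL_XDP, XDP_BUSY_POLL_BATCH_SIZE,
		       &batch_size, sizeof(batch_size)) < 0) {
		fprintf(stderr, "busy-poll setsockopt failed: %s\n",
			strerror(errno));
		return -errno;
	}
	return 0;
}

The application would then drive rx/tx with a poll() loop like the one
quoted below.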
>
> A typical packet processing rxdrop loop with busy-poll will look
> something like this:
>
>     for (i = 0; i < num_socks; i++) {
>         fds[i].fd = xsk_socket__fd(xsks[i]->xsk);
>         fds[i].events = POLLIN;
>     }
>
>     for (;;) {
>         ret = poll(fds, num_socks, 0);
>         if (ret <= 0)
>             continue;
>
>         for (i = 0; i < num_socks; i++)
>             rx_drop(xsks[i], fds); /* The actual application */
>     }
>
> I need some advice on the following issue, please:
>
> In this patch set, softirq/ksoftirqd will kick in at high loads and
> render the busy-poll support useless, as execution then happens in
> the same way as without busy-poll support. Everything works from an
> application perspective, but this defeats the purpose of the support
> and also consumes an extra core. What I would like to accomplish when

Not sure what you mean by 'extra core'. The above poll+rx_drop is
executed for every af_xdp socket, and there are N cpus processing
exactly N af_xdp sockets. Where is the 'extra core'? Are you
suggesting a model where a single core busy-polls all af_xdp sockets
and then wakes the processing threads? Or where a single core
processes all sockets?

I think the af_xdp model should be flexible and allow an easy
out-of-the-box experience, but it should be optimized for the 'ideal'
user that does the right thing from a max packets-per-second point of
view. I thought we had already converged on the model where af_xdp hw
rx queues bind one-to-one to af_xdp sockets and user space pins
processing threads one-to-one to af_xdp sockets on the corresponding
cpus... If so, that's the model to optimize for on the kernel side,
while keeping all other user configurations functional.

> XDP socket busy-poll is enabled is that softirq/ksoftirqd is never
> invoked for the traffic that goes to this socket. This way, we would
> get better performance on a per-core basis and also get the same
> behaviour independent of load.

I suspect separate rx kthreads for af_xdp socket processing are
necessary with and without busy-poll, exactly because of the 'high
load' case you've described. If we do this additional rx-kthread
model, why differentiate between busy-polling and polling? An af_xdp
rx queue is completely different from a stack rx queue because of the
target dma address setup. Using the stack's napi ksoftirqd threads for
processing af_xdp queues creates fairness issues. Isn't it better to
have separate kthreads for them and let the scheduler deal with
fairness between af_xdp processing and the stack?

> To accomplish this, the driver would need to create a napi context
> containing the busy-poll enabled XDP socket queues (not shared with
> any other traffic) that does not have an interrupt associated with
> it.

Why no interrupt?

> Does this sound like an avenue worth pursuing and, if so, should it
> be part of this RFC/PATCH or separate?
>
> Epoll() is not supported at this point in time. Since it was designed
> for handling a very large number of sockets, I thought it was better
> to leave this for later if the need arises.
>
> To do:
>
> * Move over all drivers to the new xdp_[rt]xq_info functions. If you
>   think this is the right way to go forward, I will move over
>   Mellanox, Netronome, etc. for the proper patch.
>
> * Performance measurements
>
> * Test SW drivers like virtio, veth and tap. Actually, test anything
>   that is not i40e.
>
> * Test multiple sockets of different types in the same poll() call
>
> * Check bisectability of each patch
>
> * Versioning of the libbpf interface, since we add an entry to the
>   socket configuration struct
>
> This RFC has been applied against commit 2b5bc3c8ebce ("r8169: remove
> manual autoneg restart workaround")
>
> Structure of the patch set:
> Patch 1: Makes the busy-poll batch size configurable inside the
>          kernel.
> Patch 2: Adds the napi id to struct xdp_rxq_info, adds it to the
>          xdp_rxq_info_reg function, and changes the interface and
>          implementation so that we only need a single copy of the
>          xdp_rxq_info struct, the one that resides in _rx. Previously
>          there was another one in the driver, but now the driver uses
>          the one in _rx. All XDP enabled drivers are converted to
>          these new functions.
> Patch 3: Adds a struct xdp_txq_info, with functions corresponding to
>          those of xdp_rxq_info, that is used to store information
>          about the tx queue. Its use is added to all AF_XDP enabled
>          drivers.
> Patch 4: Introduces a new setsockopt/getsockopt in the uapi for
>          enabling busy-poll.
> Patch 5: Implements busy-poll in the xsk code.
> Patch 6: Adds busy-poll support to libbpf.
> Patch 7: Adds busy-poll support to the xdpsock sample application.
>
> Thanks: Magnus
>
> Magnus Karlsson (7):
>   net: fs: make busy poll budget configurable in napi_busy_loop
>   net: i40e: ixgbe: tun: veth: virtio-net: centralize xdp_rxq_info and
>     add napi id
>   net: i40e: ixgbe: add xdp_txq_info structure
>   netdevice: introduce busy-poll setsockopt for AF_XDP
>   net: add busy-poll support for XDP sockets
>   libbpf: add busy-poll support to XDP sockets
>   samples/bpf: add busy-poll support to xdpsock sample
>
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   2 -
>  drivers/net/ethernet/intel/i40e/i40e_main.c    |   8 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  37 ++++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |   2 +-
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c     |   2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h       |   2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |  48 ++++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c   |   2 +-
>  drivers/net/tun.c                              |  14 +-
>  drivers/net/veth.c                             |  10 +-
>  drivers/net/virtio_net.c                       |   8 +-
>  fs/eventpoll.c                                 |   5 +-
>  include/linux/netdevice.h                      |   1 +
>  include/net/busy_poll.h                        |   7 +-
>  include/net/xdp.h                              |  23 ++-
>  include/net/xdp_sock.h                         |   3 +
>  include/uapi/linux/if_xdp.h                    |   1 +
>  net/core/dev.c                                 |  43 ++----
>  net/core/xdp.c                                 | 103 ++++++++++---
>  net/xdp/Kconfig                                |   1 +
>  net/xdp/xsk.c                                  | 122 ++++++++++++++-
>  net/xdp/xsk_queue.h                            |  18 ++-
>  samples/bpf/xdpsock_user.c                     | 203 +++++++++++++++----------
>  tools/include/uapi/linux/if_xdp.h              |   1 +
>  tools/lib/bpf/xsk.c                            |  23 +--
>  tools/lib/bpf/xsk.h                            |   1 +
>  26 files changed, 495 insertions(+), 195 deletions(-)
>
> --
> 2.7.4
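As a sketch of the one-to-one model discussed above (one hw rx queue per
af_xdp socket, one processing thread per socket pinned to the
corresponding cpu), the user-space side could look roughly like the
following. xsk_socket__fd() is existing libbpf; the xsk_worker struct,
the cpu mapping and the thread body are illustrative assumptions, not
part of the patch set.

#define _GNU_SOURCE
#include <pthread.h>
#include <poll.h>
#include <sched.h>

/* Hypothetical per-socket context: one AF_XDP socket bound to one hw
 * rx queue, processed by one thread pinned to the cpu that serves
 * that queue.
 */
struct xsk_worker {
	int xsk_fd;	/* e.g. from xsk_socket__fd() */
	int cpu;	/* cpu owning the bound hw rx queue */
};

void *xsk_worker_loop(void *arg)
{
	struct xsk_worker *w = arg;
	struct pollfd fd = { .fd = w->xsk_fd, .events = POLLIN };
	cpu_set_t set;

	/* Pin the processing thread so driver napi processing and the
	 * application run on the same core.
	 */
	CPU_ZERO(&set);
	CPU_SET(w->cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	for (;;) {
		/* With busy-poll, poll() drives rx/tx processing. */
		if (poll(&fd, 1, 0) <= 0)
			continue;
		/* rx_drop() or the real application work on this socket. */
	}
	return NULL;
}

One such thread per socket, with each socket bound to its own hw rx
queue, is the configuration the kernel side would be optimized for.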