Anyone from Intel? Maciej, Björn, Magnus? Does anyone else have anything to say? IMO, calling XDP for the same packet multiple times is a bug, we should agree on some sane solution. On Thu, 2022-08-18 at 14:26 +0300, Maxim Mikityanskiy wrote: > Hi Maciej, > > On Wed, 2022-04-13 at 17:30 +0200, Maciej Fijalkowski wrote: > > v2: > > - add likely for internal redirect return codes in ice and ixgbe > > (Jesper) > > - do not drop the buffer that head pointed to at full XSK RQ > > (Maxim) > > I found an issue with this approach. If you don't drop that packet, > you'll need to run the XDP program for the same packet again. If the > XDP program is anything more complex than "redirect-everything-to- > XSK", > it will get confused, for example, if it tracks any state or counts > anything. > > We can't check whether there is enough space in the XSK RX ring > before > running the XDP program, as we don't know in advance which XSK socket > will be selected. > > We can't store bpf_redirect_info across NAPI calls to avoid running > the > XDP program second time, as bpf_redirect_info is protected by RCU and > the assumption that the whole XDP_REDIRECT operation happens within > one > NAPI cycle. > > I see the following options: > > 1. Drop the packet when an overflow happens. The problem is that it > will happen systematically, but it's still better than the old > behavior > (drop mulitple packets when an overflow happens and hog the CPU). > > 2. Make this feature opt-in. If the application opts in, it > guarantees > to take measures to handle duplicate packets in XDP properly. > > 3. Require the XDP program to indicate it supports being called > multiple times for the same packet and pass a flag to it if it's a > repeated run. Drop the packet in the driver if the XDP program > doesn't > indicate this support. The indication can be done similar to how it's > implemented for XDP multi buffer. > > Thoughts? Other options? > > Thanks, > Max > > > - terminate Rx loop only when need_wakeup feature is enabled > > (Maxim) > > - reword from 'stop softirq processing' to 'stop NAPI Rx > > processing' > > - s/ENXIO/EINVAL in mlx5 and stmmac's ndo_xsk_wakeup to keep it > > consitent with Intel's drivers (Maxim) > > - include Jesper's Acks > > > > Hi! > > > > This is a revival of Bjorn's idea [0] to break NAPI loop when XSK > > Rx > > queue gets full which in turn makes it impossible to keep on > > successfully producing descriptors to XSK Rx ring. By breaking out > > of > > the driver side immediately we will give the user space opportunity > > for > > consuming descriptors from XSK Rx ring and therefore provide room > > in the > > ring so that HW Rx -> XSK Rx redirection can be done. > > > > Maxim asked and Jesper agreed on simplifying Bjorn's original API > > used > > for detecting the event of interest, so let's just simply check for > > -ENOBUFS within Intel's ZC drivers after an attempt to redirect a > > buffer > > to XSK Rx. No real need for redirect API extension. > > > > One might ask why it is still relevant even after having proper > > busy > > poll support in place - here is the justification. > > > > For xdpsock that was: > > - run for l2fwd scenario, > > - app/driver processing took place on the same core in busy poll > > with 2048 budget, > > - HW ring sizes Tx 256, Rx 2048, > > > > this work improved throughput by 78% and reduced Rx queue full > > statistic > > bump by 99%. > > > > For testing ice, make sure that you have [1] present on your side. > > > > This set, besides the work described above, carries also > > improvements > > around return codes in various XSK paths and lastly a minor > > optimization > > for xskq_cons_has_entries(), a helper that might be used when XSK > > Rx > > batching would make it to the kernel. > > > > Link to v1 and discussion around it is at [2]. > > > > Thanks! > > MF > > > > [0]: > > https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@xxxxxxxxx/ > > [1]: > > https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@xxxxxxxxx/ > > [2]: > > https://lore.kernel.org/bpf/5864171b-1e08-1b51-026e-5f09b8945076@xxxxxxxxxx/T/ > > > > Björn Töpel (1): > > xsk: improve xdp_do_redirect() error codes > > > > Maciej Fijalkowski (13): > > xsk: diversify return codes in xsk_rcv_check() > > ice: xsk: decorate ICE_XDP_REDIR with likely() > > ixgbe: xsk: decorate IXGBE_XDP_REDIR with likely() > > ice: xsk: terminate Rx side of NAPI when XSK Rx queue gets full > > i40e: xsk: terminate Rx side of NAPI when XSK Rx queue gets full > > ixgbe: xsk: terminate Rx side of NAPI when XSK Rx queue gets full > > ice: xsk: diversify return values from xsk_wakeup call paths > > i40e: xsk: diversify return values from xsk_wakeup call paths > > ixgbe: xsk: diversify return values from xsk_wakeup call paths > > mlx5: xsk: diversify return values from xsk_wakeup call paths > > stmmac: xsk: diversify return values from xsk_wakeup call paths > > ice: xsk: avoid refilling single Rx descriptors > > xsk: drop ternary operator from xskq_cons_has_entries > > > > .../ethernet/intel/i40e/i40e_txrx_common.h | 1 + > > drivers/net/ethernet/intel/i40e/i40e_xsk.c | 38 ++++++++----- > > drivers/net/ethernet/intel/ice/ice_txrx.h | 1 + > > drivers/net/ethernet/intel/ice/ice_xsk.c | 53 ++++++++++++--- > > ---- > > .../ethernet/intel/ixgbe/ixgbe_txrx_common.h | 1 + > > drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 52 ++++++++++----- > > --- > > .../ethernet/mellanox/mlx5/core/en/xsk/tx.c | 2 +- > > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +- > > net/xdp/xsk.c | 4 +- > > net/xdp/xsk_queue.h | 4 +- > > 10 files changed, 99 insertions(+), 61 deletions(-) > > >