From: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> Date: Thu, 16 Dec 2021 14:59:57 +0100 > Follow mostly the logic from commit 9610bd988df9 ("ice: optimize XDP_TX > workloads") that has been done in order to address the massive tx_busy > statistic bump and improve the performance as well. > > Increase the ICE_TX_THRESH to 64 as it seems to work out better for both > XDP and AF_XDP. Also, separating the stats structs onto separate cache > lines seemed to improve the performance. Batching approach is inspired > by i40e's implementation with adjustments to the cleaning logic. > > One difference from 'xdpdrv' XDP_TX is when ring has less than > ICE_TX_THRESH free entries, the cleaning routine will not stop after > cleaning a single ICE_TX_THRESH amount of descs but rather will forward > the next_dd pointer and check the DD bit and for this bit being set the > cleaning will be repeated. IOW clean until there are descs that can be > cleaned. > > It takes three separate xdpsock instances in txonly mode to achieve the > line rate and this was not previously possible. > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> > --- > drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +- > drivers/net/ethernet/intel/ice/ice_txrx.h | 4 +- > drivers/net/ethernet/intel/ice/ice_xsk.c | 249 ++++++++++++++-------- > drivers/net/ethernet/intel/ice/ice_xsk.h | 26 ++- > 4 files changed, 182 insertions(+), 99 deletions(-) > -- 8< -- > diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h > index 4c7bd8e9dfc4..f2eb99063c1f 100644 > --- a/drivers/net/ethernet/intel/ice/ice_xsk.h > +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h > @@ -6,19 +6,36 @@ > #include "ice_txrx.h" > #include "ice.h" > > +#define PKTS_PER_BATCH 8 > + > +#ifdef __clang__ > +#define loop_unrolled_for _Pragma("clang loop unroll_count(8)") for > +#elif __GNUC__ >= 4 > +#define loop_unrolled_for _Pragma("GCC unroll 8") for > +#else > +#define loop_unrolled_for for > +#endif It's used in a bunch more places across the tree, what about defining that in linux/compiler{,_clang,_gcc}.h? Is it possible to pass '8' as an argument? Like loop_unrolled_for(PKTS_PER_BATCH) ( ; ; ) { } Could be quite handy. If it is not, I'd maybe try to define a couple of precoded macros for 8, 16 and 32, like #define loop_unrolled_for_8 ... #define loop_unrolled_for_16 ... ... So they could be used as generic. I don't think I've seen them with values other than 8-32. > + > struct ice_vsi; > > #ifdef CONFIG_XDP_SOCKETS > int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, > u16 qid); > int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget); > -bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget); > int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags); > bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count); > bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi); > void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring); > void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring); > +bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, u32 budget); > #else > +static inline bool > +ice_xmit_zc(struct ice_tx_ring __always_unused *xdp_ring, > + u32 __always_unused budget) > +{ > + return false; > +} > + > static inline int > ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi, > struct xsk_buff_pool __always_unused *pool, > @@ -34,13 +51,6 @@ ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring, > return 0; > } > > -static inline bool > -ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring, > - int __always_unused budget) > -{ > - return false; > -} > - > static inline bool > ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring, > u16 __always_unused count) > -- > 2.33.1 Thanks, Al