On Fri, Jan 21, 2022 at 02:17:42PM +0100, Alexander Lobakin wrote:
> From: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx>
> Date: Fri, 21 Jan 2022 13:54:47 +0100
>
> > From: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx>
> > Date: Fri, 21 Jan 2022 13:00:10 +0100
> >
> > > Apply the logic that was done for regular XDP from commit 9610bd988df9
> > > ("ice: optimize XDP_TX workloads") to the ZC side of the driver. On top
> > > of that, introduce batching to Tx that is inspired by i40e's
> > > implementation with adjustments to the cleaning logic - take into the
> > > account NAPI budget in ice_clean_xdp_irq_zc().
> > >
> > > Separating the stats structs onto separate cache lines seemed to improve
> > > the performance.
> > >
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx>
> > > ---
> > >  drivers/net/ethernet/intel/ice/ice_txrx.c |   2 +-
> > >  drivers/net/ethernet/intel/ice/ice_txrx.h |   2 +-
> > >  drivers/net/ethernet/intel/ice/ice_xsk.c  | 256 ++++++++++++++--------
> > >  drivers/net/ethernet/intel/ice/ice_xsk.h  |  27 ++-
> > >  4 files changed, 186 insertions(+), 101 deletions(-)
> > >
> > > +/**
> > > + * ice_fill_tx_hw_ring - produce the number of Tx descriptors onto ring
> > > + * @xdp_ring: XDP ring to produce the HW Tx descriptors on
> > > + * @descs: AF_XDP descriptors to pull the DMA addresses and lengths from
> > > + * @nb_pkts: count of packets to be send
> > > + * @total_bytes: bytes accumulator that will be used for stats update
> > > + */
> > > +static void ice_fill_tx_hw_ring(struct ice_tx_ring *xdp_ring, struct xdp_desc *descs,
> > > +				u32 nb_pkts, unsigned int *total_bytes)
> > > +{
> > > +	u16 tx_thresh = xdp_ring->tx_thresh;
> > > +	struct ice_tx_desc *tx_desc;
> >
> > And @tx_desc as well.
> >
> > > +	u32 batched, leftover, i;
> > > +
> > > +	batched = nb_pkts & ~(PKTS_PER_BATCH - 1);
> > > +	leftover = nb_pkts & (PKTS_PER_BATCH - 1);
> > > +	for (i = 0; i < batched; i += PKTS_PER_BATCH)
> > > +		ice_xmit_pkt_batch(xdp_ring, &descs[i], total_bytes);
> > > +	for (i = batched; i < batched + leftover; i++)
>
> Breh, I overlooked that. @i will equal @batched after exiting the
> first loop, so the assignment here is redundant (probably harmless
> tho if the compilers are smart enough).

I can drop this and scope variables properly, thanks!

>
> > > +		ice_xmit_pkt(xdp_ring, &descs[i], total_bytes);
> > > +
> > > +	if (xdp_ring->next_to_use > xdp_ring->next_rs) {
> > > +		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
> > > +		tx_desc->cmd_type_offset_bsz |=
> > > +			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
> > > +		xdp_ring->next_rs += tx_thresh;
> > > +	}
> > > +}
> >
> > > -	prefetch(tx_desc);
> > > +/**
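For illustration only, here is a minimal userspace sketch of the batch/leftover split discussed above, with the redundant `i = batched` dropped from the second loop. It assumes PKTS_PER_BATCH is a power of two (the value 8 is hypothetical here) and replaces ice_xmit_pkt_batch()/ice_xmit_pkt() with hypothetical counting stubs (xmit_batch()/xmit_one()), so it models only the loop structure, not the driver or its descriptor handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical batch size; the mask arithmetic below requires a power of two. */
#define PKTS_PER_BATCH 8u
_Static_assert((PKTS_PER_BATCH & (PKTS_PER_BATCH - 1)) == 0,
	       "PKTS_PER_BATCH must be a power of two");

/* Counters recorded by the hypothetical stubs below. */
struct fill_stats {
	uint32_t batch_calls;  /* number of full PKTS_PER_BATCH batches sent */
	uint32_t single_calls; /* number of packets sent one by one */
	uint32_t pkts;         /* total packets "produced" onto the ring */
};

/* Stand-in for ice_xmit_pkt_batch(): one call covers a whole batch. */
static void xmit_batch(struct fill_stats *st)
{
	st->batch_calls++;
	st->pkts += PKTS_PER_BATCH;
}

/* Stand-in for ice_xmit_pkt(): one call covers a single packet. */
static void xmit_one(struct fill_stats *st)
{
	st->single_calls++;
	st->pkts++;
}

static void fill_ring(struct fill_stats *st, uint32_t nb_pkts)
{
	/* Round nb_pkts down to a multiple of the batch size... */
	uint32_t batched = nb_pkts & ~(PKTS_PER_BATCH - 1);
	/* ...and keep the remainder (nb_pkts % PKTS_PER_BATCH). */
	uint32_t leftover = nb_pkts & (PKTS_PER_BATCH - 1);
	uint32_t i;

	for (i = 0; i < batched; i += PKTS_PER_BATCH)
		xmit_batch(st);
	/* i == batched when the loop above exits, so no re-assignment is needed. */
	for (; i < batched + leftover; i++)
		xmit_one(st);
}
```

With nb_pkts = 19 and a batch size of 8, fill_ring() performs two batched sends (16 packets) followed by three single-packet sends.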