> On 02/08/2023 14.33, Wei Fang wrote:
> >>> +	struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
> >> XDP_TX can avoid this conversion to xdp_frame.
> >> It would require some refactoring of fec_enet_txq_xmit_frame().
> >>
> > Yes, but I don't intend to change it; using the existing interface is enough.
> >
> >>> +	struct fec_enet_private *fep = netdev_priv(ndev);
> >>> +	struct fec_enet_priv_tx_q *txq;
> >>> +	int cpu = smp_processor_id();
> >>> +	struct netdev_queue *nq;
> >>> +	int queue, ret;
> >>> +
> >>> +	queue = fec_enet_xdp_get_tx_queue(fep, cpu);
> >>> +	txq = fep->tx_queue[queue];
>
> Notice how TXQ gets selected based on CPU.
> Thus it will be the same for all the frames.
>
Yes, I'll optimize it, thanks!

> >>> +	nq = netdev_get_tx_queue(fep->netdev, queue);
> >>> +
> >>> +	__netif_tx_lock(nq, cpu);
> >>
> >> It is sad that XDP_TX takes a lock for each frame.
> >>
> > Yes, but the XDP path shares the queue with the kernel network stack,
> > so we need a lock here, unless there is a dedicated queue for the XDP
> > path. Do you have a better solution?
> >
>
> Yes, the solution would be to keep a stack-local (or per-CPU) queue for all the
> XDP_TX frames, and send them at the xdp_do_flush_map() call site. This is
> basically what happens with xdp_do_redirect() in the cpumap.c and devmap.c
> code, which have a per-CPU bulk queue and send a bulk of packets into
> fec_enet_xdp_xmit / ndo_xdp_xmit.
>
> I understand if you don't want to add the complexity to the driver.
> And I guess it should be a follow-up patch, to make sure this actually
> improves performance.
>
Thanks, got it. I'll do this optimization in a follow-up patch if it really improves performance.
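For illustration, here is a minimal userspace sketch of the per-CPU bulk queue idea discussed above. It is not the fec driver code and not the actual cpumap.c/devmap.c implementation. The TX queue is picked once for the whole bulk, since it depends only on the CPU. Frames are staged locally, and the queue lock is taken once per flush instead of once per frame. The names `xdp_tx_bulk`, `XDP_TX_BULK_SIZE`, `txq_lock()` and `simulate_xdp_tx()` are hypothetical. A C11 `atomic_flag` spinlock stands in for `__netif_tx_lock()`, and the final flush models the `xdp_do_flush_map()` call site.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bulk size for this sketch only. */
#define XDP_TX_BULK_SIZE 16

/* Stand-in for struct xdp_frame; only its address matters here. */
struct frame { int id; };

/* Stand-in for one TX queue shared with the regular network stack. */
struct tx_queue {
	atomic_flag lock;        /* models the netdev TX queue spinlock */
	int sent;                /* frames handed to the (imaginary) TX ring */
	int lock_acquisitions;   /* how often the lock was taken */
};

/* Per-CPU (or stack-local) staging area for XDP_TX frames. */
struct xdp_tx_bulk {
	struct tx_queue *txq;    /* chosen once: same CPU => same queue */
	struct frame *frames[XDP_TX_BULK_SIZE];
	int count;
};

static void txq_init(struct tx_queue *txq)
{
	atomic_flag_clear(&txq->lock);
	txq->sent = 0;
	txq->lock_acquisitions = 0;
}

static void txq_lock(struct tx_queue *txq)
{
	while (atomic_flag_test_and_set_explicit(&txq->lock, memory_order_acquire))
		; /* spin, as a kernel spinlock would */
	txq->lock_acquisitions++;
}

static void txq_unlock(struct tx_queue *txq)
{
	atomic_flag_clear_explicit(&txq->lock, memory_order_release);
}

/* Send every staged frame under a single lock acquisition. */
static void bulk_flush(struct xdp_tx_bulk *bq)
{
	if (bq->count == 0)
		return;
	txq_lock(bq->txq);
	for (int i = 0; i < bq->count; i++)
		bq->txq->sent++; /* a real driver would fill a descriptor for bq->frames[i] */
	txq_unlock(bq->txq);
	bq->count = 0;
}

/* Stage one frame; flush only when the bulk is full. */
static void bulk_enqueue(struct xdp_tx_bulk *bq, struct frame *f)
{
	if (bq->count == XDP_TX_BULK_SIZE)
		bulk_flush(bq);
	assert(bq->count < XDP_TX_BULK_SIZE);
	bq->frames[bq->count++] = f;
}

/* Model one NAPI poll: enqueue n XDP_TX frames, then flush once at the end
 * (the xdp_do_flush_map() call site). Returns the number of lock acquisitions. */
static int simulate_xdp_tx(struct tx_queue *txq, struct frame *frames, int n)
{
	struct xdp_tx_bulk bq = { .txq = txq, .count = 0 };
	int before = txq->lock_acquisitions;

	for (int i = 0; i < n; i++)
		bulk_enqueue(&bq, &frames[i]);
	bulk_flush(&bq);
	return txq->lock_acquisitions - before;
}
```

With 40 frames and a bulk size of 16, the lock is taken 3 times (two full bulks plus the final partial flush) instead of 40 times. Whether that translates into a measurable gain on real hardware is what the follow-up patch would need to show.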