On 02/08/2023 14.33, Wei Fang wrote:
+ struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
XDP_TX can avoid this conversion to xdp_frame.
It would requires some refactor of fec_enet_txq_xmit_frame().
Yes, but I'm not intend to change it, using the existing interface is enough.
+ struct fec_enet_private *fep = netdev_priv(ndev);
+ struct fec_enet_priv_tx_q *txq;
+ int cpu = smp_processor_id();
+ struct netdev_queue *nq;
+ int queue, ret;
+
+ queue = fec_enet_xdp_get_tx_queue(fep, cpu);
+ txq = fep->tx_queue[queue];
Notice how TXQ gets selected based on CPU.
Thus it will be the same for all the frames.
+ nq = netdev_get_tx_queue(fep->netdev, queue);
+
+ __netif_tx_lock(nq, cpu);
It is sad that XDP_TX takes a lock for each frame.
Yes, but the XDP path share the queue with the kernel network stack, so
we need a lock here, unless there is a dedicated queue for XDP path. Do
you have a better solution?
Yes, the solution would be to keep a stack local (or per-CPU) queue for
all the XDP_TX frames, and send them at the xdp_do_flush_map() call
site. This is basically what happens with xdp_do_redirect() in cpumap.c
and devmap.c code, that have a per-CPU bulk queue and sends a bulk of
packets into fec_enet_xdp_xmit / ndo_xdp_xmit.
I understand if you don't want to add the complexity to the driver.
And I guess, it should be a followup patch to make sure this actually
improves performance.
--Jesper