On Thu, Sep 19, 2024 at 04:41:04PM +0800, Wei Fang wrote: > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC > on LS1028A, it was found that if the command was re-run multiple times, > Rx could not receive the frames, and the result of xdo-bench showed > that the rx rate was 0. > > root@ls1028ardb:~# ./xdp-bench tx eno0 > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc) > Summary 2046 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > > By observing the Rx PIR and CIR registers, we found that CIR is always > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring > is full and can no longer accommodate other Rx frames. Therefore, it > is obvious that the RX BD ring has not been cleaned up. > > Further analysis of the code revealed that the Rx BD ring will only > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. > Therefore, some debug logs were added to the driver and the current > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx > BD ring was full. The logs are as follows. > > [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140 > [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110 > [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110 > > From the results, we can see that the maximum value of xdp_tx_in_flight > has reached 2140. However, the size of the Rx BD ring is only 2048. This > is incredible, so checked the code again and found that the driver did > not reset xdp_tx_in_flight when installing or uninstalling bpf program, > resulting in xdp_tx_in_flight still retaining the value after the last > command was run. > > Fixes: c33bfaf91c4c ("net: enetc: set up XDP program under enetc_reconfigure()") This does not explain why enetc_recycle_xdp_tx_buff(), which decreases xdp_tx_in_flight, does not get called? In patch 2/3 you wrote: | Tx BD rings are disabled first in enetc_stop() and then | wait for empty. This operation is not safe while the Tx BD ring | is actively transmitting frames, and will cause the ring to not | be empty and hardware exception. As described in the block guide | of LS1028A NETC, software should only disable an active ring after | all pending ring entries have been consumed (i.e. when PI = CI). | Disabling a transmit ring that is actively processing BDs risks | a HW-SW race hazard whereby a hardware resource becomes assigned | to work on one or more ring entries only to have those entries be | removed due to the ring becoming disabled. So the correct behavior | is that the software stops putting frames on the Tx BD rings (this | is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be | empty, and finally disables the Tx BD rings. I'm surprised that after fixing that, this change would still be needed, rather than xdp_tx_in_flight naturally dropping down to 0 when stopping NAPI. Why doesn't that happen, and what happens to the pending XDP_TX buffers?