> -----Original Message----- > From: Vladimir Oltean <vladimir.oltean@xxxxxxx> > Sent: 2024年10月1日 6:03 > To: Wei Fang <wei.fang@xxxxxxx> > Cc: davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx; > pabeni@xxxxxxxxxx; Claudiu Manoil <claudiu.manoil@xxxxxxx>; > ast@xxxxxxxxxx; daniel@xxxxxxxxxxxxx; hawk@xxxxxxxxxx; > john.fastabend@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > netdev@xxxxxxxxxxxxxxx; bpf@xxxxxxxxxxxxxxx; stable@xxxxxxxxxxxxxxx; > imx@xxxxxxxxxxxxxxx; rkannoth@xxxxxxxxxxx; maciej.fijalkowski@xxxxxxxxx; > sbhatta@xxxxxxxxxxx > Subject: Re: [PATCH v2 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings > are disabled > > On Sun, Sep 29, 2024 at 10:45:06AM +0800, Wei Fang wrote: > > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC > > on LS1028A, it was found that if the command was re-run multiple times, > > Rx could not receive the frames, and the result of xdo-bench showed > > that the rx rate was 0. > > > > root@ls1028ardb:~# ./xdp-bench tx eno0 > > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc) > > Summary 2046 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > Summary 0 rx/s 0 > err,drop/s > > > > By observing the Rx PIR and CIR registers, we found that CIR is always > > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring > > is full and can no longer accommodate other Rx frames. Therefore, we > > can conclude that the problem is caused by the Rx BD ring not being > > cleaned up. > > > > Further analysis of the code revealed that the Rx BD ring will only > > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met. > > Therefore, some debug logs were added to the driver and the current > > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx > > BD ring was full. The logs are as follows. > > > > [ 178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140 > > [ 178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110 > > [ 178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110 > > > > From the results, we can see that the max value of xdp_tx_in_flight > > has reached 2140. However, the size of the Rx BD ring is only 2048. > > This is incredible, so we checked the code again and found that > > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled > > and it was not reset when the bfp program was installed again. The > > root cause is that the IRQ is disabled too early in enetc_stop(), > > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore, > > xdp_tx_in_flight is not cleared. > > > > Fixes: ff58fda09096 ("net: enetc: prioritize ability to go down over packet > processing") > > Cc: stable@xxxxxxxxxxxxxxx > > Signed-off-by: Wei Fang <wei.fang@xxxxxxx> > > --- > > v2 changes: > > 1. Modify the titile and rephrase the commit meesage. > > 2. Use the new solution as described in the title > > --- > > I gave this another test under a bit different set of circumstances this time, > and I'm confident that there are still problems, which I haven't identified > though (yet). > > With 64 byte frames at 2.5 Gbps, I see this going on: > > $ xdp-bench tx eno0 & > $ while :; do taskset $((1 << 0)) hwstamp_ctl -i eno0 -r 1 && sleep 1 && taskset > $((1 << 0)) hwstamp_ctl -i eno0 -r 0 && sleep 1; done > current settings: > tx_type 0 > rx_filter 0 > new settings: > tx_type 0 > rx_filter 1 > Summary 1,556,952 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > Summary 0 rx/s 0 err,drop/s > current settings: > tx_type 0 > rx_filter 1 > Summary 0 rx/s 0 err,drop/s > [ 883.780346] fsl_enetc 0000:00:00.0 eno0: timeout for tx ring #6 clear (its > RX ring has 2072 XDP_TX frames in flight) > new settings: > tx_type 0 > rx_filter 0 > Summary 1,027 rx/s 0 err,drop/s > current settings: > tx_type 0 > rx_filter 0 > Summary 0 rx/s 0 err,drop/s > > which looks like the symptoms that the patch tries to solve. > > My previous testing was with 390 byte frames, and this did not happen. > > Please do not merge this. Oh, it looks like there are still some issues we don't know about. I did test using 64 bytes but not at that high of a rate. Also I didn't turn on timestamp. Anyway, I will try to reproduce the issue when I'm back to office next Tuesday. It would be nice if you can help find the root cause before next Tuesday, thanks!