RE: [PATCH v3 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings are disabled

Wei Fang <wei.fang@xxxxxxx> · Wed, 9 Oct 2024 12:56:38 +0000

> -----Original Message-----
> From: Vladimir Oltean <vladimir.oltean@xxxxxxx>
> Sent: 2024年10月9日 20:10
> To: Wei Fang <wei.fang@xxxxxxx>
> Cc: davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; Claudiu Manoil <claudiu.manoil@xxxxxxx>;
> ast@xxxxxxxxxx; daniel@xxxxxxxxxxxxx; hawk@xxxxxxxxxx;
> john.fastabend@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; bpf@xxxxxxxxxxxxxxx; stable@xxxxxxxxxxxxxxx;
> imx@xxxxxxxxxxxxxxx; rkannoth@xxxxxxxxxxx; maciej.fijalkowski@xxxxxxxxx;
> sbhatta@xxxxxxxxxxx
> Subject: Re: [PATCH v3 net 3/3] net: enetc: disable IRQ after Rx and Tx BD rings
> are disabled
> 
> On Wed, Oct 09, 2024 at 05:03:27PM +0800, Wei Fang wrote:
> > When running "xdp-bench tx eno0" to test the XDP_TX feature of ENETC
> > on LS1028A, it was found that if the command was re-run multiple times,
> > Rx could not receive the frames, and the result of xdo-bench showed
> 
> xdp-bench
> 
> > that the rx rate was 0.
> >
> > root@ls1028ardb:~# ./xdp-bench tx eno0
> > Hairpinning (XDP_TX) packets on eno0 (ifindex 3; driver fsl_enetc)
> > Summary                      2046 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> > Summary                         0 rx/s                  0
> err,drop/s
> >
> > By observing the Rx PIR and CIR registers, we found that CIR is always
> > equal to 0x7FF and PIR is always 0x7FE, which means that the Rx ring
> > is full and can no longer accommodate other Rx frames. Therefore, we
> > can conclude that the problem is caused by the Rx BD ring not being
> > cleaned up.
> >
> > Further analysis of the code revealed that the Rx BD ring will only
> > be cleaned if the "cleaned_cnt > xdp_tx_in_flight" condition is met.
> > Therefore, some debug logs were added to the driver and the current
> > values of cleaned_cnt and xdp_tx_in_flight were printed when the Rx
> > BD ring was full. The logs are as follows.
> >
> > [  178.762419] [XDP TX] >> cleaned_cnt:1728, xdp_tx_in_flight:2140
> > [  178.771387] [XDP TX] >> cleaned_cnt:1941, xdp_tx_in_flight:2110
> > [  178.776058] [XDP TX] >> cleaned_cnt:1792, xdp_tx_in_flight:2110
> >
> > From the results, we can see that the max value of xdp_tx_in_flight
> > has reached 2140. However, the size of the Rx BD ring is only 2048.
> > This is incredible, so we checked the code again and found that
> > xdp_tx_in_flight did not drop to 0 when the bpf program was uninstalled
> > and it was not reset when the bfp program was installed again.
> 
> Please make it clear that this is more general and it happens whenever
> enetc_stop() is called.
> 
> > The root cause is that the IRQ is disabled too early in enetc_stop(),
> > resulting in enetc_recycle_xdp_tx_buff() not being called, therefore,
> > xdp_tx_in_flight is not cleared.
> 
> I feel that the problem is not so much the IRQ, as the NAPI (softirq),
> really. Under heavy traffic we don't even get that many hardirqs (if any),
> but NAPI just reschedules itself because of the budget which constantly
> gets exceeded. Please make this also clear in the commit title,
> something like "net: enetc: disable NAPI only after TX rings are empty".
> 
> I would restate the problem as: "The root cause is that we disable NAPI
> too aggressively, without having waited for the pending XDP_TX frames to
> be transmitted, and their buffers recycled, so that the xdp_tx_in_flight
> counter can naturally drop to zero. Later, enetc_free_tx_ring() does
> free those stale, untransmitted XDP_TX packets, but it is not coded up
> to also reset the xdp_tx_in_flight counter, hence the manifestation of
> the bug."
> 
> And then we should have a paragraph that describes the solution as well.
> "One option would be to cover this extra condition in enetc_free_tx_ring(),
> but now that the ENETC_TX_DOWN exists, we have created a window at the
> beginning of enetc_stop() where NAPI can still be scheduled, but any
> concurrent enqueue will be blocked. Therefore, we can call enetc_wait_bdrs()
> and enetc_disable_tx_bdrs() with NAPI still scheduled, and it is
> guaranteed that this will not wait indefinitely, but instead give us an
> indication that the pending TX frames have orderly dropped to zero.
> Only then should we call napi_disable().
> 
> This way, enetc_free_tx_ring() becomes entirely redundant and can be
> dropped as part of subsequent cleanup.
> 
> The change also refactors enetc_start() so that it looks like the mirror
> opposite procedure of enetc_stop()."
> 
> I think describing the problem and solution in these terms gives the
> reviewers more versed in NAPI a better chance of understanding what is
> going on and what we are trying to achieve.

Thanks for helping rephrase the commit message, I will applying it to
the next version.