Commit title still mentions only XDP_REDIRECT, whereas implementation also touches XDP_TX (and only makes a very minor mention of it). Wouldn't it be better to have "net: enetc: block concurrent XDP transmissions during ring reconfiguration" for a commit title? On Wed, Oct 09, 2024 at 05:03:26PM +0800, Wei Fang wrote: > When testing the XDP_REDIRECT function on the LS1028A platform, we > found a very reproducible issue that the Tx frames can no longer be > sent out even if XDP_REDIRECT is turned off. Specifically, if there > is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on, > the console may display some warnings like "timeout for tx ring #6 > clear", and all redirected frames will be dropped, the detaild log detailed > is as follows. > > root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2 > Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc) > [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear > [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear > [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear > eno0->eno2 1420505 rx/s 1420590 err,drop/s 0 xmit/s > xmit eno0->eno2 0 xmit/s 1420590 drop/s 0 drv_err/s 15.71 bulk-avg > eno0->eno2 1420484 rx/s 1420485 err,drop/s 0 xmit/s > xmit eno0->eno2 0 xmit/s 1420485 drop/s 0 drv_err/s 15.71 bulk-avg > > By analyzing the XDP_REDIRECT implementation of enetc driver, we > found two problems. First, enetc driver will reconfigure Tx and > Rx BD rings when a bpf program is installed or uninstalled, but > there is no mechanisms to block the redirected frames when enetc > driver reconfigures BD rings. So introduce ENETC_TX_DOWN flag to (.. driver reconfigures BD rings.) Similarly, XDP_TX verdicts on received frames can also lead to frames being enqueued in the TX rings. Because XDP ignores the state set by the netif_tx_wake_queue() API, we also have to introduce the ENETC_TX_DOWN flag to suppress transmission of XDP frames. > prevent the redirected frames to be attached to Tx BD rings. This > is not only used to block XDP_REDIRECT frames, but also to block > XDP_TX frames. > > Second, Tx BD rings are disabled first in enetc_stop() and then > wait for empty. This operation is not safe while the Tx BD ring the driver waits for them to become empty. > is actively transmitting frames, and will cause the ring to not > be empty and hardware exception. As described in the block guide > of LS1028A NETC, software should only disable an active ring after > all pending ring entries have been consumed (i.e. when PI = CI). > Disabling a transmit ring that is actively processing BDs risks > a HW-SW race hazard whereby a hardware resource becomes assigned > to work on one or more ring entries only to have those entries be > removed due to the ring becoming disabled. So the correct behavior > is that the software stops putting frames on the Tx BD rings (this > is what ENETC_TX_DOWN does), then waits for the Tx BD rings to be > empty, and finally disables the Tx BD rings. It feels like this separate part (refactoring of enetc_start() and enetc_stop() operation ordering) should be its own patch? It is logically different than the introduction and checking of the ENETC_TX_DOWN condition.