On 13/02/2025 12:00, Russell King (Oracle) wrote:
...
I have been tracking down a suspend regression on Tegra186 and bisect is
pointing to this change. If I revert this on top of v6.14-rc2 then
suspend is working again. This is observed on the Jetson TX2 board
(specifically tegra186-p2771-0000.dts).
Thanks for the report.
This device is using NFS for testing. So it appears that for this board
networking does not restart and the board hangs. Looking at the logs I
do see this on resume ...
[ 64.129079] dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma
[ 64.133125] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed
My first thought was that EEE might not be supported for this device,
but from what I can see it is and 'dma_cap.eee' is true. Here are some
more details on this device regarding the ethernet controller.
Could you try disabling EEE through ethtool (maybe first try turning
tx-lpi off before using "eee off") and see whether that makes any
difference, please?
One thing that I'm wondering is - old code used to do:
- phy_eee_rx_clock_stop(phy, !(priv->plat->flags &
- STMMAC_FLAG_RX_CLK_RUNS_IN_LPI));
The new code sets:
+ if (!(priv->plat->flags & STMMAC_FLAG_RX_CLK_RUNS_IN_LPI))
+ priv->phylink_config.eee_rx_clk_stop_enable = true;
which does the same thing in phylink - phylink_bringup_phy() will call
phy_eee_rx_clock_stop() when the PHY is attached. So this happens at a
different time.
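
To make the equivalence concrete, here is a minimal user-space sketch
(not kernel code) of the value both versions derive from the platform
flag. The bit position of the flag and the helper name
rx_clk_stop_enable() are assumptions for illustration only; the sketch
says nothing about when the call reaches the PHY, which is the part
that changed.

```c
#include <stdbool.h>

/*
 * Assumed bit position, for illustration only. The real definition
 * lives in the kernel's stmmac platform header.
 */
#define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI (1u << 0)

/*
 * Hypothetical helper: the "may the PHY stop its receive clock in LPI"
 * value. The old code passed this expression to phy_eee_rx_clock_stop();
 * the new code stores the same result in
 * phylink_config.eee_rx_clk_stop_enable.
 */
static bool rx_clk_stop_enable(unsigned int plat_flags)
{
	return !(plat_flags & STMMAC_FLAG_RX_CLK_RUNS_IN_LPI);
}
```

With the flag clear the helper returns true (clock stop allowed); with
the flag set it returns false.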
We know that stmmac_reset() can fail when the PHY receive clock is
stopped - at least with some cores.
So, I'm wondering whether I've inadvertently fixed another bug in stmmac
which has uncovered a different bug - maybe the PHY clock must never be
stopped even in LPI - or maybe we need to have a way of temporarily
disabling the PHY's clock-stop ability during stmmac_reset().
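
As a rough illustration of the second idea, the following user-space
model (not kernel code, and not a proposed patch) saves the clock-stop
setting, forces the clock to keep running for the duration of the reset,
and then restores the setting. struct mock_phy, mock_dma_reset() and
reset_with_rx_clk_running() are all hypothetical names, and the model
pessimistically treats "clock stop allowed" as "reset fails".

```c
#include <stdbool.h>

/* Hypothetical stand-in for PHY state; not the kernel's struct phy_device. */
struct mock_phy {
	bool clk_stop_enable;	/* may the PHY stop its receive clock in LPI? */
};

/*
 * Models the failure mode described above. Real hardware would only
 * fail if the clock were actually stopped at reset time; this model
 * fails whenever clock stop is allowed.
 */
static int mock_dma_reset(const struct mock_phy *phy)
{
	return phy->clk_stop_enable ? -1 : 0;
}

/* Keep the receive clock running across the reset, then restore the setting. */
static int reset_with_rx_clk_running(struct mock_phy *phy)
{
	bool saved = phy->clk_stop_enable;
	int ret;

	phy->clk_stop_enable = false;
	ret = mock_dma_reset(phy);
	phy->clk_stop_enable = saved;

	return ret;
}
```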
In addition to what I asked previously, could you also instrument
phy_eee_rx_clock_stop() and test before/after this patch to see
(a) whether it gets called at all before this patch and (b) confirm
the enable/disable state before and after.
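
The kind of instrumentation meant here can be as small as one print at
the top of phy_eee_rx_clock_stop(). The sketch below is a user-space
stand-in that only builds the message text; format_clk_stop_trace() is
a hypothetical name, and in the kernel the equivalent would presumably
be a device-prefixed print such as phydev_info() inside the function
itself.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a trace line recording which function
 * was called and whether clock stop was being enabled (1) or
 * disabled (0). Returns the snprintf() result.
 */
static int format_clk_stop_trace(char *buf, size_t len, const char *func,
				 bool clk_stop_enable)
{
	return snprintf(buf, len, "%s: clk_stop_enable %d", func,
			clk_stop_enable);
}
```

Comparing the presence and the 0/1 value of such lines at boot and on
resume, with and without the patch, answers both (a) and (b).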
Thanks for the feedback. So ...
1. I can confirm that suspend works if I disable EEE via ethtool
2. Prior to this change I do see phy_eee_rx_clock_stop() being called
to enable clock stop when resuming from suspend, but after this
change it is not called on resume.
Prior to this change I see (note the prints around 389-392 are when
we resume from suspend) ...
[ 4.654454] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 0
[ 4.723123] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode
[ 7.629652] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1
[ 389.086185] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode
[ 392.863744] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1
After this change I see ...
[ 4.644614] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1
[ 4.679224] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode
[ 191.219828] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode
So yes, this is definitely related to the PHY clock.
Jon