On Wed, Feb 19, 2025 at 08:05:57PM +0000, Jon Hunter wrote: > On 19/02/2025 19:13, Russell King (Oracle) wrote: > > On Wed, Feb 19, 2025 at 05:52:34PM +0000, Jon Hunter wrote: > > > On 19/02/2025 15:36, Russell King (Oracle) wrote: > > > > So clearly the phylink resolver is racing with the rest of the stmmac > > > > resume path - which doesn't surprise me in the least. I believe I raised > > > > the fact that calling phylink_resume() before the hardware was ready to > > > > handle link-up is a bad idea precisely because of races like this. > > > > > > > > The reason stmmac does this is because of it's quirk that it needs the > > > > receive clock from the PHY in order for stmmac_reset() to work. > > > > > > I do see the reset fail infrequently on previous kernels with this device > > > and when it does I see these messages ... > > > > > > dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma > > > dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine > > > initialization failed > > > > I wonder whether it's also racing with phylib, but phylink_resume() > > calling phylink_start() going in to call phy_start() is all synchronous. > > That causes __phy_resume() to be called. > > > > Which PHY device/driver is being used? > > > Looks like it is this Broadcom driver ... > > Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 I don't see anything special happening in the PHY driver - it doesn't implement suspend/resume/config_aneg methods, so there's nothing going on with clocks in that driver beyond generic stuff. So, let's try something (I haven't tested this, and its likely you will need to work it in to your other change.) Essentially, this disables the receive clock stop around the reset, something the stmmac driver has never done in the past. diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 1cbea627b216..8e975863a2e3 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) rtnl_lock(); mutex_lock(&priv->lock); + phy_eee_rx_clock_stop(priv->dev->phydev, false); + stmmac_reset_queues_param(priv); stmmac_free_tx_skbufs(priv); @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); + phy_eee_rx_clock_stop(priv->dev->phydev, + priv->phylink_config.eee_rx_clk_stop_enable); + stmmac_enable_all_queues(priv); stmmac_enable_all_dma_irq(priv); -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!