On Wed, Feb 19, 2025 at 05:52:34PM +0000, Jon Hunter wrote:
> On 19/02/2025 15:36, Russell King (Oracle) wrote:
> > So clearly the phylink resolver is racing with the rest of the stmmac
> > resume path - which doesn't surprise me in the least. I believe I raised
> > the fact that calling phylink_resume() before the hardware was ready to
> > handle link-up is a bad idea precisely because of races like this.
> >
> > The reason stmmac does this is because of it's quirk that it needs the
> > receive clock from the PHY in order for stmmac_reset() to work.
>
> I do see the reset fail infrequently on previous kernels with this device
> and when it does I see these messages ...
>
>  dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma
>  dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine
>  initialization failed

I wonder whether it's also racing with phylib, but phylink_resume()
calling phylink_start() going in to call phy_start() is all synchronous.
That causes __phy_resume() to be called.

Which PHY device/driver is being used?

> > So, my preference would be to move phylink_resume() later, removing
> > the race condition. If there's any regressions, then we need to
> > _properly_ solve them by ensuring that the PHY keeps the RX clock
> > running by honouring PHY_F_RXC_ALWAYS_ON. That's going to need
> > everyone to test their stmmac platforms to find all the cases that
> > need fixing...
>
> Thanks for the in-depth analysis and feedback. We have 3 SoCs that use this
> driver and so I will do some testing with this change on all of them.

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!