Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Russell,

On 19/02/2025 20:57, Russell King (Oracle) wrote:
On Wed, Feb 19, 2025 at 08:05:57PM +0000, Jon Hunter wrote:
On 19/02/2025 19:13, Russell King (Oracle) wrote:
On Wed, Feb 19, 2025 at 05:52:34PM +0000, Jon Hunter wrote:
On 19/02/2025 15:36, Russell King (Oracle) wrote:
So clearly the phylink resolver is racing with the rest of the stmmac
resume path - which doesn't surprise me in the least. I believe I raised
the fact that calling phylink_resume() before the hardware was ready to
handle link-up is a bad idea precisely because of races like this.

The reason stmmac does this is because of it's quirk that it needs the
receive clock from the PHY in order for stmmac_reset() to work.

I do see the reset fail infrequently on previous kernels with this device
and when it does I see these messages ...

   dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma
   dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine
    initialization failed

I wonder whether it's also racing with phylib, but phylink_resume()
calling phylink_start() going in to call phy_start() is all synchronous.
That causes __phy_resume() to be called.

Which PHY device/driver is being used?


Looks like it is this Broadcom driver ...

  Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1

I don't see anything special happening in the PHY driver - it doesn't
implement suspend/resume/config_aneg methods, so there's nothing going
on with clocks in that driver beyond generic stuff.

So, let's try something (I haven't tested this, and its likely you
will need to work it in to your other change.)

Essentially, this disables the receive clock stop around the reset,
something the stmmac driver has never done in the past.

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1cbea627b216..8e975863a2e3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev)
  	rtnl_lock();
  	mutex_lock(&priv->lock);
+ phy_eee_rx_clock_stop(priv->dev->phydev, false);
+
  	stmmac_reset_queues_param(priv);
stmmac_free_tx_skbufs(priv);
@@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev)
stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); + phy_eee_rx_clock_stop(priv->dev->phydev,
+			      priv->phylink_config.eee_rx_clk_stop_enable);
+
  	stmmac_enable_all_queues(priv);
  	stmmac_enable_all_dma_irq(priv);


Sorry for the delay, I have been testing various issues recently and needed a bit more time to test this.

It turns out that what I had proposed last week does not work. I believe that with all the various debug/instrumentation I had added, I was again getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, it did not work :-(

However, what you are suggesting above, all by itself, is working. I have tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working reliably. I have also tested on some other boards that use the same stmmac driver (but use the Aquantia PHY) and I have not seen any issues. So this does fix the issue I am seeing.

I know we are getting quite late in the rc for v6.14, but not sure if we could add this as a fix?

Thanks!
Jon

--
nvpublic





[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux