Re: [PATCH RFC net-next v2 0/3] net: stmmac: approach 2 to solve EEE LPI reset issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 11, 2025 at 01:25:58PM +0000, Jon Hunter wrote:
> 
> On 10/03/2025 14:20, Jon Hunter wrote:
> > 
> > On 07/03/2025 17:07, Russell King (Oracle) wrote:
> > > On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote:
> > > > Hi Russell,
> > > > 
> > > > On 06/03/2025 15:23, Russell King (Oracle) wrote:
> > > > > Hi,
> > > > > 
> > > > > This is a second approach to solving the STMMAC reset issues caused by
> > > > > the lack of receive clock from the PHY where the media is in low power
> > > > > mode with a PHY that supports receive clock-stop.
> > > > > 
> > > > > The first approach centred around only addressing the issue in the
> > > > > resume path, but it seems to also happen when the platform glue module
> > > > > is removed and re-inserted (Jon - can you check whether that's also
> > > > > the case for you please?)
> > > > > 
> > > > > As this is more targetted, I've dropped the patches from this series
> > > > > which move the call to phylink_resume(), so the link may still come
> > > > > up too early on resume - but that's something I also intend to fix.
> > > > > 
> > > > > This is experimental - so I value test reports for this change.
> > > > 
> > > > 
> > > > The subject indicates 3 patches, but I only see 2 patches? Can
> > > > you confirm
> > > > if there are 2 or 3?
> > > 
> > > Yes, 2 patches is correct.
> > > 
> > > > So far I have only tested to resume case with the 2 patches to make that
> > > > that is working but on Tegra186, which has been the most
> > > > problematic, it is
> > > > not working reliably on top of next-20250305.
> > > 
> > > To confirm, you're seeing stmmac_reset() sporadically timing out on
> > > resume even with these patches appled? That's rather disappointing.
> > 
> > So I am no longer seeing the reset fail, from what I can see, but now
> > NFS is not responding after resume ...
> > 
> > [   49.825094] Enabling non-boot CPUs ...
> > [   49.829760] Detected PIPT I-cache on CPU1
> > [   49.832694] CPU features: SANITY CHECK: Unexpected variation in
> > SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004
> > [   49.844120] CPU features: SANITY CHECK: Unexpected variation in
> > SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116
> > [   49.856231] CPU features: SANITY CHECK: Unexpected variation in
> > SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066
> > [   49.868081] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030]
> > [   49.875389] CPU1 is up
> > [   49.877187] Detected PIPT I-cache on CPU2
> > [   49.880824] CPU features: SANITY CHECK: Unexpected variation in
> > SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004
> > [   49.892266] CPU features: SANITY CHECK: Unexpected variation in
> > SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116
> > [   49.904467] CPU features: SANITY CHECK: Unexpected variation in
> > SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066
> > [   49.916257] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030]
> > [   49.923610] CPU2 is up
> > [   49.925194] Detected PIPT I-cache on CPU3
> > [   49.929010] CPU3: Booted secondary processor 0x0000000101 [0x411fd073]
> > [   49.935866] CPU3 is up
> > [   49.937983] Detected PIPT I-cache on CPU4
> > [   49.941824] CPU4: Booted secondary processor 0x0000000102 [0x411fd073]
> > [   49.948593] CPU4 is up
> > [   49.950810] Detected PIPT I-cache on CPU5
> > [   49.954651] CPU5: Booted secondary processor 0x0000000103 [0x411fd073]
> > [   49.961431] CPU5 is up
> > [   50.069784] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/
> > rgmii link mode
> > [   50.077634] dwmac4: Master AXI performs any burst length
> > [   50.080718] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features
> > support found
> > [   50.088172] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008
> > Advanced Timestamp supported
> > [   50.096851] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/
> > Full - flow control rx/tx
> > [   50.110897] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector:
> > repeated role: device
> > [   50.113922] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06
> > 13:39:28 UTC
> > [   50.147552] OOM killer enabled.
> > [   50.148441] Restarting tasks ... done.
> > [   50.152552] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [   50.154761] random: crng reseeded on system resumption
> > [   50.162912] PM: suspend exit
> > [   50.212215] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [   50.271578] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [   50.338597] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [  234.474848] nfs: server 10.26.51.252 not responding, still trying
> > [  234.538769] nfs: server 10.26.51.252 not responding, still trying
> > [  237.546922] nfs: server 10.26.51.252 not responding, still trying
> > [  254.762753] nfs: server 10.26.51.252 not responding, timed out
> > [  254.762771] nfs: server 10.26.51.252 not responding, timed out
> > [  254.766376] nfs: server 10.26.51.252 not responding, timed out
> > [  254.766392] nfs: server 10.26.51.252 not responding, timed out
> > [  254.783778] nfs: server 10.26.51.252 not responding, timed out
> > [  254.789582] nfs: server 10.26.51.252 not responding, timed out
> > [  254.795421] nfs: server 10.26.51.252 not responding, timed out
> > [  254.801193] nfs: server 10.26.51.252 not responding, timed out
> > 
> > > Do either of the two attached diffs make any difference?
> > 
> > I will try these next.
> 
> 
> I tried both of the diffs, but both had the same problem as above and
> I see these nfs timeouts after resuming. What works the best is the
> original change you proposed (this is based upon the latest two
> patches) ...

I'm wondering whether there's something else which needs the RX clock
running in order to take effect.

> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index e2146d3aee74..48a646b76a29 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
>         if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
>                 priv->plat->dma_cfg->atds = 1;
> -       /* Note that the PHY clock must be running for reset to complete. */
> -       phylink_rx_clk_stop_block(priv->phylink);
>         ret = stmmac_reset(priv, priv->ioaddr);
> -       phylink_rx_clk_stop_unblock(priv->phylink);
>         if (ret) {
>                 netdev_err(priv->dev, "Failed to reset the dma\n");
>                 return ret;
> @@ -7951,6 +7948,8 @@ int stmmac_resume(struct device *dev)
>         rtnl_lock();
>         mutex_lock(&priv->lock);
> +       /* Note that the PHY clock must be running for reset to complete. */
> +       phylink_rx_clk_stop_block(priv->phylink);
>         stmmac_reset_queues_param(priv);
>         stmmac_free_tx_skbufs(priv);
> @@ -7961,6 +7960,7 @@ int stmmac_resume(struct device *dev)
>         stmmac_set_rx_mode(ndev);
>         stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw);
> +       phylink_rx_clk_stop_unblock(priv->phylink);
>         stmmac_enable_all_queues(priv);
>         stmmac_enable_all_dma_irq(priv);

If you haven't already, can you try shrinking down the number of
functions that are within the block..unblock region please?

Looking at the functions called:

stmmac_reset_queues_param()
stmmac_free_tx_skbufs()
stmmac_clear_descriptors()
	These look like it's only manipulating software state

stmmac_hw_setup()
	We know this calls stmmac_reset() and thus needs the blocking

stmmac_init_coalesce()
	Looks like it's only manipulating software state

stmmac_set_rx_mode()
	This manipulates GMAC_RXQ_CTRL4, GMAC_HASH_TAB(*),
	GMAC_ADDR_HIGH(*), GMAC_ADDR_LOW(*), and GMAC_PACKET_FILTER.

stmmac_restore_hw_vlan_rx_fltr()
	This manipulates GMAC_VLAN_TAG_DATA, GMAC_VLAN_TAG,
	GMAC_VLAN_HASH_TABLE, GMAC_VLAN_TAG

I wonder whether the last two also require the RX clock to be running.

The reason I want to track this down is that we may need to add
block..unblock elsewhere in the driver to ensure that the RX clock
is running when configuration is done elsewhere.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!




[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux