On 11.03.2020 10:17, Geert Uytterhoeven wrote: > On Tue, Mar 10, 2020 at 5:47 PM Florian Fainelli <f.fainelli@xxxxxxxxx> wrote: >> >> On 3/10/20 7:16 AM, Geert Uytterhoeven wrote: >>> Hi Florian, David, >>> >>> On Mon, Feb 24, 2020 at 5:59 AM David Miller <davem@xxxxxxxxxxxxx> wrote: >>>> From: Florian Fainelli <f.fainelli@xxxxxxxxx> >>>> Date: Thu, 20 Feb 2020 15:34:53 -0800 >>>> >>>>> It is currently possible for a PHY device to be suspended as part of a >>>>> network device driver's suspend call while it is still being attached to >>>>> that net_device, either via phy_suspend() or implicitly via phy_stop(). >>>>> >>>>> Later on, when the MDIO bus controller get suspended, we would attempt >>>>> to suspend again the PHY because it is still attached to a network >>>>> device. >>>>> >>>>> This is both a waste of time and creates an opportunity for improper >>>>> clock/power management bugs to creep in. >>>>> >>>>> Fixes: 803dd9c77ac3 ("net: phy: avoid suspending twice a PHY") >>>>> Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx> >>>> >>>> Applied, and queued up for -stable, thanks Florian. >>> >>> This patch causes a regression on r8a73a4/ape6evm and sh73a0/kzm9g. >>> After resume from s2ram, Ethernet no longer works: >>> >>> PM: suspend exit >>> nfs: server aaa.bbb.ccc.ddd not responding, still trying >>> ... >>> >>> Reverting commit 503ba7c6961034ff ("net: phy: Avoid multiple suspends") >>> fixes the issue. >>> >>> On both boards, an SMSC LAN9220 is connected to a power-managed local >>> bus. >>> >>> I added some debug code to check when the clock driving the local bus >>> is stopped and started, but I see no difference before/after. Hence I >>> suspect the Ethernet chip is no longer reinitialized after resume. >> >> Can you provide a complete log? > > With some debug info: > > SDHI0 Vcc: disabling > PM: suspend entry (deep) > Filesystems sync: 0.002 seconds > Freezing user space processes ... (elapsed 0.001 seconds) done. > OOM killer disabled. > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > PM: ==== a3sp/ee120000.sd: stop > PM: ==== a3sp/ee100000.sd: stop > smsc911x 8000000.ethernet: smsc911x_suspend:2577 > smsc911x 8000000.ethernet: smsc911x_suspend:2579 running > smsc911x 8000000.ethernet: smsc911x_suspend:2584 > PM: ==== a3sp/ee200000.mmc: stop > PM: ==== c4/fec10000.bus: stop > PM: ==== a3sp/e6c40000.serial: stop > PM: ==== c5/e61f0000.thermal: stop > PM: ==== c4/e61c0200.interrupt-controller: stop > PM: == a3sp: power off > rmobile_pd_power_down: a3sp > Disabling non-boot CPUs ... > PM: ==== c4/e61c0200.interrupt-controller: start > PM: ==== c5/e61f0000.thermal: start > PM: ==== a3sp/e6c40000.serial: start > PM: ==== c4/fec10000.bus: start > PM: ==== a3sp/ee200000.mmc: start > smsc911x 8000000.ethernet: smsc911x_resume:2606 > smsc911x 8000000.ethernet: smsc911x_resume:2625 running > PM: ==== a3sp/ee100000.sd: start > OOM killer enabled. > Restarting tasks ... done. > PM: ==== a3sp/ee120000.sd: start > PM: suspend exit > nfs: server aaa.bbb.ccc.ddd not responding, still trying > ... > > But no difference between the good and the bad case, except for the nfs > failures. > >> Do you use the Generic PHY driver or a >> specialized one? > > CONFIG_FIXED_PHY=y > CONFIG_SMSC_PHY=y > > Just the smsc,lan9115 node, cfr. arch/arm/boot/dts/r8a73a4-ape6evm.dts > >> Do you have a way to dump the registers at the time of >> failure and see if BMCR.PDOWN is still set somehow? > > Added a hook into "nfs: server not responding", which prints: > > MII_BMCR = 0x1900 > > i.e. BMCR_PDOWN = 0x0800 is still set. > >> Does the following help: >> >> diff --git a/drivers/net/ethernet/smsc/smsc911x.c >> b/drivers/net/ethernet/smsc/smsc911x.c >> index 49a6a9167af4..df17190c76c0 100644 >> --- a/drivers/net/ethernet/smsc/smsc911x.c >> +++ b/drivers/net/ethernet/smsc/smsc911x.c >> @@ -2618,6 +2618,7 @@ static int smsc911x_resume(struct device *dev) >> if (netif_running(ndev)) { >> netif_device_attach(ndev); >> netif_start_queue(ndev); >> + phy_resume(dev->phydev); >> } >> > > Yes i does, after s/dev->/ndev->/. > Thanks! > This seems to be a workaround. And the same issue we may have with other drivers too. Could you please alternatively test the following? It tackles the issue that mdio_bus_phy_may_suspend() is used in suspend AND resume, and both calls may return different values. With this patch we call mdio_bus_phy_may_suspend() only when suspending, and let the phy_device store whether it was suspended by MDIO bus PM. diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 32a5ceddc..6d6c6a178 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -286,6 +286,8 @@ static int mdio_bus_phy_suspend(struct device *dev) if (!mdio_bus_phy_may_suspend(phydev)) return 0; + phydev->suspended_by_mdio_bus = 1; + return phy_suspend(phydev); } @@ -294,9 +296,11 @@ static int mdio_bus_phy_resume(struct device *dev) struct phy_device *phydev = to_phy_device(dev); int ret; - if (!mdio_bus_phy_may_suspend(phydev)) + if (!phydev->suspended_by_mdio_bus) goto no_resume; + phydev->suspended_by_mdio_bus = 0; + ret = phy_resume(phydev); if (ret < 0) return ret; diff --git a/include/linux/phy.h b/include/linux/phy.h index 8b299476b..118de9f5b 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -357,6 +357,7 @@ struct macsec_ops; * is_gigabit_capable: Set to true if PHY supports 1000Mbps * has_fixups: Set to true if this phy has fixups/quirks. * suspended: Set to true if this phy has been suspended successfully. + * suspended_by_mdio_bus: Set to true if this phy was suspended by MDIO bus. * sysfs_links: Internal boolean tracking sysfs symbolic links setup/removal. * loopback_enabled: Set true if this phy has been loopbacked successfully. * state: state of the PHY for management purposes @@ -396,6 +397,7 @@ struct phy_device { unsigned is_gigabit_capable:1; unsigned has_fixups:1; unsigned suspended:1; + unsigned suspended_by_mdio_bus:1; unsigned sysfs_links:1; unsigned loopback_enabled:1; -- 2.25.1