Hi All, On 26.08.2022 10:29, Marek Szyprowski wrote: > On 26.08.2022 09:53, Lukas Wunner wrote: >> On Fri, Aug 26, 2022 at 09:41:46AM +0200, Marek Szyprowski wrote: >>> On 26.08.2022 09:19, Lukas Wunner wrote: >>>> On Fri, Aug 26, 2022 at 08:51:58AM +0200, Marek Szyprowski wrote: >>>>> On 19.05.2022 23:22, Marek Szyprowski wrote: >>>>>> On 19.05.2022 21:08, Lukas Wunner wrote: >>>>>>> On Tue, May 17, 2022 at 12:18:45PM +0200, Marek Szyprowski wrote: >>>>>>>> This patch landed in the recent linux next-20220516 as commit >>>>>>>> 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY >>>>>>>> driver to >>>>>>>> avoid polling"). Unfortunately it breaks smsc95xx usb ethernet >>>>>>>> operation >>>>>>>> after system suspend-resume cycle. On the Odroid XU3 board I >>>>>>>> got the >>>>>>>> following warning in the kernel log: >>>>>>>> >>>>>>>> # time rtcwake -s10 -mmem >>>>>>>> rtcwake: wakeup from "mem" using /dev/rtc0 at Tue May 17 >>>>>>>> 09:16:07 2022 >>>>>>>> PM: suspend entry (deep) >>>>>>>> Filesystems sync: 0.001 seconds >>>>>>>> Freezing user space processes ... (elapsed 0.002 seconds) done. >>>>>>>> OOM killer disabled. >>>>>>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) >>>>>>>> done. >>>>>>>> printk: Suspending console(s) (use no_console_suspend to debug) >>>>>>>> smsc95xx 4-1.1:1.0 eth0: entering SUSPEND2 mode >>>>>>>> smsc95xx 4-1.1:1.0 eth0: Failed to read reg index 0x00000114: -113 >>>>>>>> smsc95xx 4-1.1:1.0 eth0: Error reading MII_ACCESS >>>>>>>> smsc95xx 4-1.1:1.0 eth0: __smsc95xx_mdio_read: MII is busy >>>>>>>> ------------[ cut here ]------------ >>>>>>>> WARNING: CPU: 2 PID: 73 at drivers/net/phy/phy.c:946 >>>>>>>> phy_state_machine+0x98/0x28c >>>>>>> [...] >>>>>>>> It looks that the driver's suspend/resume operations might need >>>>>>>> some >>>>>>>> adjustments. After the system suspend/resume cycle the driver >>>>>>>> is not >>>>>>>> operational anymore. Reverting the $subject patch on top of linux >>>>>>>> next-20220516 restores ethernet operation after system >>>>>>>> suspend/resume. >>>>>>> Thanks a lot for the report. It seems the PHY is signaling a link >>>>>>> change >>>>>>> shortly before system sleep and by the time the phy_state_machine() >>>>>>> worker >>>>>>> gets around to handle it, the device has already been suspended >>>>>>> and thus >>>>>>> refuses any further USB requests with -EHOSTUNREACH (-113): >>>> [...] >>>>>>> Assuming the above theory is correct, calling phy_stop_machine() >>>>>>> after usbnet_suspend() would be sufficient to fix the issue. >>>>>>> It cancels the phy_state_machine() worker. >>>>>>> >>>>>>> The small patch below does that. Could you give it a spin? >>>>>> That's it. Your analysis is right and the patch fixes the issue. >>>>>> Thanks! >>>>>> >>>>>> Feel free to add: >>>>>> >>>>>> Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> >>>>>> >>>>>> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> >>>>> Gentle ping for the final patch... >>>> Hm? Actually this issue is supposed to be fixed by mainline commit >>>> 1758bde2e4aa ("net: phy: Don't trigger state machine while in >>>> suspend"). >>>> >>>> The initial fix attempt that you're replying to should not be >>>> necessary >>>> with that commit. >>>> >>>> Are you still seeing issues even with 1758bde2e4aa applied? >>>> Or are you maybe using a custom downstream tree which is missing >>>> that commit? >>> On Linux next-20220825 I still get the following warning during >>> suspend/resume cycle: >>> >>> ------------[ cut here ]------------ >>> WARNING: CPU: 0 PID: 1483 at drivers/net/phy/phy_device.c:323 >>> mdio_bus_phy_resume+0x10c/0x110 >>> Modules linked in: exynos_gsc s5p_jpeg s5p_mfc videobuf2_dma_contig >>> v4l2_mem2mem videobuf2_memops videobuf2_v4l2 videobuf2_common videodev >>> mc s5p_cec >>> CPU: 0 PID: 1483 Comm: rtcwake Not tainted 6.0.0-rc2-next-20220825 >>> #5482 >>> Hardware name: Samsung Exynos (Flattened Device Tree) >>> unwind_backtrace from show_stack+0x10/0x14 >>> show_stack from dump_stack_lvl+0x58/0x70 >>> dump_stack_lvl from __warn+0xc8/0x220 >>> __warn from warn_slowpath_fmt+0x5c/0xb4 >>> warn_slowpath_fmt from mdio_bus_phy_resume+0x10c/0x110 >>> mdio_bus_phy_resume from dpm_run_callback+0x94/0x208 >>> dpm_run_callback from device_resume+0x124/0x21c >>> device_resume from dpm_resume+0x108/0x278 >>> dpm_resume from dpm_resume_end+0xc/0x18 >>> dpm_resume_end from suspend_devices_and_enter+0x208/0x70c >>> suspend_devices_and_enter from pm_suspend+0x364/0x430 >>> pm_suspend from state_store+0x68/0xc8 >>> state_store from kernfs_fop_write_iter+0x110/0x1d4 >>> kernfs_fop_write_iter from vfs_write+0x1c4/0x2ac >>> vfs_write from ksys_write+0x5c/0xd4 >>> ksys_write from ret_fast_syscall+0x0/0x1c >>> Exception stack(0xf2ee5fa8 to 0xf2ee5ff0) >>> 5fa0: 00000004 0002b438 00000004 0002b438 00000004 >>> 00000000 >>> 5fc0: 00000004 0002b438 000291b0 00000004 0002b438 00000004 befd9c1c >>> 00028160 >>> 5fe0: 0000006c befd9ae8 b6eb4148 b6f118a4 >>> irq event stamp: 58381 >>> hardirqs last enabled at (58393): [<c019ff28>] >>> vprintk_emit+0x320/0x344 >>> hardirqs last disabled at (58400): [<c019fedc>] >>> vprintk_emit+0x2d4/0x344 >>> softirqs last enabled at (58258): [<c0101694>] >>> __do_softirq+0x354/0x618 >>> softirqs last disabled at (58247): [<c012dd18>] >>> __irq_exit_rcu+0x140/0x1ec >>> ---[ end trace 0000000000000000 ]--- >>> >>> The mentioned patch fixes it. >> Color me confused. >> >> With "the mentioned patch", are you referring to 1758bde2e4aa >> or are you referring to the little test patch in this e-mail: >> https://lore.kernel.org/netdev/20220519190841.GA30869@xxxxxxxxx/ >> >> There's a Tested-by from you on 1758bde2e4aa, so I assume the >> commit fixed the issue at the time. Does it still occur >> intermittently? Or does it occur every time? In the latter case, >> I'd assume some other change was made in the meantime and that >> other change exposed the issue again... > > The issue seems to be exposed again. On 1758bde2e4aa everything works > fine and there is no warning during suspend/resume cycle. Then I've > noticed that warning again while playing with current linux-next. I > will check when it got broken and report again. I've finally traced what has happened. I've double checked and indeed the 1758bde2e4aa commit fixed the issue on next-20220516 kernel and as such it has been merged to linus tree. Then the commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") has been merged to linus tree, which triggers a new warning during the suspend/resume cycle with smsc95xx driver. Please note, that the smsc95xx still works fine regardless that warning. However it look that the commit 1758bde2e4aa only hide a real problem, which the commit 744d23c71af3 warns about. Probably a proper fix for smsc95xx driver is to call phy_stop/start during suspend/resume cycle, like in similar patches for other drivers: https://lore.kernel.org/all/20220825023951.3220-1-f.fainelli@xxxxxxxxx/ Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland