On Mon, Mar 03, 2025 at 04:04:55PM +0000, Lad, Prabhakar wrote:
> Hi Russell,
>
> On Mon, Mar 3, 2025 at 11:19 AM Russell King (Oracle)
> <linux@xxxxxxxxxxxxxxx> wrote:
> > I would like to get to the bottom of why this fails for module removal/
> > insertion, but not for administratively down/upping the interface.
> >
> > Removal of your module will unregister the netdev, and part of that
> > work will bring the netdev administratively down. When re-inserting
> > the module, that will trigger various userspace events, and it will
> > be userspace bringing the network interface(s) back up. This should
> > be no different from administratively down/upping the interface but
> > it seems you get different behaviour.
> >
> > I'd like to understand why that is, because at the moment I'm wondering
> > whether my patches that address the suspend/resume need further work
> > before I send them - but in order to assess that, I need to work out
> > why your issue only seems to occur in the module removal/insertion
> > and not down/up as well as I'd expect.
> >
> > Please could you investigate this?
> >
> Sure I will look into this. Just wanted to check on your platform does
> unload/load work OK? Also do you know any specific reason why DMA
> reset could be failing so that I can look at it closer.

It may be surprising, but I do not have stmmac hardware (although there
is some I might be able to use, it's rather complicated so I haven't
investigated that.)

However, there's a lot of past history here, because stmmac has been
painful for me as phylink maintainer. Consequently, I'm now taking a
more active role in this driver, cleaning it up and fixing some of the
stuff it's got wrong. That said, NVidia are in the process of arranging
hardware for me.

You are not the first to encounter reset failures, and this has always
come down to clocks that aren't running.

The DWMAC core is documented as requiring *all* clocks for each part of
the core to be running in order for software reset to complete. If any
clock is stopped, then reset will fail. That includes the clk_rx_i /
clk_rx_180_i signals that come from the ethernet PHY's receive clock.

However, PHYs that have negotiated EEE are permitted to stop their
receive clock, a behaviour that is enabled by an appropriate control
bit. phy_eee_rx_clock_stop() manipulates that bit. stmmac has in most
cases permitted the PHY to stop its receive clock.

NVidia have been a recent victim of this - it is desirable to allow
receive clock stop, but there haven't been the APIs in the kernel to
allow MAC drivers to re-enable the clock when they need it.

Up until now, I had thought this was just a suspend/resume issue
(which is NVidia's reported case). Your testing suggests that it is
more widespread than that.

While I've been waiting to hear from you, I've prepared some patches
that change the solution that I proposed for NVidia (currently on top
of that patch set).

However, before I proceed with them, I need you to get to the bottom
of why:

# ip li set dev $if down
# ip li set dev $if up

doesn't trigger it, but removing and re-inserting the module does.

I'd suggest looking at things such as:

- does the media link actually go down in one case but not the other
  (I don't mean does the kernel report the link went down - I mean
  did the remote end see the link go down, or is it still up, and
  thus *may* be in EEE low-power idle mode.)

- printing the status from stmmac_host_irq_status() so we can see when
  the DWMAC tx/rx paths enter and exit LPI mode while the driver is
  active. (could be quite noisy).

- verify that .ndo_stop does get called when removing your module
  (it should, it's a core net function.)

- print the value of the LPI control/status register at various
  points that may be relevant (e.g. before the reset function is
  called.) bits 9 and 8 indicate receive and transmit LPI status.

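As a rough illustration of the last suggestion, here is a minimal
sketch of decoding a sampled LPI control/status value. Only the bit
positions (9 for receive, 8 for transmit) come from the text above;
the EX_* macros and ex_* helpers are hypothetical names made up for
this sketch, not the driver's own definitions, and how the raw value
is obtained from the hardware is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Bit positions as described above: bit 9 = receive LPI status,
 * bit 8 = transmit LPI status. The names are hypothetical and exist
 * only for this sketch.
 */
#define EX_LPI_STATUS_RX	(UINT32_C(1) << 9)
#define EX_LPI_STATUS_TX	(UINT32_C(1) << 8)

/* True if the sampled value reports the receive path in LPI. */
static bool ex_lpi_rx_active(uint32_t lpi_ctrl_status)
{
	return (lpi_ctrl_status & EX_LPI_STATUS_RX) != 0;
}

/* True if the sampled value reports the transmit path in LPI. */
static bool ex_lpi_tx_active(uint32_t lpi_ctrl_status)
{
	return (lpi_ctrl_status & EX_LPI_STATUS_TX) != 0;
}

/*
 * Format a one-line summary of a sampled value into buf, tagged with
 * a caller-supplied label saying where it was sampled (for example
 * "before reset"). Returns whatever snprintf() returns.
 */
static int ex_lpi_describe(char *buf, size_t len, const char *where,
			   uint32_t lpi_ctrl_status)
{
	return snprintf(buf, len,
			"%s: LPI ctrl/status=0x%08x rx_lpi=%d tx_lpi=%d",
			where, (unsigned int)lpi_ctrl_status,
			ex_lpi_rx_active(lpi_ctrl_status),
			ex_lpi_tx_active(lpi_ctrl_status));
}
```

In the driver itself the value would come from an MMIO read of the
register and the summary would go to the kernel log rather than a
buffer; the helpers above only show how the two status bits might be
picked apart once a value has been sampled.
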
I'm sure there are other things, but the above is just off the top of
my head.

Thanks for anything you can do to locate this.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!