Re: [PATCH 3/3] net: stmmac: Add DWMAC glue layer for Renesas GBETH

"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> · Mon, 3 Mar 2025 16:32:07 +0000

On Mon, Mar 03, 2025 at 04:04:55PM +0000, Lad, Prabhakar wrote:
> Hi Russell,
> 
> On Mon, Mar 3, 2025 at 11:19 AM Russell King (Oracle)
> <linux@xxxxxxxxxxxxxxx> wrote:
> > I would like to get to the bottom of why this fails for module removal/
> > insertion, but not for administratively down/upping the interface.
> >
> > Removal of your module will unregister the netdev, and part of that
> > work will bring the netdev administratively down. When re-inserting
> > the module, that will trigger various userspace events, and it will
> > be userspace bringing the network interface(s) back up. This should
> > be no different from administratively down/upping the interface but
> > it seems you get different behaviour.
> >
> > I'd like to understand why that is, because at the moment I'm wondering
> > whether my patches that address the suspend/resume need further work
> > before I send them - but in order to assess that, I need to work out
> > why your issue only seems to occur on module removal/insertion
> > and not also on down/up, as I'd expect.
> >
> > Please could you investigate this?
> >
> Sure, I will look into this. Just wanted to check: does unload/load
> work OK on your platform? Also, do you know of any specific reason why
> DMA reset could be failing, so that I can look at it more closely.

It may be surprising, but I do not have stmmac hardware (although
there is some I might be able to use, it's rather complicated, so I
haven't investigated that). However, there's a lot of past history
here, because stmmac has been painful for me as phylink maintainer.
Consequently, I'm now taking a more active role in this driver,
cleaning it up and fixing some of the stuff it's got wrong.

That said, NVidia are in the process of arranging hardware for me.

You are not the first to encounter reset failures, and this has always
come down to clocks that aren't running.

The DWMAC core is documented as requiring *all* clocks for each part of
the core to be running in order for software reset to complete. If any
clock is stopped, then reset will fail. That includes the clk_rx_i /
clk_rx_180_i signals that come from the ethernet PHY's receive clock.
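As a rough illustration of why a stopped clock shows up as a reset
failure, here is a toy userspace model (not stmmac code; every name in
it is hypothetical, and only clk_rx_i / clk_rx_180_i correspond to
signals named above). The software reset bit only self-clears once
every clock domain is running, so all the driver sees is a poll loop
that times out. In the kernel, such a loop would typically be written
with readl_poll_timeout() and return -ETIMEDOUT rather than -1.

```c
#include <stdbool.h>

/* Hypothetical clock domains for the model. TOY_CLK_RX and
 * TOY_CLK_RX_180 stand in for clk_rx_i / clk_rx_180_i, which come
 * from the PHY's receive clock; the others are illustrative. */
enum toy_clk { TOY_CLK_CSR, TOY_CLK_TX, TOY_CLK_RX, TOY_CLK_RX_180, TOY_CLK_COUNT };

struct toy_dwmac {
	bool clk_running[TOY_CLK_COUNT];
	bool swr;	/* models the self-clearing software reset bit */
};

/* One step of the toy core: the reset only completes when every clock runs. */
static void toy_step(struct toy_dwmac *mac)
{
	for (int i = 0; i < TOY_CLK_COUNT; i++)
		if (!mac->clk_running[i])
			return;
	mac->swr = false;
}

/* Set the reset bit and poll for it to clear, as a driver would.
 * Returns 0 on success, or -1 if it never clears within max_polls. */
static int toy_sw_reset(struct toy_dwmac *mac, int max_polls)
{
	mac->swr = true;
	for (int n = 0; n < max_polls; n++) {
		toy_step(mac);
		if (!mac->swr)
			return 0;
	}
	return -1;
}
```

With all four clocks running, toy_sw_reset() returns 0; stopping only
the receive clock makes it return -1 after max_polls iterations, with
the reset bit still set.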

However, PHYs that have negotiated EEE are permitted to stop their
receive clock while the link is in low-power idle, provided an
appropriate control bit enables this. phy_eee_rx_clock_stop()
manipulates that bit. In most cases, stmmac has permitted the PHY to
stop its receive clock.
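For reference, the control bit in question is, as I understand it, bit
10 ("clock stop enable") of the IEEE 802.3 clause 45 PCS control 1
register (3.0), which <linux/mdio.h> names MDIO_PCS_CTRL1_CLKSTOP_EN.
The sketch below models the read-modify-write that
phy_eee_rx_clock_stop() is assumed to perform via phy_modify_mmd(). It
operates on a plain register value rather than a real MDIO bus, and
pcs_ctrl1_modify() / pcs_set_rx_clock_stop() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clause 45 PCS control 1 (3.0), bit 10: allow the PHY to stop its
 * receive clock during LPI (MDIO_PCS_CTRL1_CLKSTOP_EN in Linux). */
#define PCS_CTRL1_CLKSTOP_EN 0x0400u

/* Hypothetical stand-in for an MMD modify helper: clear the bits in
 * mask, then set the bits in set. */
static uint16_t pcs_ctrl1_modify(uint16_t val, uint16_t mask, uint16_t set)
{
	return (uint16_t)((val & ~mask) | set);
}

/* Return the new PCS control 1 value with receive clock stop either
 * permitted or forbidden; all other bits are preserved. */
static uint16_t pcs_set_rx_clock_stop(uint16_t ctrl1, bool allow_stop)
{
	return pcs_ctrl1_modify(ctrl1, PCS_CTRL1_CLKSTOP_EN,
				allow_stop ? PCS_CTRL1_CLKSTOP_EN : 0);
}
```

For example, clearing the bit in 0x2440 gives 0x2040, leaving the
other control bits untouched.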

NVidia have been a recent victim of this: it is desirable to allow
receive clock stop, but there haven't been APIs in the kernel to
allow MAC drivers to re-enable the clock when they need it.

Up until now, I had thought this was just a suspend/resume issue
(which is NVidia's reported case). Your testing suggests that it is
more widespread than that.

While I've been waiting to hear from you, I've prepared some patches
that change the solution that I proposed for NVidia (currently on top
of that patch set).

However, before I proceed with them, I need you to get to the bottom
of why:

# ip li set dev $if down
# ip li set dev $if up

doesn't trigger it, but removing and re-inserting the module does.

I'd suggest looking at things such as:
- does the media link actually go down in one case but not the other?
  (I don't mean whether the kernel reports that the link went down; I
  mean whether the remote end saw the link go down, or whether it is
  still up and thus *may* be in EEE low-power idle mode.)

- printing the status from stmmac_host_irq_status() so we can see
  when the DWMAC tx/rx paths enter and exit LPI mode while the
  driver is active (this could be quite noisy).

- verify that .ndo_stop does get called when removing your module
  (it should; it's a core net function).

- print the value of the LPI control/status register at various
  points that may be relevant (e.g. before the reset function is
  called). Bits 9 and 8 indicate receive and transmit LPI status,
  respectively.
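To make captured values easier to read, a small userspace decoder may
help. Bits 9 and 8 are as described above. The entry/exit bits 0 to 3
(assumed clear-on-read, which is why they are normally read from the
interrupt path) follow my understanding of stmmac's LPI_CTRL_STATUS_*
definitions and should be checked against the databook for your core.
lpi_decode() and lpi_format() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions assumed from stmmac's LPI control/status definitions. */
#define LPI_TLPIEN (1u << 0) /* tx entered LPI since last read */
#define LPI_TLPIEX (1u << 1) /* tx exited LPI since last read */
#define LPI_RLPIEN (1u << 2) /* rx entered LPI since last read */
#define LPI_RLPIEX (1u << 3) /* rx exited LPI since last read */
#define LPI_TLPIST (1u << 8) /* tx path currently in LPI */
#define LPI_RLPIST (1u << 9) /* rx path currently in LPI */

struct lpi_state {
	bool tx_in_lpi;
	bool rx_in_lpi;
};

/* Extract the current tx/rx LPI status from a raw register value. */
static struct lpi_state lpi_decode(uint32_t reg)
{
	struct lpi_state s = {
		.tx_in_lpi = reg & LPI_TLPIST,
		.rx_in_lpi = reg & LPI_RLPIST,
	};
	return s;
}

/* Write a one-line summary suitable for a debug print before reset. */
static int lpi_format(char *buf, size_t len, uint32_t reg)
{
	struct lpi_state s = lpi_decode(reg);

	return snprintf(buf, len, "tx:%s%s%s rx:%s%s%s",
			s.tx_in_lpi ? "LPI" : "active",
			(reg & LPI_TLPIEN) ? " +entry" : "",
			(reg & LPI_TLPIEX) ? " +exit" : "",
			s.rx_in_lpi ? "LPI" : "active",
			(reg & LPI_RLPIEN) ? " +entry" : "",
			(reg & LPI_RLPIEX) ? " +exit" : "");
}
```

For example, a raw value of 0x105 formats as
"tx:LPI +entry rx:active +entry", and 0x300 as "tx:LPI rx:LPI".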

I'm sure there's other things, but the above is just off the top of my
head.

Thanks for anything you can do to locate this.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!