On Tue, Jul 30, 2024 at 11:02:07AM +0100, Jon Hunter wrote: > > On 30/07/2024 10:41, Russell King (Oracle) wrote: > > On Tue, Jul 30, 2024 at 10:36:12AM +0100, Jon Hunter wrote: > > > > > > On 29/07/2024 11:47, Russell King (Oracle) wrote: > > > > > > ... > > > > > > > > Apologies for not following up before on this and now that is has been a > > > > > year I am not sure if it is even appropriate to dig this up as opposed to > > > > > starting a new thread completely. > > > > > > > > > > However, I want to resume this conversation because we have found that this > > > > > change does resolve a long-standing issue where we occasionally see our > > > > > ethernet controller fail to get an IP address. > > > > > > > > > > I understand that your objection to the above change is that (per Revanth's > > > > > feedback) this change assumes interface has the link. However, looking at > > > > > the aqr107_read_status() function where this change is made the function has > > > > > the following ... > > > > > > > > > > static int aqr107_read_status(struct phy_device *phydev) > > > > > { > > > > > int val, ret; > > > > > > > > > > ret = aqr_read_status(phydev); > > > > > if (ret) > > > > > return ret; > > > > > > > > > > if (!phydev->link || phydev->autoneg == AUTONEG_DISABLE) > > > > > return 0; > > > > > > > > > > > > > > > So my understanding is that if we don't have the link, then the above test > > > > > will return before we attempt to poll the TX ready status. If that is the > > > > > case, then would the change being proposed be OK? > > > > > > > > Here, phydev->link will be the _media_ side link. This is fine - if the > > > > media link is down, there's no point doing anything further. However, > > > > if the link is up, then we need the PHY to update phydev->interface > > > > _and_ report that the link was up (phydev->link is true). > > > > > > > > When that happens, the layers above (e.g. phylib, phylink, MAC driver) > > > > then know that the _media_ side interface has come up, and they also > > > > know the parameters that were negotiated. They also know what interface > > > > mode the PHY is wanting to use. > > > > > > > > At that point, the MAC driver can then reconfigure its PHY facing > > > > interface according to what the PHY is using. Until that point, there > > > > is a very real chance that the PHY <--> MAC connection will remain > > > > _down_. > > > > > > > > The patch adds up to a _two_ _second_ wait for the PHY <--> MAC > > > > connection to come up before aqr107_read_status() will return. This > > > > is total nonsense - because waiting here means that the MAC won't > > > > get the notification of which interface mode the PHY is expecting > > > > to use, therefore the MAC won't configure its PHY facing hardware > > > > for that interface mode, and therefore the PHY <--> MAC connection > > > > will _not_ _come_ _up_. > > > > > > > > You can not wait for the PHY <--> MAC connection to come up in the > > > > phylib read_status method. Ever. > > > > > > > > This is non-negotiable because it is just totally wrong to do this > > > > and leads to pointless two second delays. > > > > > > > > > Thanks for the feedback! We will go away, review this and see if we can > > > figure out a good/correct way to resolve our ethernet issue. > > > > Which ethernet driver is having a problem? > > > > It is the drivers/net/ethernet/stmicro/stmmac/dwmac-tegra.c driver. It works > most of the time, but on occasion it fails to get a valid IP address. Hmm. dwmac-tegra.c sets STMMAC_FLAG_SERDES_UP_AFTER_PHY_LINKUP, which means that the serdes won't be powered up until after the PHY has indicated that link is up. If the serdes is not powered up, then the MAC facing interface on the PHY won't come up. Hence, the code you're adding will, in all probability, merely add a two second delay to each and every time the PHY is polled for its status when the PHY indicates that the media link is up until such time that the stmmac side has processed that the link has come up. I also note that mgbe_uphy_lane_bringup_serdes_up() polls the link status on the MAC PCS side, waiting for the link to become ready on that side. So, what you have is: - you bring the interface up. The serdes interface remains powered down. - phylib starts polling the PHY. - the PHY indicates the media link is up. - your new code polls the PHY's MAC facing interface for link up, but because the serdes interface is powered down, it ends up timing out after two seconds and then proceeds. - phylib notifies phylink that the PHY has link. - phylink brings the PCS and MAC side(s) up, calling stmmac_mac_link_up(). - stmmac_mac_link_up() calls mgbe_uphy_lane_bringup_serdes_up() which then does receive lane calibration (which is likely the reason why this is delayed to link-up, so the PHY is giving a valid serdes stream for the calibration to use.) - mgbe_uphy_lane_bringup_serdes_up() enables the data path, and clears resets, and then waits for the serdes link with the PHY to come up. While stmmac_mac_link_up() is running, phylib will continue to try to poll the PHY for its status once every second, and each time it does if the PHY's MAC facing link reports that it's down, the phylib locks will be held for _two_ seconds each time. That will mean you won't be able to bring the interface down until those two seconds time out. So, I think one needs to go back and properly understand what is going on to figure out what is going wrong. You will likely find that inserting a two second delay at the start of mgbe_uphy_lane_bringup_serdes_up() is just as effective at solving the issue - although I am not suggesting that would be an acceptable solution. It would help to confirm that the reasoning is correct. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!