On 28.11.2024 15:54, Andrew Lunn wrote: > On Thu, Nov 28, 2024 at 07:31:48AM +0100, Krzysztof Hałasa wrote: >> Andrew, >> >> Andrew Lunn <andrew@xxxxxxx> writes: >> >>>> Unfortunately it's initially set based on the supported capability >>>> rather than the actual hw setting. >>> >>> We need a clear definition of 'initially', and when does it actually >>> matter. >>> >>> Initially, things like speed, duplex and set to UNKNOWN. They don't >>> make any sense until the link is up. phydev->advertise is set to >>> phydev->supported, so that we advertise all the capabilities of the >>> PHY. However, at probe, this does not really matter, it is only when >>> phy_start() is called is the hardware actually configured with what it >>> should advertise, or even if it should do auto-neg or not. >>> >>> In the end, this might not matter. >> >> Nevertheless, it seems it does matter. >> >>>> While in most cases there is no >>>> difference (i.e., autoneg is supported and on by default), certain >>>> adapters (e.g. fiber optics) use fixed settings, configured in hardware. >>> >>> If the hardware is not capable of supporting autoneg, why is autoneg >>> in phydev->supported? To me, that is the real issue here. >> >> Well, autoneg *IS* supported by the PHY in this case. >> No autoneg in phydev->supported would mean I can't enable it if needed, >> wouldn't it? >> >> It is supported but initially disabled. >> >> With current code, PHY correctly connects to the other side, all the >> registers are valid etc., the PHY indicates, for example, a valid link >> with 100BASE-FX full duplex etc. >> >> Yet the Linux netdev, ethtool etc. indicate no valid link, autoneg on, >> and speed/duplex unknown. It's just completely inconsistent with the >> real hardware state. >> >> It seems the phy/phylink code assumes the PHY starts with autoneg >> enabled (if supported). This is simply an incorrect assumption. > > This is sounding like a driver bug. When phy_start() is called it > kicks off the PHY state machine. That should result in > phy_config_aneg() being called. That function is badly named, since it > is used both for autoneg and forced setting. The purpose of that call > is to configure the PHY to the configuration stored in > phydev->advertise, etc. So if the PHY by hardware defaults has autoneg > disabled, but the configuration in phydev says it should be enabled, > calling phy_config_aneg() should actually enabled autoneg. It is > possible there is a phylib bug here, because we try to not to kick off > autoneg if it is not needed, because it is slow. I've not looked at > the code, but it could be we see there is link, and skip calling > phy_config_aneg()? Maybe try booting with the cable disconnected so > there is no link? > If the PHY driver has no config_aneg() callback, then genphy_config_aneg() -> genphy_check_and_restart_aneg() would set BMCR_ANENABLE. Not sure about which PHY driver we're talking here, but if it has a custom config_aneg(), then setting BMCR_ANENABLE may be missing there. >> BTW if the code meant to enable autoneg, it would do exactly that - >> enable it by writing to PHY command register. > > Assuming bug free code. > >> Then the hw and sw state >> would be consistent again (though initial configuration would be >> ignored, not very nice). Now the code doesn't enable autoneg, it only >> *indicates* it's enabled and in reality it's not. > > I would say there are two different issues here. > > 1) It seems like we are not configuring the hardware to match phydev. > 2) We are overwriting how the bootloader etc configured the hardware. > > 2) is always hard, because how do we know the PHY is not messed up > from a previous boot/crash cycle etc. In general, a driver should try > to put the hardware into a well known state. If we have a clear use > case for this, we can consider how to implement it. > > Andrew >