> On Wed, Nov 13, 2024 at 02:12:36AM +0000, Tristram.Ha@xxxxxxxxxxxxx wrote: > > When the SFP says it supports 1000Base-T sfp_add_phy() is called by the > > SFP state machine and phylink_sfp_connect_phy() and > > phylink_sfp_config_phy() are run. It is in the last function that the > > validation fails as the just created phy device does not initialize its > > supported and advertising fields yet. The phy device has the > > opportunity later to fill them up if the phylink creation goes through, > > but that never happens. > > > > A fix is to fill those fields with sfp_support like this: > > > > @@ -3228,6 +3228,11 @@ static int phylink_sfp_config_phy(struct > > struct phylink_link_state config; > > int ret; > > > > + /* The newly created PHY device has empty settings. */ > > + if (linkmode_empty(phy->supported)) { > > + linkmode_copy(phy->supported, pl->sfp_support); > > + linkmode_copy(phy->advertising, pl->sfp_support); > > + } > > linkmode_copy(support, phy->supported); > > > > memset(&config, 0, sizeof(config)); > > > > The provided PCS driver from the DSA driver has an opportunity to change > > support with its validation check, but that does not look right as > > generally those checks remove certain bits from the link mode, but this > > requires completely copying new ones. And this still does not work as > > the advertising field passed to the PCS driver has a const modifier. > > I think I know what's happening, it's unfortunate it pushed you towards > wrong conclusions. > > The "fix" you posted is wrong, and no, the PCS driver should not expand > the supported mask, just restrict it as you said. The phydev->supported > mask normally comes from the phy_probe() logic: > > /* Start out supporting everything. Eventually, > * a controller will attach, and may modify one > * or both of these values > */ > if (phydrv->features) { > linkmode_copy(phydev->supported, phydrv->features); > genphy_c45_read_eee_abilities(phydev); > } > else if (phydrv->get_features) > err = phydrv->get_features(phydev); > else if (phydev->is_c45) > err = genphy_c45_pma_read_abilities(phydev); > else > err = genphy_read_abilities(phydev); > > The SFP bus code depends strictly on sfp_sm_probe_phy() -> phy_device_register() > actually loading a precise device driver for the PHY synchronously via > phy_bus_match(). There is another lazy loading mechanism later in > phy_attach_direct(), for the Generic PHY driver: > > /* Assume that if there is no driver, that it doesn't > * exist, and we should use the genphy driver. > */ > > but that is too late for this code path, because as you say, > phylink_sfp_config_phy() is coded up to only call phylink_attach_phy() > if phylink_validate() succeeds. But phylink_validate() will only see a > valid phydev->supported mask with the Generic PHY driver if we let that > driver attach in phylink_attach_phy() in the first place. > > Personally, I think SFP modules with embedded PHYs strictly require the > matching driver to be available to the kernel, due to that odd way in > which the Generic PHY driver is loaded, but I will let the PHY library > experts share their opinion as well. > > You would be better off improving the error message, see what PHY ID you > get, then find and load the driver for it: > > diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c > index 7dbcbf0a4ee2..8be473a7d262 100644 > --- a/drivers/net/phy/sfp.c > +++ b/drivers/net/phy/sfp.c > @@ -1817,9 +1817,12 @@ static int sfp_sm_probe_phy(struct sfp *sfp, int addr, bool > is_c45) > > err = sfp_add_phy(sfp->sfp_bus, phy); > if (err) { > + dev_err(sfp->dev, > + "sfp_add_phy() for PHY %s (ID 0x%.8lx) failed: %pe, maybe PHY driver > not loaded?\n", > + phydev_name(phy), (unsigned long)phy->phy_id, > + ERR_PTR(err)); > phy_device_remove(phy); > phy_device_free(phy); > - dev_err(sfp->dev, "sfp_add_phy failed: %pe\n", ERR_PTR(err)); > return err; > } > > > Chances are it's one of CONFIG_MARVELL_PHY or CONFIG_AQUANTIA_PHY. Indeed adding the Marvell PHY driver fixed the problem. There is nothing special about the Marvell PHY driver. It is just phy_probe() is called during PHY device creation just as you said. It may not be right to use sfp_support, but all three (sfp_support, supported, advertising) have about the same value at that point after the PHY driver is invoked: 0x62ff and 0x60f0. I mentioned before that some SFPs have faulty implementation where part of the returned PHY register value is 0xff. For example, PHY register 0 is 0x11ff, PHY register 1 is 0x79ff, and PHY register 2 is 0x01ff. The Marvell PHY id is 0x01410cc0, and I saw there is a special PHY id 0x01ff0cc0 defined for Finisar to accommodate this situation. Were those SFPs made by Finisar originally? Some of those PHY registers are correct, so I do not know if those wrong registers are intentional or not, but the link status register always has 0xff value and that cannot be right.