On Wed, Mar 19, 2025 at 03:58:14PM +0000, Russell King (Oracle) wrote:
> On Wed, Mar 19, 2025 at 12:58:39AM +0100, Christian Marangi wrote:
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index 7f71547e89fe..c6d9e4efed13 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -1395,6 +1395,15 @@ static void phylink_major_config(struct phylink *pl, bool restart,
> >  	if (pl->mac_ops->mac_select_pcs) {
> >  		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
> >  		if (IS_ERR(pcs)) {
> > +			/* PCS can be removed unexpectedly and not available
> > +			 * anymore.
> > +			 * PCS provider will return probe defer as the PCS
> > +			 * can't be found in the global provider list.
> > +			 * In such case, return -ENOENT as a more symbolic name
> > +			 * for the error message.
> > +			 */
> > +			if (PTR_ERR(pcs) == -EPROBE_DEFER)
> > +				pcs = ERR_PTR(-ENOENT);
>
> I don't particularly like the idea of returning -EPROBE_DEFER from
> mac_select_pcs()... there is no way *ever* that such an error code
> could be handled.
>

Maybe this wasn't clear enough. The idea is that in
phylink_major_config(), under normal conditions, this case should never
happen unless the driver was removed. In that case the PCS provider
returns -EPROBE_DEFER, which at this point is taken to mean that the
driver is no longer present. Hence phylink fails to apply the
configuration, the same as the other failure case in that function.

The principle here is not "we need to wait for the PCS" but to react to
the fact that it was removed in the meantime (something that should not
happen, as the PCS driver is expected to dev_close() the interface).

> >  	linkmode_fill(pl->supported);
> >  	linkmode_copy(pl->link_config.advertising, pl->supported);
> > -	phylink_validate(pl, pl->supported, &pl->link_config);
> > +	ret = phylink_validate(pl, pl->supported, &pl->link_config);
> > +	/* The PCS might not available at the time phylink_create
> > +	 * is called.
> > +	 * Check this and communicate to the MAC driver
> > +	 * that probe should be retried later.
> > +	 *
> > +	 * Notice that this can only happen in probe stage and PCS
> > +	 * is expected to be avaialble in phylink_major_config.
> > +	 */
> > +	if (ret == -EPROBE_DEFER) {
> > +		kfree(pl);
> > +		return ERR_PTR(ret);
> > +	}
>
> This does not solve the problem - what if the interface mode is
> currently not one that requires a PCS that may not yet be probed?

Mhhh, but what is the actual real-world scenario for this? If a MAC
needs a dedicated PCS to handle multiple modes, then it will probably
follow this new implementation and register as a provider.

An option to handle your corner case might be an OP that waits for each
interface supported by the MAC and makes sure there is a possible PCS
for it, ideally placed in the code flow of validate_pcs?

>
> I don't like the idea that mac_select_pcs() might be doing a complex
> lookup - that could make scanning the interface modes (as
> phylink_validate_mask() does) quite slow and unreliable, and phylink
> currently assumes that a PCS that is validated as present will remain
> present.

The "will remain present" assumption is already very fragile with the
current PCS, so I feel this should be changed or improved. Honestly,
every PCS currently implemented can be removed, and phylink will be left
in an undefined state.

Also, 99% of the time the complex lookup is really checking one or at
most two PCS for a single interface, and we are just checking a list and
a bitmap, nothing fancy that might introduce a delay waiting for
something.

>
> If it goes away by the time phylink_major_config() is called, then we
> leave the phylink state no longer reflecting how the hardware is
> programmed, but we still continue to call mac_link_up() - which should
> probably be fixed.

Again, the idea for preventing this kind of chicken-and-egg problem is
to enforce correct removal on the PCS driver side.
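To make the "list and a bitmap" point above a bit more concrete, here is
a rough userspace sketch of that kind of lookup. It is not the code from
the series: the types and the pcs_register(), pcs_unregister(),
pcs_lookup() and pcs_lookup_major_config() helpers are hypothetical
names used only for illustration, and EPROBE_DEFER is defined locally
because it is a kernel-internal errno value that userspace headers do
not provide.

```c
#include <errno.h>
#include <stddef.h>

/* Kernel-internal errno, defined here only so the sketch builds in userspace. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-ins, not the real phylink/PCS types. */
enum iface { IFACE_SGMII, IFACE_1000BASEX, IFACE_USXGMII, IFACE_MAX };

struct pcs_entry {
	unsigned long supported;	/* bitmap, one bit per enum iface */
	struct pcs_entry *next;		/* next entry in the provider list */
};

/* Global provider list; a removed PCS is simply unlinked from it. */
static struct pcs_entry *pcs_list;

static void pcs_register(struct pcs_entry *e)
{
	e->next = pcs_list;
	pcs_list = e;
}

static void pcs_unregister(struct pcs_entry *e)
{
	struct pcs_entry **p;

	for (p = &pcs_list; *p; p = &(*p)->next) {
		if (*p == e) {
			*p = e->next;
			e->next = NULL;
			return;
		}
	}
}

/*
 * Probe-time lookup: walk the list and test one bit per entry.
 * Returns 0 and sets *out on success, -EPROBE_DEFER if no registered
 * entry supports the requested mode.
 */
static int pcs_lookup(enum iface mode, struct pcs_entry **out)
{
	struct pcs_entry *e;

	for (e = pcs_list; e; e = e->next) {
		if (e->supported & (1UL << mode)) {
			*out = e;
			return 0;
		}
	}
	return -EPROBE_DEFER;
}

/*
 * Major-config-time lookup: a missing PCS here is assumed to mean it
 * was removed, so the deferral is reported as -ENOENT instead.
 */
static int pcs_lookup_major_config(enum iface mode, struct pcs_entry **out)
{
	int ret = pcs_lookup(mode, out);

	return ret == -EPROBE_DEFER ? -ENOENT : ret;
}
```

The lookup cost is one bit test per registered entry, and the only
difference between the two call sites is how a miss is reported.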
>
> Given that netdev is severely backlogged, I'm not inclined to add to
> the netdev maintainers workloads by trying to fix this until after
> the merge window - it looks like they're at least one week behind.
> Consequently, I'm expecting that most patches that have been
> submitted during this week will be dropped from patchwork, which
> means submitting patches this week is likely not useful.
>

OK, I will send the next revision as an RFC so as not to increase the
"load", but IMHO this is worth discussing... I really feel we need to
fix the PCS situation ASAP, or more drivers will come (there are already
3 in the queue, as stressed in the cover letter).

-- 
Ansuel