On 30.05.2020 23:43, Vladimir Oltean wrote: > From: Vladimir Oltean <vladimir.oltean@xxxxxxx> > > In kernel 4.19 (and probably earlier too) there are issues surrounding > the PHY_AN state. > > For example, if a PHY is in PHY_AN state and AN has not finished, then > what is supposed to happen is that the state machine gets rescheduled > until it is, or until the link_timeout reaches zero which triggers an > autoneg restart process. > > But actually the rescheduling never works if the PHY uses interrupts, > because the condition under which rescheduling occurs is just if > phy_polling_mode() is true. So basically, this whole rescheduling > functionality works for AN-not-yet-complete just by mistake. Let me > explain. > > Most of the time the AN process manages to finish by the time the > interrupt has triggered. One might say "that should always be the case, > otherwise the PHY wouldn't raise the interrupt, right?". > Well, some PHYs implement an .aneg_done method which allows them to tell > the state machine when the AN is really complete. > The AR8031/AR8033 driver (at803x.c) is one such example. Even when > copper autoneg completes, the driver still keeps the "aneg_done" > variable unset until in-band SGMII autoneg finishes too (there is no > interrupt for that). So we have the premises of a race condition. > That's not nice from the PHY: It signals "link up", and if the system asks the PHY for link details, then it sheepishly says "well, link is *almost* up". Question would be whether the same happens with other SGMII-capable PHY's so that we need to cater for this scenario in phylib. Or whether we consider it a chip quirk. In the latter case a custom read_status() handler might do the trick too: if link is reported as up then wait until aneg is signaled as done too before reading further link details. And it's interesting that nobody else stumbled across this problem before. I mean the PHY we talk about isn't really new. Or is your use case so special? > In practice, what really happens depends on the log level of the serial > console. If the log level is verbose enough that kernel messages related > to the Ethernet link state are printed to the console, then this gives > in-band AN enough time to complete, which means the link will come up > and everyone will be happy. But if the console is not that verbose, the > link will sometimes come up, and sometimes will be forced down by the > .aneg_done of the PHY driver (forever, since we are not rescheduling). > > The conclusion is that an extra condition needs to be explicitly added, > so that the state machine can be rescheduled properly. Otherwise PHY > devices in interrupt mode will never work properly if they have an > .aneg_done callback. > > In more recent kernels, the whole PHY_AN state was removed by Heiner > Kallweit in the "[net-next,0/5] net: phy: improve and simplify phylib > state machine" series here: > > https://patchwork.ozlabs.org/cover/994464/ > > and the problem was just masked away instead of being addressed with a > punctual patch. > > Fixes: 76a423a3f8f1 ("net: phy: allow driver to implement their own aneg_done") > Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx> > --- > I'm not sure the procedure I'm following is correct, sending this > directly to Greg. The patch doesn't apply on net. > > drivers/net/phy/phy.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c > index cc454b8c032c..ca4fd74fd2c8 100644 > --- a/drivers/net/phy/phy.c > +++ b/drivers/net/phy/phy.c > @@ -934,7 +934,7 @@ void phy_state_machine(struct work_struct *work) > struct delayed_work *dwork = to_delayed_work(work); > struct phy_device *phydev = > container_of(dwork, struct phy_device, state_queue); > - bool needs_aneg = false, do_suspend = false; > + bool recheck = false, needs_aneg = false, do_suspend = false; > enum phy_state old_state; > int err = 0; > int old_link; > @@ -981,6 +981,8 @@ void phy_state_machine(struct work_struct *work) > phy_link_up(phydev); > } else if (0 == phydev->link_timeout--) > needs_aneg = true; > + else > + recheck = true; > break; > case PHY_NOLINK: > if (!phy_polling_mode(phydev)) > @@ -1123,7 +1125,7 @@ void phy_state_machine(struct work_struct *work) > * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving > * between states from phy_mac_interrupt() > */ > - if (phy_polling_mode(phydev)) > + if (phy_polling_mode(phydev) || recheck) > queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, > PHY_STATE_TIME * HZ); > } >