On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote: > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c > index 52310df121de..2c22a32f0a1c 100644 > --- a/drivers/net/phy/phy_device.c > +++ b/drivers/net/phy/phy_device.c > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, > > /* Assume that if there is no driver, that it doesn't > * exist, and we should use the genphy driver. > + * The exception is during probing, when the PHY driver might have > + * attempted a probe but has requested deferral. Since there might be > + * MAC drivers which also attach to the PHY during probe time, try > + * harder to bind the specific PHY driver, and defer the MAC driver's > + * probing until then. > */ > if (!d->driver) { > + if (device_pending_probe(d)) > + return -EPROBE_DEFER; Something else that concerns me here. As noted, many network drivers attempt to attach their PHY when the device is brought up, and not during their probe function. Taking a driver at random: drivers/net/ethernet/renesas/sh_eth.c sh_eth_phy_init() calls of_phy_connect() or phy_connect(), which ultimately calls phy_attach_direct() and propagates the error code via an error pointer. sh_eth_phy_init() propagates the error code to its caller, sh_eth_phy_start(). This is called from sh_eth_open(), which probagates the error code. This is called from .ndo_open... and it's highly likely -EPROBE_DEFER will end up being returned to userspace through either netlink or netdev ioctls. Since EPROBE_DEFER is not an error number that we export to userspace, this should basically never be exposed to userspace, yet we have a path that it _could_ be exposed if the above condition is true. If device_pending_probe() returns true e.g. during initial boot up while modules are being loaded - maybe the phy driver doesn't have all the resources it needs because of some other module that hasn't finished initialising - then we have a window where this will be exposed to userspace. So, do we need to fix all the network drivers to do something if their .ndo_open method encounters this? If so, what? Sleep a bit and try again? How many times to retry? Convert the error code into something else, causing userspace to fail where it worked before? If so which error code? I think this needs to be thought through a bit better. In this case, I feel that throwing -EPROBE_DEFER to solve one problem with one subsystem can result in new problems elsewhere. We did have an idea at one point about reserving some flag bits in phydev->dev_flags for phylib use, but I don't think that happened. If this is the direction we want to go, I think we need to have a flag in dev_flags so that callers opt-in to the new behaviour whereas callers such as from .ndo_open keep the old behaviour - because they just aren't setup to handle an -EPROBE_DEFER return from these functions. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!