On Thu, Aug 26, 2021 at 6:23 PM Andrew Lunn <andrew@xxxxxxx> wrote: > > > Doesn't add much to the discussion. In the example I gave, the driver > > already does synchronous probing. If the device can't probe > > successfully because a supplier isn't ready, it doesn't matter if it's > > a synchronous probe. The probe would still be deferred and we'll hit > > the same issue. Even in the situation the commit [5] describes, if > > parallelized probing is done and the PHY depended on something (say a > > clock), you'd still end up not probing the PHY even if the driver is > > present and the generic PHY would end up force probing it. > > > genphy is meant to be used when there is no other driver available. > It is a best effort, better than nothing, might work. And quite a few > boards rely on it. However, it should not be used when there is a > specific driver. Agreed, that's what we are trying to ensure. > So if the PHY device has been probed, and -EPROBE_DEFER was returned, > we also need to return -EPROBE_DEFER here when deciding if genphy > should be used. It should then all unwind and try again later. Yes, I think dsa_register_switch() returning -EPROBE_DEFER if the PHYs returned -EPROBE_DEFER might be okay (I think we should do it), but that doesn't solve the problem for this driver. fw_devlink=on/device links short circuits the probe() call of a consumer (in this case the PHY) and returns -EPROBE_DEFER if the supplier's (in this case switch) probe hasn't finished without an error. fw_devlink/device links effectively does the probe in graph topological order and there's a ton of good reasons to do it that way -- what's why fw_devlink=on was implemented. In this specific case though, since the PHY depends on the parent device, if we fail the parent's probe realtek_smi_probe() because the PHYs failed to probe, we'll get into a catch-22/chicken-n-egg situation and the switch/PHYs will never probe. I think a clean way to fix this at the driver level is to do what I said in [6]. Copy pasting it here and expanding it a bit: 1. The IRQ registration and mdio bus registration should get moved to realtek_smi_probe() which probes "realtek,rtl8366rb". So realtek_smi_probe() succeeding doesn't depend on its child devices probing successfully (which makes sense for any parent device). 2. realtek_smi_probe() should also create/register a component-master/aggregate device that's "made up of" realtek,rtl8366rb and all the PHYs. So the component-master will wait for all of them to finish probing before it's initialized. 3. PHYs will probe successfully now because realtek,rtl8366rb probe() which is the supplier's probe has finished probing without problems. 4. The component device's init (the .bind op) would call dsa_register_switch() which kinda makes sense because the rtl8366rb and all the PHYs combined together is what makes up the logical DSA switch. The dsa_register_switch() will succeed and will be using the right/specific PHY driver. The same applies for any switch that has the PHYs as it's child device AND (this is the key part) the PHYs depend on the switch as a supplier (remember, if we didn't have the interrupt dependency, this would not be an issue). > I don't know the device core, but it looks like dev->can_match tells > us what we need to know. If true, we know there is a driver for this > device. But i'm hesitant to make use of this outside of driver/base. can_match is never cleared once it's set and it's meant as an optimization/preserving some probe order stuff. I wouldn't depend on it for this case. We can just have a phy_has_driver() function that searches all the currently registered PHY drivers to check if there's a matching driver. And dsa_register_switch() or phy_attach_direct() can return -EPROBE_DEFER if there is a driver but it isn't bound yet. Again, this is orthogonal to the realtek driver fix though because of the catch-22 situation above. -Saravana [6] - https://lore.kernel.org/netdev/CAGETcx8_vxxPxF8WrXqk=PZYfEggsozP+z9KyOu5C2bEW0VW8g@xxxxxxxxxxxxxx/