On Wed, Sep 01, 2021 at 02:18:04AM +0300, Vladimir Oltean wrote: > On Wed, Sep 01, 2021 at 01:02:09AM +0200, Andrew Lunn wrote: > > Rev B is interesting because switch0 and switch1 got genphy, while > > switch2 got the correct Marvell PHY driver. switch2 PHYs don't have > > interrupt properties, so don't loop back to their parent device. > > This is interesting and not what I really expected to happen. It goes to > show that we really need more time to understand all the subtleties of > device dependencies before jumping on patching stuff. There is an even more interesting variation which I would like to point out. It seems like a very odd loophole in the device links. Take the example of the mv88e6xxx DSA driver. On my board (arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts), even after I had to declare the switches as interrupt controller and add interrupts to their internal PHYs, I still need considerable force to 'break' this board in the way discussed in this thread. The correct PHY driver insists to probe, and not genphy. Let me explain. The automatic device links between the switch (supplier, as interrupt-controller) and PHYs (consumers) are added by fwnode_link_add, called from of_link_to_phandle. Important note: fwnode_link_add does not link devices, it links OF nodes. Even more important node, in the form of a comment: * The driver core will use the fwnode link to create a device link between the * two device objects corresponding to @con and @sup when they are created. The * driver core will automatically delete the fwnode link between @con and @sup * after doing that. Okay?! What seems to be omitted is that the DSA switch driver's probing itself can be deferred. For example: dsa_register_switch -> dsa_switch_probe -> dsa_switch_parse_of -> dsa_switch_parse_ports_of -> dsa_port_parse_of -> of_find_net_device_by_node(of_parse_phandle(dn, "ethernet", 0)); -> not found => return -EPROBE_DEFER When dsa_register_switch() returns -EPROBE_DEFER, it is effectively an error path. So the reverse of initialization is performed. The mv88e6xxx driver calls mv88e6xxx_mdios_register() right _before_ dsa_register_switch. So when dsa_register_switch returns error code, mv88e6xxx_mdios_unregister() will be called. When mv88e6xxx_mdios_unregister() is called, the MDIO buses with internal PHYs are destroyed. So the PHY devices themselves are destroyed too. And the device links between the DSA switch and the internal PHYs, those created based on the firmware node links created by fwnode_link_add, are dropped too. Now remember the comment that the device links created based on fwnode_link_add are not restored. So probing of the DSA switch finally resumes, and this time device_links_check_suppliers() is effectively bypassed, the PHYs no longer request probe deferral due to their supplier not being ready, because the device link no longer exists. Isn't this self-sabotaging?! Now generally, DSA drivers defer probing because they probe in parallel with the DSA master. This is typical if the switch is on a SPI bus, or I2C, or on an MDIO bus provided by a _standalone_ MDIO controller. If the MDIO controller is not standalone, but is provided by Ethernet controller that is the DSA master itself, then things change a lot, because probing can never be parallel. The DSA master probes, initializes its MDIO bus, and this triggers the probing of the MDIO devices on that bus, one of which is the DSA switch. So DSA can no longer defer the probe due to that reason. Secondly, in DSA we even have variation between drivers as to where they register their internal MDIO buses. The mv88e6xxx driver does this in mv88e6xxx_probe (the probe function on the MDIO bus). The rtl8366rb driver calls realtek_smi_setup_mdio() from rtl8366rb_setup(), and this is important. DSA provides drivers with a .setup() callback, which is guaranteed to take place after nothing can defer the switch's probe anymore. So putting two and two together, sure enough, if I move mv88e6xxx_mdios_register from mv88e6xxx_probe to mv88e6xxx_setup, then I can reliably break this setup, because the device links framework isn't sabotaging itself anymore. Conversely, I am pretty sure that if rtl8366rb was to call of_mdiobus_register() from the probe method and not the setup method, the entire design issue with interrupts on internal DSA switch ports would have went absolutely unnoticed for a few more years. I have not tested this, but it also seems plausible that DSA can trivially and reliably bypass any fw_devlink=on restrictions by simply moving all of_mdiobus_register() driver calls from the .setup() method to their respective probe methods (prior to calling dsa_register_switch), then effectively fabricate an -EPROBE_DEFER during the first probe attempt. I mean, who will know whether that probe deferral request was justified or not? Anyway, I'm not sure everyone agrees with this type of "solution" (even though it's worth pointing it out as a fw_devlink limitation). In any case, we need some sort of lightweight "fix" to the chicken-and-egg problem, which will give me enough time to think of something better. I hope it is at least clearer now that there are subtleties and nuances, and we cannot just assess how many boards are broken by looking at the device trees. By design, all are, sure, but they might still work, and that's better than nothing...