On Fri, Feb 10, 2023 at 1:08 PM Vladimir Oltean <vladimir.oltean@xxxxxxx> wrote: > > On Fri, Feb 10, 2023 at 11:27:11AM -0800, Saravana Kannan wrote: > > On Fri, Feb 10, 2023 at 2:13 AM Vladimir Oltean <vladimir.oltean@xxxxxxx> wrote: > > > > > > Hi Saravana, > > > > > > On Mon, Feb 06, 2023 at 05:41:52PM -0800, Saravana Kannan wrote: > > > > Vladimir, > > > > > > > > Ccing you because DSA's and fw_devlink have known/existing problems > > > > (still in my TODOs to fix). But I want to make sure this series doesn't > > > > cause additional problems for DSA. > > > > > > > > All, > > > > > > > > This patch series improves fw_devlink in the following ways: > > > > > > > > 1. It no longer cares about a fwnode having a "compatible" property. It > > > > figures this out more dynamically. The only expectation is that > > > > fwnodes that are converted to devices actually get probed by a driver > > > > for the dependencies to be enforced correctly. > > > > > > > > 2. Finer grained dependency tracking. fw_devlink will now create device > > > > links from the consumer to the actual resource's device (if it has one, > > > > Eg: gpio_device) instead of the parent supplier device. This improves > > > > things like async suspend/resume ordering, potentially remove the need > > > > for frameworks to create device links, more parallelized async probing, > > > > and better sync_state() tracking. > > > > > > > > 3. Handle hardware/software quirks where a child firmware node gets > > > > populated as a device before its parent firmware node AND actually > > > > supplies a non-optional resource to the parent firmware node's > > > > device. > > > > > > > > 4. Way more robust at cycle handling (see patch for the insane cases). > > > > > > > > 5. Stops depending on OF_POPULATED to figure out some corner cases. > > > > > > > > 6. Simplifies the work that needs to be done by the firmware specific > > > > code. > > > > > > > > The v3 series has gone through my usual testing on my end and looks good > > > > to me. > > > > > > Booted on an NXP LS1028A (arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts) > > > and a Turris MOX (arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts) > > > with no observed regressions. > > > > Thanks for testing Vladimir! > > > > > Is there something specific you would like > > > me to test? > > > > Not really, I just want to make sure the common DSA architectures > > don't hit any regression. In the hardware you tested, are there cases > > of PHYs where the supplier is the parent MDIO? I remember that being > > the only case where I needed special casing > > (FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD) in fw_devlink -- so it'll be > > good to make sure I didn't accidentally break anything there. > > > > -Saravana > > Yes and no (I never had a system which depended on FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD). > > Yes, because well, yes, in arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts, > the PHYs will depend on interrupts provided by their (parent) switch. However this > is not explicit in the device tree. To make it explicit, one would need to add: > > diff --git a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts > index cd0988317623..d789cda49e35 100644 > --- a/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts > +++ b/arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts -----8<---- Snipped DT diff ----- > However, as I had explained in one of the first discussions here: > https://lore.kernel.org/netdev/20210901012826.iuy2bhvkrgahhrl7@skbuf/ > > it was always hit-or-miss whether the above device tree had an issue > with fw_devlink or not: it depended on how the driver was written (and > the mv88e6xxx switch driver was tricking the fw_devlink logic from that > time to drop the device links because of an unrelated -EPROBE_DEFER). Yeah, I never forgot this issue. That's why I used "additional" in my cover letter :) So far I've not needed to change fw_devlink in a way that'd break this unintentional "tricky behavior" but I might be coming up to that wall soon. So this reply is becoming more relevant to me: https://lore.kernel.org/lkml/CAGETcx8De_qm9hVtK5CznfWke9nmOfV8OcvAW6kmwyeb7APr=g@xxxxxxxxxxxxxx/ Not sure if you've had a chance to read or think about it. > What I had done to "untrick" fw_devlink so that I could see the issue > (which was originally reported by Alvin Šipraga) was to modify the > mv88e6xxx driver, and change the placement of mv88e6xxx_mdios_register() > to a point after which we will never hit -EPROBE_DEFER (from driver probe() > to the dsa_switch_ops :: setup() method): > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c > index 0a5d6c7bb128..48650465660d 100644 > --- a/drivers/net/dsa/mv88e6xxx/chip.c > +++ b/drivers/net/dsa/mv88e6xxx/chip.c -----8<---- snipped > After applying both of the above changes on top of yours, I confirm that > the PHYs on the mv88e6xxx on Turris MOX still probe with their specific > PHY driver rather than the generic one, and with interrupts (not poll mode): > -----8<---- snipped > > even though I am seeing these error messages earlier in the boot process (maybe this is something to look into): > > [ 0.910219] mdio_bus d0032004.mdio-mii:10: Failed to create device link with d0032004.mdio-mii:10 -----8<---- snipped > [ 0.943879] mv88e6085 d0032004.mdio-mii:12: switch 0x1900 detected: Marvell 88E6190, revision 1 > > > If _on top_ of all the above, I also remove the logic that sets FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD: > then *finally* I get something approximating Alvin's reported issue. > In my case, one switch out of 3 gets its PHYs bound to the Generic PHY > driver (why not all is a story for another time): -----8<---- snipped > So I guess that FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD does something. Thanks for the extensive effort into testing this! -Saravana