On Wed, 1 Sept 2021 at 23:49, Saravana Kannan <saravanak@xxxxxxxxxx> wrote: > > On Wed, Sep 1, 2021 at 1:13 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > > > > On Tue, 31 Aug 2021 at 19:51, Saravana Kannan <saravanak@xxxxxxxxxx> wrote: > > > > > > On Tue, Aug 31, 2021 at 3:21 AM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote: > > > > > > > > The power-domain DT bindings [1] doesn't enforce a compatible string for a > > > > provider node, even if this is common to use. In particular, when > > > > describing a hierarchy with parent/child power-domains, as the psci DT > > > > bindings [2] for example, it's sometimes not applicable to use a compatible > > > > string. > > > > > > Ok, and fw_devlink handles that -- provider not having a compatible > > > string is pretty common. In these cases, the parent node is the actual > > > device that gets probed and registers the provider. So fw_devlink will > > > create a link from the consumer to the parent device node. > > > > Yes, correct. That is working fine and isn't a problem. > > > > The first problem (I think) is that fw_devlink creates a fw_devlink > > from a child provider node (consumer without compatible string) to a > > parent node (supplier with a compatible string). I don't understand > > the reason why this is needed, perhaps you can elaborate on why? > > I really should get around to doing a LWN article on this and also > putting some of it into Documentation. Btw most of this info would be > in one of my earlier commit texts or in the code comments. But it's > still handy to have them all in one place I suppose. > > I'll try to answer all your questions with this example. Don't take > the actual properties too literally, they are just used to point out > dependencies. > > Device-A { > compatible="foo"; > > Device-B { > compatible="flam"; > power-domains = <&Device-C>; > } > } > > Device-C { > compatible="bar"; > > Device-D { > compatible="baz"; > power-domains = <&Device-A>; > } > } > > Legend: > I'll use X -> Y to indicate links where X is the consumer and Y is the supplier. > I'll call out the link types as fwnode or device links. > If I don't explicitly state otherwise, when I say device links, I mean > stateful/managed device link that is NOT sync-state-only device links. > > I think your first question is asking about fwnode link. So I'll answer that. > > fwnode links are created from the actual nodes that list the > dependencies. So in this case from device-B -> device-C and device-D > -> device-A. It needs to be done this way for a couple of reasons. But > to answer your question on "why do this when Device-B doesn't have a > compatible string?": > > 1. Not all devices have compatible strings (in an ideal world this > won't be the case). So Device-A would create a struct device for > Device-B, set the of_node/fwnode to point to Device-B DT node. Then > device-B gets probed, etc. In those cases, we want the device links to > be created between device-B -> device-C and NOT from device-A -> > device-C. Because if we did follow that logic, we'll get device-A -> > device-C and device-C -> device-A. This obviously can't work because > it's a cyclic dependency and fw_devlink will have to give up on these. > > 2. When device-C is added (assuming device-A is added already), we > need to create a sync-state-only device link from device-A to device-C > as a proxy for the future device-B -> device-C device link that'll > come up. This is to make sure device-C's sync_state() doesn't fire too > early. So the way fw_devlink can tell apart device-A's real dependency > (none in this case) vs device-B's dependency it's proxying for is by > the fact the fwnode link is from device-B DT -> device-C DT. > > Hope that makes sense. Yes, it does and I understand that it may become complicated in some cases. If you get the time to put together an LWN article about fw_devlinks, I would certainly read it. :-) However, at least for power-domains, the DT example you describe above is an invalid description of a HW. It doesn't make sense to try to support if for fw_devlink, at least in my opinion. Let me elaborate. So, I assume you have left out the #power-domain-cells property (for simplicity) for Device-A and Device-C, as those seem to be the power-domain providers. *) If Device-B is a consumer of Device-C, it also means that Device-A must be assigned as the child-power-domain to Device-C's power-domain. **) If Device-D is the consumer of Device-A, it also means that Device-C must be assigned as the child-power-domain to Device-A's power-domain. This simply can't be right from the HW point of view - and we don't support this in the Linux kernel anyway. A power-domain can not be both parent and child to another power-domain. In other words, cyclic dependencies can't exist for power-domains, as it would be a wrong description of the HW. I wonder if the similar reasoning is applicable for other types of resources, like clocks and regulators, for example. > > > > > I come to the second and follow up problem from this behaviour, see below. > > > > > > > > > Therefore, let's set the 'optional_con_dev' to true to avoid creating > > > > incorrect fw_devlinks for power-domains. > > > > > > This part doesn't make sense or is incomplete. What is being done incorrectly? > > > > See below. > > > > > > > > > > > > > [1] Documentation/devicetree/bindings/power/power-domain.yaml > > > > [2] Documentation/devicetree/bindings/arm/psci.yaml > > > > > > > > Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx> > > > > --- > > > > > > > > Some more details of what goes on here. I have added a debug print in > > > > of_link_to_phandle() to see the fw_devlinks that gets created. > > > > > > > > This is what happens on Dragonboard 410c when 'optional_con_dev' isn't set: > > > > ... > > > > [ 0.041274] device: 'psci': device_add > > > > [ 0.041366] OF: Linking power-domain-cpu0 (consumer) to psci (supplier) > > > > [ 0.041395] OF: Linking power-domain-cpu1 (consumer) to psci (supplier) > > > > [ 0.041423] OF: Linking power-domain-cpu2 (consumer) to psci (supplier) > > > > [ 0.041451] OF: Linking power-domain-cpu3 (consumer) to psci (supplier) > > > > [ 0.041494] device: 'platform:psci--platform:psci': device_add > > > > [ 0.041556] platform psci: Linked as a sync state only consumer to psci > > > > Because we created a fw_devlink for the child provider nodes > > (consumer) that lacks compatible properties, we end up creating a sync > > state only devlink. I don't think it serves a purpose, but I may be > > wrong!? > > sync-state-only device links serve the purpose I explained in point 2 above. > > > > > Additionally, the actual devlink that is created, has the same > > supplier and consumer device, which indicates that this isn't the > > right thing to do. > > THIS is the part that's kinda wrong. But it doesn't really break > anything. It would also be very short lived as it would get deleted as > soon as the consumer (in this case the same as supplier) ends up > probing. But in your case the psci DT node doesn't use driver core for > probing because it could be an early boot dependency. In which case > you should really set the OF_POPULATED flag so a pointless struct > device isn't created for the DT node or at least set > FWNODE_FLAG_NOT_DEVICE so that fw_devlink knows not to wait on this > supplier. This is good practice for fw_devlink (not just to work > around the psci -> psci device link). Can you put up this patch > please? > > Also sync-state-only links are explicitly present to allow cyclic > dependencies (in the example above sync-state-only links will exist as > device-A -> device-C and device-C -> device-A), but it kinda pointless > to have a link where the supplier and the consumer are the same. > > With that said, if you want to avoid this short-lived pointless device > link, I'd say the fix should be in device_link_add() in the following > check: > > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -705,6 +705,7 @@ struct device_link *device_link_add(struct device *consumer, > * because it only affects sync_state() callbacks. > */ > if (!device_pm_initialized(supplier) > + || consumer == supplier > || (!(flags & DL_FLAG_SYNC_STATE_ONLY) && > device_is_dependent(consumer, supplier))) { > link = NULL; This change seems like the right thing to do, no matter what. I will send a formal patch for it, thanks! > > > > > > > ... > > > > > > > > This is what happens on Dragonboard 410c when 'optional_con_dev' is set: > > > > ... > > > > [ 0.041179] device: 'psci': device_add > > > > [ 0.041265] OF: Not linking psci to psci - is descendant > > > > [ 0.041293] OF: Not linking psci to psci - is descendant > > > > [ 0.041319] OF: Not linking psci to psci - is descendant > > > > [ 0.041346] OF: Not linking psci to psci - is descendant > > > > ... > > > > > > Can you please explain what exactly is going on that's wrong here? I > > > notice that psci is not probed as a device at all. And when you aren't > > > setting this flag the only difference I see is the creating of a sync > > > state only link -- which shouldn't matter here because you don't even > > > have a driver implemented. > > > > See above. > > > > > > > > > The relevant dtsi file: > > > > arch/arm64/boot/dts/qcom/msm8916.dtsi > > > > > > > > Kind regards > > > > Ulf Hansson > > > > > > > > --- > > > > drivers/of/property.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/of/property.c b/drivers/of/property.c > > > > index 2babb1807228..4d607fdbea24 100644 > > > > --- a/drivers/of/property.c > > > > +++ b/drivers/of/property.c > > > > @@ -1356,7 +1356,7 @@ static const struct supplier_bindings of_supplier_bindings[] = { > > > > { .parse_prop = parse_io_channels, }, > > > > { .parse_prop = parse_interrupt_parent, }, > > > > { .parse_prop = parse_dmas, .optional = true, }, > > > > - { .parse_prop = parse_power_domains, }, > > > > + { .parse_prop = parse_power_domains, .optional_con_dev = true, }, > > > > > > This change is just shooting in dark/completely unrelated to the > > > commit text. This is just saying the actual consumer is a level up > > > from where the property is listed (eg: remote-endpoint). It just > > > happens to fix your case for unrelated reasons. > > > > Again, see above. > > > > > > > > Definite Nak as this *will* break other cases. > > > > In what way will this break other cases? > > > > Would you mind elaborating for my understanding and perhaps point me > > to an example where it will break? > > So if you did this, it'll break: > (1) the probe of device-A/device-C due to cyclic dependencies. Really > no, because fw_devlink will just stop enforcing ordering between > device-A and device-C if it detects a cycle. But if there was a real > dependency (can me multiple links deep) between device-A -> device-C, > that would no longer get enforced. As I said above, cyclic dependencies don't exist for power-domains. > (2) It'd break sync_state() correctness for device-B -> device-C dependency. I don't see that. Again, because power-domain providers can't be described in a cyclic way in DT. > > Hope that helps. > Perhaps, renaming the flag to "non-cyclic" would be an option? As it seems like that is what this boils done to, right? > -Saravana Thanks a lot for your reply and for taking the time to explain things! Kind regards Uffe