On Wed, 29 Jan 2025 at 17:58, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>
> On Wed, Jan 29, 2025 at 5:42 PM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> >
> > On Wed, 29 Jan 2025 at 16:55, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Jan 29, 2025 at 12:53 PM Ulf Hansson <ulf.hansson@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, 28 Jan 2025 at 20:24, Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > > >
> > > > > Commit 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the
> > > > > resume phase") overlooked the case in which the parent of a device with
> > > > > DPM_FLAG_SMART_SUSPEND set did not use that flag and could be runtime-
> > > > > suspended before a transition into a system-wide sleep state. In that
> > > > > case, if the child is resumed during the subsequent transition from
> > > > > that state into the working state, its runtime PM status will be set to
> > > > > RPM_ACTIVE, but the runtime PM status of the parent will not be updated
> > > > > accordingly, even though the parent will be resumed too, because of the
> > > > > dev_pm_skip_suspend() check in device_resume_noirq().
> > > > >
> > > > > Address this problem by tracking the need to set the runtime PM status
> > > > > to RPM_ACTIVE during system-wide resume transitions for devices with
> > > > > DPM_FLAG_SMART_SUSPEND set and all of the devices depended on by them.
> > > > >
> > > > > Fixes: 6e176bf8d461 ("PM: sleep: core: Do not skip callbacks in the resume phase")
> > > > > Closes: https://lore.kernel.org/linux-pm/Z30p2Etwf3F2AUvD@xxxxxxxxxxxxxxxxxxxx/
> > > > > Reported-by: Johan Hovold <johan@xxxxxxxxxx>
> > > > > Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx>
> > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > > > ---
> > > > >  drivers/base/power/main.c | 29 ++++++++++++++++++++---------
> > > > >  include/linux/pm.h        |  1 +
> > > > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > > > >
> > > > > --- a/drivers/base/power/main.c
> > > > > +++ b/drivers/base/power/main.c
> > > > > @@ -656,13 +656,15 @@
> > > > >  	 * so change its status accordingly.
> > > > >  	 *
> > > > >  	 * Otherwise, the device is going to be resumed, so set its PM-runtime
> > > > > -	 * status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
> > > > > -	 * to avoid confusing drivers that don't use it.
> > > > > +	 * status to "active" unless its power.set_active flag is clear, in
> > > > > +	 * which case it is not necessary to update its PM-runtime status.
> > > > >  	 */
> > > > > -	if (skip_resume)
> > > > > +	if (skip_resume) {
> > > > >  		pm_runtime_set_suspended(dev);
> > > > > -	else if (dev_pm_skip_suspend(dev))
> > > > > +	} else if (dev->power.set_active) {
> > > > >  		pm_runtime_set_active(dev);
> > > > > +		dev->power.set_active = false;
> > > > > +	}
> > > > >
> > > > >  	if (dev->pm_domain) {
> > > > >  		info = "noirq power domain ";
> > > > > @@ -1189,18 +1191,24 @@
> > > > >  	return PMSG_ON;
> > > > >  }
> > > > >
> > > > > -static void dpm_superior_set_must_resume(struct device *dev)
> > > > > +static void dpm_superior_set_must_resume(struct device *dev, bool set_active)
> > > > >  {
> > > > >  	struct device_link *link;
> > > > >  	int idx;
> > > > >
> > > > > -	if (dev->parent)
> > > > > +	if (dev->parent) {
> > > > >  		dev->parent->power.must_resume = true;
> > > > > +		if (set_active)
> > > > > +			dev->parent->power.set_active = true;
> > > > > +	}
> > > > >
> > > > >  	idx = device_links_read_lock();
> > > > >
> > > > > -	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node)
> > > > > +	list_for_each_entry_rcu_locked(link, &dev->links.suppliers, c_node) {
> > > > >  		link->supplier->power.must_resume = true;
> > > > > +		if (set_active)
> > > > > +			link->supplier->power.set_active = true;
> > > >
> > > > If I understand correctly, the suppliers are already handled when the
> > > > pm_runtime_set_active() is called for consumers, so the above should
> > > > not be needed.
> > >
> > > It is needed because pm_runtime_set_active() doesn't cause the setting
> > > to propagate to the parent's/suppliers of the suppliers AFAICS.
> >
> > Hmm, even if that sounds reasonable, I don't think it's a good idea as
> > it may introduce interesting propagation problems between drivers.
> >
> > For example, consider that Saravana is trying to enable runtime PM for
> > fw_devlinks. It would mean synchronization issues for the runtime PM
> > status, all over the place.
>
> What synchronization issues?

Changing the runtime PM status for a parent/supplier that doesn't have
DPM_FLAG_SMART_SUSPEND, is likely to confuse their drivers. You also
removed that part of the comment a few lines above, in
device_resume_noirq(). I am not sure I understand why?

>
> > That said, is even consumer/suppliers part of the problem we are
> > trying to solve?
>
> They are in general.
>
> It's just that stuff that was runtime-suspended prior to a system-wide
> suspend may need to be resumed and marked as RPM_ACTIVE during
> system-wide resume because one of the devices wants/needs to be
> resumed then.
>
> If this turns out to be problematic, the problem will need to be
> addressed, but for now I'm not seeing why there would be a problem.
>
> > >
> > > > That said, maybe we instead allow parent/child to work in the similar
> > > > way as for consumer/suppliers, when pm_runtime_set_active() is called
> > > > for the child. In other words, when pm_runtime_set_active() is called
> > > > for a child and the parent is runtime PM enabled, let's runtime resume
> > > > it too, as we do for suppliers. Would that work, you think?
> > >
> > > The parent is not runtime-PM enabled when this happens.
> >
> > That sounds really weird to me.
> >
> > Does that mean that the parent has not been system resumed either?
>
> Yes.
>
> It hasn't been resumed yet, but it is known that it will be resumed.
>
> > If so, isn't that really the root cause for this problem,
>
> No, it is not.
>
> > or what am I missing?
>
> Essentially, what I said above.
>
> If a device that was suspended prior to a system-wide suspend
> wants/needs to be resumed during the subsequent system-wide resume,
> and it was runtime-PM-enabled before the suspend transition, it needs
> to (a) be runtime-PM-enabled during the subsequent system-wide resume
> transition and (b) it also needs to be marked as RPM_ACTIVE because in
> fact it is not suspended any more. The existing code before the patch
> takes care of this for the device itself, but not for the devices it
> depends on which also need to be resumed (which happens) and marked as
> RPM_ACTIVE (which doesn't happen) and the patch just makes sure that
> the latter will happen.

Thanks for clarifying!

>
> Actually, what happens now is that the actual state of the parent
> during the system-wide resume, right before re-enabling runtime PM for
> it, does not match its runtime PM status which is still RPM_SUSPENDED.
> That's what is fixed here and it applies to the parent as well as to
> all of the other devices depended on by the child and the parent.

Well, unfortunately I don't think it will work to call
pm_runtime_set_active() for parents/suppliers like this. I think we
need the drivers for the parents/suppliers to be in agreement with the
behaviour of DPM_FLAG_SMART_SUSPEND to allow the propagation. Not sure
how to best achieve this though.

Kind regards
Uffe
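
For readers following the thread, the status mismatch under discussion can be illustrated with a small user-space model of the two hunks quoted in the patch. This is only a sketch under stated assumptions: `struct model_dev`, `model_superior_set_must_resume()` and `model_resume_noirq()` are hypothetical names made up for illustration, not kernel APIs. Supplier links, locking, and the real behaviour of pm_runtime_set_active() are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime PM status of a device. */
enum model_rpm_status { MODEL_RPM_SUSPENDED, MODEL_RPM_ACTIVE };

/* Hypothetical, heavily simplified device: one parent, no supplier links. */
struct model_dev {
	struct model_dev *parent;	/* NULL for a root device */
	enum model_rpm_status status;	/* models the runtime PM status */
	bool must_resume;		/* models power.must_resume */
	bool set_active;		/* models power.set_active */
};

/*
 * Suspend side: models the parent branch of the patched
 * dpm_superior_set_must_resume(). The supplier loop is omitted.
 */
void model_superior_set_must_resume(struct model_dev *dev, bool set_active)
{
	assert(dev != NULL);

	if (dev->parent) {
		dev->parent->must_resume = true;
		if (set_active)
			dev->parent->set_active = true;
	}
}

/*
 * Resume side: models the status update in the patched
 * device_resume_noirq() hunk. The flag is consumed once it is acted on.
 */
void model_resume_noirq(struct model_dev *dev, bool skip_resume)
{
	assert(dev != NULL);

	if (skip_resume) {
		dev->status = MODEL_RPM_SUSPENDED;
	} else if (dev->set_active) {
		dev->status = MODEL_RPM_ACTIVE;
		dev->set_active = false;
	}
}
```

In this model, a runtime-suspended parent whose child has set_active set ends up marked active after model_superior_set_must_resume(&child, child.set_active) followed by model_resume_noirq() on the parent. If false is passed instead, the parent keeps its suspended status even though it is resumed, which is the mismatch described in the changelog. The model also marks the parent active without checking whether its driver uses DPM_FLAG_SMART_SUSPEND, which is the concern raised in the reply above.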