On Sunday, March 29, 2020 6:27:38 PM CEST Alan Stern wrote:
> On Sun, 29 Mar 2020, Rafael J. Wysocki wrote:
>
> > On Sat, Mar 28, 2020 at 8:58 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > > > A large part of the problem is related to an inconsistency between the
> > > > > documentation and the code. include/linux/pm.h says that
> > > > > DPM_FLAG_SMART_SUSPEND tells bus types and PM domains about what the
> > > > > driver wants. This strongly implies that the PM core will ignore
> > > > > SMART_SUSPEND. But in fact the core does check that flag and takes its
> > > > > own actions if the device has no subsystem-level callbacks!
> > > >
> > > > Right, which is because in those cases there is no "middle layer" between
> > > > the driver and the core and if you want the driver to work both with
> > > > something like genpd or the ACPI PM domain and without anything like that,
> > > > the core needs to take those actions for consistency.
> > >
> > > Sure, the core is acting as a proxy for the missing subsystem
> > > callbacks. Still, it should be documented properly.
> > >
> > > Also, couldn't we simplify the behavior? Right now the core checks
> > > that there are no subsystem-level callbacks for any of _early, _late,
> > > and _noirq variants before skipping a callback. Couldn't we forget
> > > about all that checking and simply skip the device-level callbacks?
> > > (Yes, I realize this could lead to inconsistent behavior if the
> > > subsystem has some callbacks defined but not others -- but subsystems
> > > should never do that, at least, not if it would lead to trouble.)
> >
> > In quite a few cases the middle layer has nothing specific to do in a
> > given phase of suspend/resume, but the driver may.
> >
> > Subsystems haven't been required to provide callbacks for all phases
> > so far, so this change would require some modifications in there.
> >
> > I actually prefer the core to do more, even if that means more
> > complexity in it, to avoid possible subtle differences in behavior
> > between subsystems.
>
> What I meant was that it might be reasonable to get rid of the
> no_subsys_cb checks. For example, change __device_suspend_noirq() as
> follows:
>
> -    no_subsys_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
> -
> -    if (dev_pm_smart_suspend_and_suspended(dev) && no_subsys_cb)
> +    if (dev_pm_smart_suspend_and_suspended(dev))
>              goto Skip;
>
> with similar changes to the _suspend_late, _resume_noirq, and
> _resume_early. In each stage, we would bypass the driver's callback
> if SMART_SUSPEND was set and there was no subsystem-level callback for
> _that_ stage -- rather than there being no subsystem-level callbacks
> for _any_ of the stages.

I understand that. As mentioned in the other message, I attempted to allow pm_runtime_force_suspend/resume() to be used along with setting SMART_SUSPEND, but that looks like a mistake now. I agree that skipping the driver-level callbacks regardless of what is provided by the subsystem would be more consistent.

> > > > > Furthermore, the PM core's actions don't seem to make sense. If the
> > > > > flag is set and the device is runtime-suspended when the system sleep
> > > > > begins, the core will skip issuing the suspend_late and suspend_noirq
> > > > > callbacks to the driver. But it doesn't skip issuing the suspend
> > > > > callback! I can't figure that out.
> > > >
> > > > That's because if the core gets to executing ->suspend_late, PM-runtime has
> > > > been disabled for the device and if the device is runtime-suspended at that
> > > > point, then (at least if SMART_SUSPEND is set for the device) there is no reason
> > > > to do anything more to it.
> > >
> > > But if SMART_SUSPEND is set and the device is runtime-suspended, why
> > > issue the ->suspend callback?
> >
> > The driver itself or the middle-layer may want to resume the device.
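To make the __device_suspend_noirq() change proposed above concrete, here is a toy C model of the decision made before and after it. This is not kernel code. The struct model_dev type, the noirq_action enum and all function names are invented for illustration. The model covers only the subsystem-callback lookup and the SMART_SUSPEND check. It ignores syscore, direct_complete, wakeup and error handling.

```c
#include <stdbool.h>

/* Hypothetical, simplified device state; not the kernel's struct device. */
struct model_dev {
    bool smart_suspend;     /* driver set DPM_FLAG_SMART_SUSPEND */
    bool runtime_suspended; /* runtime PM status is "suspended" */
    bool subsys_late_cb;    /* subsystem provides ->suspend_late */
    bool subsys_noirq_cb;   /* subsystem provides ->suspend_noirq */
};

enum noirq_action {
    RUN_SUBSYS_CB,  /* core invokes the subsystem-level callback */
    SKIP_DRIVER_CB, /* core jumps to "Skip", bypassing the driver */
    RUN_DRIVER_CB,  /* core invokes the driver's callback itself */
};

/* Stand-in for dev_pm_smart_suspend_and_suspended(dev). */
static bool smart_suspend_and_suspended(const struct model_dev *d)
{
    return d->smart_suspend && d->runtime_suspended;
}

/* Current rule: skip only if there is no subsystem ->suspend_late either. */
static enum noirq_action noirq_action_current(const struct model_dev *d)
{
    bool no_subsys_cb;

    if (d->subsys_noirq_cb)
        return RUN_SUBSYS_CB;

    no_subsys_cb = !d->subsys_late_cb;
    if (smart_suspend_and_suspended(d) && no_subsys_cb)
        return SKIP_DRIVER_CB;

    return RUN_DRIVER_CB;
}

/* Proposed rule: only the stage being executed matters. */
static enum noirq_action noirq_action_proposed(const struct model_dev *d)
{
    if (d->subsys_noirq_cb)
        return RUN_SUBSYS_CB;

    if (smart_suspend_and_suspended(d))
        return SKIP_DRIVER_CB;

    return RUN_DRIVER_CB;
}
```

Consider a device with SMART_SUSPEND set that is runtime-suspended, under a subsystem that provides only ->suspend_late. The current rule still runs the driver's ->suspend_noirq for it, whereas the proposed rule bypasses it.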
> >
> > Arguably, it may do that in ->prepare() too, but see below.
> >
> > > Why not just do pm_runtime_disable()
> > > then (to prevent the device from resuming) and skip the callback?
> >
> > Because another driver may want to runtime-resume that device in order
> > to use it for something before ->suspend_late(). Of course, you may
> > argue that this means a missing device link or similar, so it is not
> > clear-cut.
> >
> > The general rule is that "synchronous" PM-runtime can be expected to
> > work before ->suspend_late(), so ->suspend() callbacks should be able
> > to use it safely in all cases in principle.
>
> (With one exception: Since the PM core does pm_runtime_get_noresume()
> during the prepare stage, going _into_ runtime suspend is impossible
> during ->prepare and ->suspend. Of course, drivers can always call
> their runtime_suspend routines directly, but that wouldn't affect the
> power.runtime_status value.)
>
> > That expectation goes against direct_complete in some cases, so
> > drivers need to set NEVER_SKIP (or whatever it will be called in the
> > future) to avoid that problem.
>
> Ah, okay. Very well, let's spell this out explicitly in the
> documentation; it's an important difference.
>
> > > > > Furthermore, the decisions about
> > > > > whether to skip the resume_noirq, resume_early, and resume callbacks
> > > > > are based on different criteria from the decisions on the suspend side.
> > > >
> > > > Right, because there are drivers that don't want devices to stay in suspend
> > > > after system resume even though they have been left in suspend by it.
> > >
> > > This suggests that sometimes we may want to issue non-matching
> > > callbacks. For example, ->resume_noirq, ->resume_early, and ->resume
> > > but not ->suspend, ->suspend_late, or ->suspend_noirq. Is that what
> > > you are saying?
> >
> > Yes.
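The runtime PM rules just described can be illustrated with a toy model. It assumes that only the usage counter and the disable depth matter. The real runtime PM core also tracks errors, children and pending requests. All field and function names are hypothetical. As far as I know, the -EAGAIN and -EACCES values mirror what rpm_suspend()/rpm_resume() return in the analogous common cases.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical runtime PM state; stand-ins for fields of dev->power. */
struct rpm_model {
    int usage_count;   /* like power.usage_count */
    int disable_depth; /* like power.disable_depth; > 0 means disabled */
    bool suspended;    /* like runtime_status == RPM_SUSPENDED */
};

/* Synchronous runtime suspend attempt. */
static int model_runtime_suspend(struct rpm_model *r)
{
    if (r->disable_depth > 0)
        return -EACCES; /* runtime PM is disabled */
    if (r->usage_count > 0)
        return -EAGAIN; /* a reference is held, e.g. by the PM core */
    r->suspended = true;
    return 0;
}

/* Synchronous runtime resume attempt. */
static int model_runtime_resume(struct rpm_model *r)
{
    if (r->disable_depth > 0)
        return -EACCES; /* too late: runtime PM has been disabled */
    if (!r->suspended)
        return 1;       /* already active */
    r->suspended = false;
    return 0;
}

/* What the core does for the device at the prepare stage. */
static void model_prepare(struct rpm_model *r)
{
    r->usage_count++;   /* like pm_runtime_get_noresume() */
}

/* What the core does just before ->suspend_late. */
static void model_suspend_late_begin(struct rpm_model *r)
{
    r->disable_depth++; /* like __pm_runtime_disable(dev, false) */
}
```

After model_prepare(), a synchronous resume issued from ->suspend still succeeds, but a runtime suspend is refused. After model_suspend_late_begin(), even the resume is refused, which is why the device's configuration is treated as final from that point on.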
> >
> > As per devices.rst:
> >
> > "the driver must be prepared to
> > cope with the invocation of its system-wide resume callbacks back-to-back with
> > its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
> > so on) and the final state of the device must reflect the "active" runtime PM
> > status in that case."
>
> Here would also be a good place to mention the difference between "keep
> the device in runtime suspend" and "not issue the various resume
> callbacks". In theory, a subsystem and a driver could collaborate to
> make their resume-side callbacks keep the device runtime-suspended, so
> these two concepts are not identical.
>
> Alternatively, we could specify that this sort of thing is never
> allowed: When the ->resume callback returns, the device _must_ be
> powered-up and runtime-active. If we do this, then the _only_ way to
> avoid powering up the device (and to let it remain in runtime suspend)
> is for the core to skip issuing the resume-side callbacks. Or at
> least, skip issuing the ->resume callback.

Basically, all devices with SMART_SUSPEND set whose late/noirq suspend callbacks were skipped can be left in suspend during system-wide resume by skipping their callbacks, so that they can be resumed by PM-runtime (that becomes kind of like direct-complete at that point), but some drivers may not want that for earlier device response after system-wide resume (if it is resumed by the system-wide code, it will be immediately ready when user space is unfrozen). It is expected that LEAVE_SUSPENDED will be used along with SMART_SUSPEND unless the above is the case.

> > > > > SMART_SUSPEND seems to have two different meanings. (1) If the device
> > > > > is already in runtime suspend when a system sleep starts, skip the
> > > > > suspend_late and suspend_noirq callbacks. (2) Under certain (similar)
> > > > > circumstances, skip the resume callbacks. The documentation only
> > > > > mentions (1) but the code also handles (2).
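The resume-side rule described above can be sketched as a single predicate. This is a hypothetical simplification. It encodes only the three conditions mentioned here: SMART_SUSPEND set, late/noirq suspend callbacks skipped, and LEAVE_SUSPENDED set. It deliberately ignores wakeup settings, device dependencies and the hibernation events, which the PM core also has to take into account. The struct and function names are invented.

```c
#include <stdbool.h>

/* Hypothetical per-device state relevant to the resume-side decision. */
struct resume_model_dev {
    bool smart_suspend;      /* DPM_FLAG_SMART_SUSPEND */
    bool leave_suspended;    /* DPM_FLAG_LEAVE_SUSPENDED */
    bool late_noirq_skipped; /* ->suspend_late/->suspend_noirq were skipped */
};

/*
 * true:  skip ->resume_noirq, ->resume_early and ->resume; the device stays
 *        suspended until PM-runtime resumes it on demand.
 * false: run the resume callbacks, so the device is ready as soon as user
 *        space is unfrozen.
 */
static bool may_stay_suspended(const struct resume_model_dev *d)
{
    return d->smart_suspend && d->late_noirq_skipped && d->leave_suspended;
}
```

A driver that wants its device to respond immediately after system-wide resume would, under this reading, simply not set LEAVE_SUSPENDED.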
> > > >
> > > > That's because (2) is the THAW case and I was distracted before I got
> > > > to documenting it properly. Sorry.
> > > >
> > > > The problem is that if you leave the device in runtime suspend, calling
> > > > ->freeze_late() or ->freeze_noirq() on it is not useful and if you have
> > > > skipped those, running the corresponding "thaw" callbacks is not useful
> > > > either (what would they do, specifically?).
> > > >
> > > > There is a whole problem of whether or not devices should be left in
> > > > runtime suspend during hibernation and I have not had a chance to get
> > > > to the bottom of that yet.
> > >
> > > Not only that. The distinction between SMART_SUSPEND and
> > > direct_complete is rather subtle, and it doesn't seem to be carefully
> > > explained anywhere. In fact, I'm not sure I understand it myself. :-)
> > > For example, the direct_complete mechanism is very careful about not
> > > leaving a device in runtime suspend if a descendant (or other dependent
> > > device) will remain active. Does SMART_SUSPEND behave similarly? If
> > > so, it isn't documented.
> >
> > The difference is that SMART_SUSPEND allows the ->suspend callback to
> > be invoked which may decide to resume the device (or reconfigure it
> > for system wakeup if that doesn't require resuming it). IOW, this
> > means "I can cope with a runtime-suspended device going forward".
> > [But if the device is still runtime-suspended during ->suspend_late(),
> > its configuration is regarded as "final".]
> >
> > In turn, direct_complete means something like "if this device is
> > runtime-suspended, leave it as is and don't touch it during the whole
> > suspend-resume cycle".
>
> Right; let's spell this out in the documentation too.
>
> > > > > At a couple of points in the code, THAW and RESTORE events are each
> > > > > treated specially, with no explanation.
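The SMART_SUSPEND/direct_complete contrast explained above can be summarized as the set of driver suspend-side callbacks that get issued. The model assumes a device that is runtime-suspended when system suspend starts and has no subsystem-level callbacks. It is illustrative only. The enum constants and the function are hypothetical, and ->prepare and ->complete are left out.

```c
#include <stdbool.h>

/* Driver callbacks issued on the suspend side (bit flags). */
enum {
    CB_SUSPEND       = 1u << 0,
    CB_SUSPEND_LATE  = 1u << 1,
    CB_SUSPEND_NOIRQ = 1u << 2,
};

/*
 * Device that is runtime-suspended when system suspend starts and has no
 * subsystem-level callbacks. ->prepare and ->complete are not modeled.
 */
static unsigned int suspend_side_callbacks(bool direct_complete,
                                           bool smart_suspend,
                                           bool still_suspended_at_late)
{
    /* direct_complete: leave the device as is for the whole cycle. */
    if (direct_complete)
        return 0;

    /*
     * SMART_SUSPEND: ->suspend still runs and may resume or reconfigure
     * the device; if it is still runtime-suspended at ->suspend_late
     * time, its state is "final" and late/noirq are skipped.
     */
    if (smart_suspend && still_suspended_at_late)
        return CB_SUSPEND;

    return CB_SUSPEND | CB_SUSPEND_LATE | CB_SUSPEND_NOIRQ;
}
```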
> > > >
> > > > Right, which is related to the kind of work in progress situation regarding
> > > > the flags and hibernation mentioned above. Again, sorry about that.
> > >
> > > I haven't thought about those issues as much as you have. Still, it
> > > seems obvious that the FREEZE/THAW phases should be very happy to leave
> > > devices in runtime suspend throughout (without even worrying about
> > > wakeup settings), and the RESTORE phase should always bring everything
> > > back out of runtime suspend.
> >
> > These were exactly my original thoughts, but then when I started to
> > consider possible interactions with the restore kernel (which also carries
> > out the "freeze" transition before jumping into the image kernel), it
> > became less clear.
> >
> > The concern is basically whether or not attempting to power on
> > devices that are already powered on can always be guaranteed to work.
>
> This doesn't affect THAW, because during THAW the driver knows what
> state the device is in. It only affects RESTORE. But during RESTORE
> the driver really doesn't know anything about the device state, even
> with the current code. The restore kernel doesn't even know whether
> the boot kernel put the device through a FREEZE transition, because
> it's possible that the driver was in a module that hadn't been loaded
> yet when the resume-from-hibernation started.
>
> Thus, drivers face this problem already. I don't think we need to
> worry about it.

OK

> > > What to do during the POWEROFF phase isn't so clear, because it depends
> > > on how the platform handles the poweroff transition.
> >
> > POWEROFF is exactly analogous to SUSPEND AFAICS.
>
> The difference is that on many platforms (such as desktop PCs) the
> POWEROFF callbacks don't have to power-down the device, because the
> firmware will power _everything_ off (except the devices needed for
> wakeup, of course). But on other platforms this isn't true, so on them
> POWEROFF does need to behave like SUSPEND.

And there are platforms where the firmware turns off everything (except for wakeup devices) at the end of system-wide suspend too. There really isn't that much of a difference in general.

> > > Okay, let's start with direct_complete. The direct_complete mechanism
> > > is applied to the SUSPEND and RESUME phases under the following
> > > conditions:
> > >
> > >     DPM_FLAG_NEVER_SKIP (or better, DPM_FLAG_NO_DIRECT_COMPLETE)
> > >     is clear; [Incidentally, since a driver can set this flag
> > >     whenever its ->prepare routine returns 0, why do we need
> > >     DPM_FLAG_SMART_PREPARE?]
> >
> > Because the former allows the driver to avoid providing a ->prepare
> > callback always returning 1.
>
> Do you mean NEVER_SKIP allows the driver to avoid providing a ->prepare
> callback which always returns _0_? If that's not what you meant, I
> don't understand.

Yes, I thought 0 and wrote 1, sorry.

> > >     Either the device has no system-PM callbacks at all or else the
> > >     ->prepare callback returns a positive value;
> >
> > Why so?
>
> Isn't that exactly what __device_prepare() does? After your latest
> patch, we have:
>
>     dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
>             (ret > 0 || dev->power.no_pm_callbacks) &&
>             !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
>
> which is exactly what I said, isn't it?

I misread what you wrote, so agreed.

> > >     All of the device's descendants and dependents also want to use
> > >     direct_complete;
> >
> > Yes.
> >
> > >     Neither the device nor any of its descendants/dependents is
> > >     enabled for wakeup;
> >
> > Yes.
> >
> > >     The device is runtime suspended just before the ->suspend
> > >     callback would normally be issued.
> >
> > Yes.
> >
> > > When the mechanism applies, none of the suspend or resume callbacks (in
> > > any of their normal, _early, _late, or _noirq variants) are issued,
> > > only ->complete. Consequently the device remains in runtime suspend
> > > throughout the system sleep.
> > >
> > > Currently, direct_complete is never applied during any of the system
> > > hibernation phases (FREEZE, THAW, POWEROFF, RESTORE). This may change
> > > in the future.
> > >
> > > Is this description correct and complete?
> >
> > It is mostly. :-)
>
> I forgot to mention that if power.syscore is set then none of these
> mechanisms apply because none of the callbacks are issued. Does
> anything else need to be changed?

No, I don't think so.
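The direct_complete conditions agreed on in this thread, including the power.syscore exception, can be collected into one predicate. This is a sketch under stated assumptions. The struct, the enum and the function names are invented. The per-device booleans stand in for state the PM core actually computes across the device hierarchy (descendants, dependents, wakeup). Only dc_candidate() corresponds directly to code, namely the __device_prepare() expression quoted in the thread.

```c
#include <stdbool.h>

/* Hypothetical transition events; only EV_SUSPEND is eligible. */
enum model_event { EV_SUSPEND, EV_FREEZE, EV_THAW, EV_POWEROFF, EV_RESTORE };

/* Hypothetical per-device state; names are illustrative, not kernel APIs. */
struct dc_model_dev {
    bool syscore;           /* power.syscore: no PM callbacks issued at all */
    bool no_pm_callbacks;   /* power.no_pm_callbacks */
    int prepare_ret;        /* return value of ->prepare (0 if absent) */
    bool never_skip;        /* DPM_FLAG_NEVER_SKIP */
    bool dependents_ok;     /* all descendants/dependents use direct_complete */
    bool wakeup_in_subtree; /* device or a descendant/dependent wakeup-enabled */
    bool runtime_suspended; /* runtime-suspended just before ->suspend */
};

/* Mirrors the expression quoted from __device_prepare(). */
static bool dc_candidate(enum model_event ev, const struct dc_model_dev *d)
{
    return ev == EV_SUSPEND &&
           (d->prepare_ret > 0 || d->no_pm_callbacks) &&
           !d->never_skip;
}

/* All conditions from the description, plus the power.syscore exception. */
static bool dc_applies(enum model_event ev, const struct dc_model_dev *d)
{
    if (d->syscore)
        return false; /* none of the mechanisms apply */

    return dc_candidate(ev, d) && d->dependents_ok &&
           !d->wakeup_in_subtree && d->runtime_suspended;
}
```

Because ev must be EV_SUSPEND, the model also reflects that direct_complete is currently never applied during the hibernation phases.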