On Mon, Oct 16, 2017 at 03:12:35AM +0200, Rafael J. Wysocki wrote: > Hi All, > > Well, this took more time than expected, as I tried to cover everything I had > in mind regarding PM flags for drivers. > > This work was triggered by attempts to fix and optimize PM in the > i2c-designware-platdev driver that ended up with adding a couple of > flags to the driver's internal data structures for the tracking of > device state (https://marc.info/?l=linux-acpi&m=150629646805636&w=2). > That approach is sort of suboptimal, though, because other drivers will > probably want to do similar things and if all of them need to use internal > flags for that, quite a bit of code duplication may ensue at least. > > That can be avoided in a couple of ways and one of them is to provide a means > for drivers to tell the core what to do and to make the core take care of it > if told to do so. Hence, the idea to use driver flags for system-wide PM > that was briefly discussed during the LPC in LA last month. > > One of the flags considered at that time was to possibly cause the core > to reuse the runtime PM callback path of a device for system suspend/resume. > Admittedly, that idea didn't look too bad to me until I had started to try to > implement it and I got to the PCI bus type's hibernation callbacks. Then, I > moved the patch I was working on to /dev/null right away. I mean it. > > No, this is not going to happen. No way. > > Moreover, that experience made me realize that the whole *idea* of using the > runtime PM callback path for system-wide PM was actually totally bogus (sorry > Ulf). > > The whole point of having different callbacks pointers for different types of > device transitions is because it may be necessary to do different things in > those callbacks in general. Now, if you consider runtime PM and system > suspend/resume *only* and from a driver perspective, then yes, in some cases > the same pair of callback routines may be used for all suspend-like and > resume-like transitions of the device, but if you add hibernation to the mix, > then it is not so clear any more unless the callbacks don't actually do any > power management at all, but simply quiesce the device's activity and then > activate it again. Namely, changing power states of devices during the > hibernation's "freeze" and "thaw" transitions rarely makes sense at all and > the "restore" transition needs to be able to cope with uninitialized devices > (in fact, it should be prepared to cope with devices in *any* state), so > runtime PM is hardly suitable for them. Still, if a *driver* choses to not > do any real PM in its PM callbacks and leaves that to a middle layer (quite > a few drivers do that), then it possibly can use one pair of callbacks in all > cases and be happy, but middle layers pretty much have to use different > callback routines for different transitions. > > If you are a middle layer, your role is basically to do PM for a certain > group of devices. Thus you cannot really do the same in ->suspend or > ->suspend_early and in ->runtime_suspend (because the former generally need to > take device_may_wakeup() into account and the latter doesn't) and you shouldn't > really do the same in ->suspend and ->freeze (becuase the latter shouldn't > change the device's power state) and so on. To put it bluntly, trying > to use the ->runtime_suspend callback of a middle layer for anything other > than runtime suspend is complete and utter nonsense. At the same time, the > ->runtime_resume callback of a middle layer may be reused to some extent, > but even that doesn't cover the "thaw" transitions during hibernation. > > What can work (and this is the only strategy that can work AFAICS) is to > point different callback pointers *in* *a* *driver* to the same routine > if the driver wants to reuse that code. That actually will work for PCI > and USB drivers today, at least most of the time, but unfortunately there > are problems with it for, say, platform devices. > > The first problem is the requirement to track the status of the device > (suspended vs not suspended) in the callbacks, because the system-wide PM > code in the PM core doesn't do that. The runtime PM framework does it, so > this means adding some extra code which isn't necessary for runtime PM to > the callback routines and that is not particularly nice. > > The second problem is that, if the driver wants to do anything in its > ->suspend callback, it generally has to prevent runtime suspend of the > device from taking place in parallel with that, which is quite cumbersome. > Usually, that is taken care of by resuming the device from runtime suspend > upfront, but generally doing that is wasteful (there may be no real need to > resume the device except for the fact that the code is designed this way). > > On top of the above, there are optimizations to be made, like leaving certain > devices in suspend after system resume to avoid wasting time on waiting for > them to resume before user space can run again and similar. > > This patch series focuses on addressing those problems so as to make it > easier to reuse callback routines by pointing different callback pointers > to them in device drivers. The flags introduced here are to instruct the > PM core and middle layers (whatever they are) on how the driver wants the > device to be handled and then the driver has to provide callbacks to match > these instructions and the rest should be taken care of by the code above it. > > The flags are introduced one by one to avoid making too many changes in > one go and to allow things to be explained better (hopefully). They mostly > are mutually independent with some clearly documented exceptions. > > The first three patches in the series are about an issue with the > direct-complete optimization introduced some time ago in which some middle > layers decide on whether or not to do the optimization without asking the > drivers. And, as it turns out, in some cases the drivers actually know > better, so the new flags introduced by these patches are here for these > drivers (and the DPM_FLAG_NEVER_SKIP one is really to avoid having to define > ->prepare callbacks always returning zero). > > The really interesting things start to happen in patches [4-9/12] which make it > possible to avoid resuming devices from runtime suspend upfront during system > suspend at least in some cases (and when direct-complete is not applied to the > devices in question), but please refer to the changelogs for details. > > The i2d-designware-platdev driver is used as the primary example in the series > and the patches modifying it are based on some previous changes currently in > linux-next AFAICS (the same applies to the intel-lpss driver), but these > patches can wait until everything is properly merged. They are included here > mostly as illustration. > > Overall, the series is based on the linux-next branch of the linux-pm.git tree > with some extra patches on top of it and all of the names of new entities > introduced in it are negotiable. Thanks for the great explaination, I was wondering how your proposal discussed at Plumbers was going to work out in the end :) The patch series looks good to me (minor questions already sent on the patches), but what does this mean for drivers? Do they now have to do a lot of work to take advantage of this, like you did for the i2d-designware-platdev driver? Or will things continue to work as-is and it's only an opt-in type thing where the bus/driver wants to take advantage of it? thanks, greg k-h