Re: Just fix the bug - Re: [PATCH 1/8] PM: Opportunistic suspend support.

Neil Brown <neilb@xxxxxxx> · Fri, 28 May 2010 16:06:01 +1000

On Thu, 27 May 2010 21:57:02 -0700
Arve Hjønnevåg <arve@xxxxxxxxxxx> wrote:

> 2010/5/27 Neil Brown <neilb@xxxxxxx>:
> > On Thu, 27 May 2010 02:07:21 -0700
> > Arve Hjønnevåg <arve@xxxxxxxxxxx> wrote:
> >
> >> >
> >> >> of it into pm QoS stuff and if one day someone solves the rogue app
> >> >> problem, we can migrate over.
> >> >
> >> > If it's so important for Android and no one else, Android can carry it
> >> > out of tree.
> >> >
> >>
> >> This is not only important for Android. If you use suspend on a
> >> current Linux system you run the risk of losing wakeup events. If you
> >> have wakeup events that you cannot afford to lose your only option is
> >> to never suspend. On some hardware (e.g. x86) the cost of not
> >> suspending is always huge, on other hardware (many ARM SOCs) the cost
> >> is only huge if your apps behave poorly.
> >>
> >
> > So here is my suggestion.
> > Rather than trying to push a feature that is clearly meeting lots of
> > resistance, the Android devs should state the problem as a bug that needs
> > fixing.  As you have.
> > Upstream is a lot more receptive of fixing bugs than adding features.
> >
> > In this case the bug is that you cannot suspend without the risk of losing
> > wakeup events.  This is a real bug that for your use case is a serious
> > bug.  I've toyed with several ways of fixing this but the one that seems most
> > promising is to note that in the kernel the suspend process is two-stage with
> > a 'prepare' followed by a 'suspend'. Userspace cannot make that distinction
> > and so ends up with a race.
> >
> > Maybe userspace should be able to say "prepare to suspend" with the meaning
> > that after a successful return, any event which would cause a wakeup sets a
> > flag so that the final suspend returns immediately (without actually going to
> > the lower power state).
> >
> > Then your opportunistic suspend could be entirely in userspace where you
> > wouldn't have to fight with the kernel crowd :-)
> > The suspend-daemon would:
> >  Wait for all user-space suspend blocks to be dropped.
> >  Tell the kernel to "prepare to suspend".
> >  Tell all userspace programs which have registered for the message that they
> >    should prepare to suspend.  They have the opportunity at this point to
> >    take out a new suspend block if they notice an event that has
> >    just arrived
> >  Wait for all those programs to acknowledge
> >  If there are no new suspend blocks, tell the kernel to suspend
> >  else tell the kernel to abort the suspend.
> >
> > This (I think) allows race-free opportunistic suspend in user-space where
> > you can do all the accounting you need.
> >
> 
> Perhaps, but it forces all user space programs that get events from
> the kernel to also receive messages from the suspend-daemon, check for
> other events again, then respond to the suspend-daemon. The current
> suspend blocker interface is easier to use.

Maybe so.  There are quite possibly better ways to fix the bug.  There are a
variety of different tradeoffs possible and I suspect we could have fun
arguing about those.  My main point though is that if we focus on a bug that
needs to be fixed, we should be able to keep the conversation more focussed.
Currently it seems to be branching all over the place which doesn't seem very
helpful.

>                                              I don't see how your
> suggestion avoids races for events that pass through several kernel
> layers though. If a wakeup event happened before the "prepare to
> suspend" call but has not yet been passed to user-space, the
> user-space program that needs this event will not know that it needs
> to block suspend when it gets the prepare-to-suspend message.

Each layer must understand that the event is a wake-up event, and the
"prepare to suspend" call to each layer should drive all pending events
through to the next layer.  If the layers (drivers) are prepared in the
"right" order, this will force the event all the way to user-space.  If they
are called in exactly the "wrong" order, this will require multiple
prepare/suspend/instant-resume cycles to get the event through, but I suspect
that this would very rarely result in truly pathological behaviour.
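
To illustrate the ordering argument, here is a toy Python model.  It is
purely a sketch: the layers, the `pending` list and both function names are
invented for illustration and are not taken from any driver code.  Each
prepare pass lets every layer push its pending events one layer up, and the
last slot stands for user-space.

```python
def prepare_cycle(pending, order):
    """One hypothetical 'prepare to suspend' pass.

    Each layer, visited in the given order, pushes all of its pending
    events up to the next layer.
    """
    for layer in order:
        pending[layer + 1] += pending[layer]
        pending[layer] = 0


def cycles_to_deliver(n_layers, order):
    """Count prepare cycles until an event that starts in layer 0
    reaches user-space (modelled as slot n_layers).

    Assumes 'order' visits every layer 0..n_layers-1 at least once,
    otherwise the event can never get through.
    """
    pending = [0] * (n_layers + 1)
    pending[0] = 1          # one wake-up event stuck in the lowest layer
    cycles = 0
    while pending[n_layers] == 0:
        prepare_cycle(pending, order)
        cycles += 1
    return cycles
```

With three layers, preparing bottom-up (`[0, 1, 2]`) delivers the event in
one cycle, while the exact reverse order (`[2, 1, 0]`) takes three, i.e. one
cycle per layer.  In this model the worst case is bounded by the number of
layers.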

I'm guessing your current code (sorry, I haven't looked at all) already tracks
events up through multiple layers so they can interact with a pending
opportunistic-suspend request.  I suspect that aspect of the code isn't
particularly controversial.
Keep that, and use it precisely to implement a race-free "suspend" request
from user-space.

> 
> > I don't fully understand your requirements for accounting of device drivers
> > rejecting or blocking a suspend, so I cannot say precisely how that would fit
> > in.  Maybe you just need to know - whenever the 'suspend request' completes -
> > what the wakeup events were.  It shouldn't be too hard to export that to
> > user-space via sysfs.
> >
> > I won't propose an exact enhancement to the user-space interface for
> > requesting a suspend, but I suspect it should expose each of
> >  suspend_prepare
> >  suspend_devices_and_enter
> >  suspend_finish
> > (or close analogues there-of) to user-space.  It is tempting to map those to
> > "open-for-write", "write", "close", but I'm not sure that suspend_prepare
> > would be appropriate if the app was about to write "disk" - it is a pity that
> > both suspend and hibernate use the same sysfs file.
> >
> > So just fix the bug, and everyone will be happy :-)
> >
> 
> I already have, but not everyone appears to be happy.
> 

I don't think you have.  You have proposed a significant new feature: a
suspend-as-soon-as-you-can request which *user*space*can*block*.  People
don't like that because it seems like a poor second cousin to something that
would be really useful (user-space setting more general latency
requirements).

I am suggesting that you stick with the feature we have, which is that
user-space can request a suspend and the kernel/hardware can cause a
subsequent (possibly immediate) resume in response to a wake-up event.
Argue that the current definition of "wake-up event" is too weak and does not
allow the feature to be used safely.  Present an implementation (I suspect you
have most of it already) where a wake-up event is tracked all the way from
the hardware to user-space and is still a 'wake-up event' until userspace
actually consumes it.

Then user-space simply has to:
  poll for event to be ready
  request suspend-block in user-space
  consume event
  handle event
  release suspend-block
  loop

No change to API.  No new concepts.  Simply a bug and a direct fix.
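
As a rough sketch of that loop in Python, assuming a Unix system: a pipe
stands in for the device file descriptor, and `UserSuspendBlock` is a made-up
counter standing in for whatever user-space suspend-block mechanism is used.
Neither is a real kernel or Android interface.

```python
import os
import select


class UserSuspendBlock:
    """Hypothetical user-space suspend block: just a counter."""

    def __init__(self):
        self.held = 0

    def __enter__(self):
        self.held += 1
        return self

    def __exit__(self, *exc):
        self.held -= 1
        return False


def event_loop(fd, block, handle, timeout=0.1):
    """Run poll/block/consume/handle/release until fd is idle or closed."""
    while True:
        # poll for event to be ready; nothing is consumed yet, so under the
        # proposed fix the event would still count as a wake-up event
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            return                      # idle: nothing pending
        with block:                     # request suspend-block
            data = os.read(fd, 4096)    # consume event
            if not data:
                return                  # writer closed the pipe
            handle(data, block.held)    # handle event
        # suspend-block released on leaving the with-statement; loop
```

The point of the ordering is that the block is taken before the event is
consumed, so at every instant either the unconsumed event or the user-space
block is keeping the system from suspending.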

You probably want the "suspend" request to block until there are no pending
events (and then immediately fail if there were) so that there is no risk of
the suspend-daemon spinning, asking for a suspend which appears to resume
immediately (until some other process sees an event and blocks suspend).
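
One way to picture those semantics is the following Python sketch.  The
`WakeupTracker` class and its method names are hypothetical, and this is a
user-space simulation rather than a proposed kernel interface.

```python
import threading


class WakeupTracker:
    """Toy model of pending wake-up events and a blocking suspend request."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def event_arrived(self):
        with self._cond:
            self._pending += 1

    def event_consumed(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def request_suspend(self):
        """Return True if the suspend would go ahead.

        If events are pending, block until they have all been consumed and
        then fail (return False), so the caller cannot spin on a suspend
        that would resume immediately.
        """
        with self._cond:
            had_pending = self._pending > 0
            while self._pending > 0:
                self._cond.wait()
            return not had_pending
```

A daemon calling `request_suspend()` in a loop then sleeps while events are
outstanding instead of busy-looping, and only retries once they are consumed.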

NeilBrown
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm