Re: Just fix the bug - Re: [PATCH 1/8] PM: Opportunistic suspend support.

Arve Hjønnevåg <arve@xxxxxxxxxxx> · Thu, 27 May 2010 23:37:00 -0700

2010/5/27 Neil Brown <neilb@xxxxxxx>:
> On Thu, 27 May 2010 21:57:02 -0700
> Arve Hjønnevåg <arve@xxxxxxxxxxx> wrote:
>
>> 2010/5/27 Neil Brown <neilb@xxxxxxx>:
>> > On Thu, 27 May 2010 02:07:21 -0700
>> > Arve Hjønnevåg <arve@xxxxxxxxxxx> wrote:
>> >
>> >> >
>> >> >> of it into pm QoS stuff and if one day someone solves the rogue app
>> >> >> problem, we can migrate over.
>> >> >
>> >> > If it's so important for Android and no one else, Android can carry it
>> >> > out of tree.
>> >> >
>> >>
>> >> This is not only important for Android. If you use suspend on a
>> >> current Linux system you run the risk of loosing wakeup events. If you
>> >> have wakeup events that you cannot afford to lose your only option is
>> >> to never suspend. On some hardware (e.g. x86) the cost of not
>> >> suspending is always huge, on other hardware (many ARM SOCs) the cost
>> >> is only huge if your apps behave poorly.
>> >>
>> >
>> > So here is my suggestion.
>> > Rather than trying to push a feature that is clearly meeting lots of
>> > resistance, the Android devs should state the problem as a bug that needs
>> > fixing.  As you have.
>> > Upstream is a lot more receptive of fixing bugs than adding features.
>> >
>> > In this case the bug is that you cannot suspend without the risk of losing
>> > wakeup events.  This is a real bug that for your use case is a serious
>> > bug.  I've toyed with several ways of fixing this but the one that seems most
>> > promising is to note that in the kernel the suspend process is two-stage with
>> > a 'prepare' followed by a 'suspend'. Userspace cannot make that distinction
>> > and so ends up with a race.
>> >
>> > Maybe userspace should be able to say "prepare to suspend" with the meaning
>> > that after a successful return, any event which would cause a wakeup sets a
>> > flag so that the final suspend returns immediately (without actually going to
>> > the lower power state).
>> >
>> > Then your opportunistic suspend could be entirely in userspace where you
>> > wouldn't have to fight with the kernel crowd :-)
>> > The suspend-daemon would:
>> >  Wait for all user-space suspend blocks to be dropped.
>> >  Tell the kernel to "prepare to suspend".
>> >  Tell all userspace programs which have registered for the message that they
>> >    should prepare to suspend.  They have the opportunity at this point to
>> >    take out a new suspend block if they notice an event that has
>> >    just arrived
>> >  Wait for all those programs to acknowledge
>> >  If there are no new suspend blocks, tell the kernel to suspend
>> >  else tell the kernel to abort the suspend.
>> >
>> > This (I think) allows race-free opportunistic suspend in user-space where
>> > you can do all the accounting you need.
>> >
>>
>> Perhaps, but it forces all user space programs that get events from
>> the kernel to also receive messages from the suspend-daemon, check for
>> other events again, then respond to the suspend-daemon. The current
>> suspend blocker interface is easier to use.
>
> Maybe so.  There are quite possibly better ways to fix the bug.  There are a
> variety of different tradeoffs possible and I suspect we could have fun
> arguing about those.  My main point though is that if we focus on a bug that
> needs to be fixed, we should be able to keep the conversation more focussed.
> Currently it seems to be branching all over the place which doesn't seem very
> helpful.
>
>>                                              I don't see how your
>> suggestion avoids races for events that pass trough several kernel
>> layers though. If a wakeup event happened before the "prepare to
>> suspend" call but has not yet been passed to user-space, the
>> user-space program that needs this event will not know that it needs
>> to block suspend when it gets the prepare-to-suspend message.
>
> Each layer must understand that the event is a wake-up event, and the
> "prepare to suspend call" to each layer should drive all pending events
> through to the next layer.  If the layers (drivers) are prepared in the
> "right" order, this will force the event all the way to user-space.  If they
> are called in exactly the "Wrong" order, this will require multiple
> prepare/suspend/instant-resume cycles to get the event through, but I suspect
> that this would very rarely result in truly pathological behaviour.
>
> I'm guessing your current code (sorry, I haven't looked at all) already tracks
> events up through multiple layers so they can interact with a pending
> opporunistic-suspend request.  I suspect that aspect of the code isn't
> particularly controversial.
> Keep that, and use it precisely to implement a race-free "suspend" request
> from user-space.
>
>>
>> > I don't fully understand your requirements for accounting of devices drivers
>> > rejecting or blocking a suspend, so I cannot say precisely how that would fit
>> > in.  Maybe you just need to know - whenever the 'suspend request' completes -
>> > what the wakeup events were.  It shouldn't be too hard to export that to
>> > user-space via sysfs.
>> >
>> > I won't propose an exact enhancement to the user-space interface for
>> > requesting a suspend, but I suspect it should expose each of
>> >  suspend_prepare
>> >  suspend_devices_and_enter
>> >  suspend_finish
>> > (or close analogues there-of) to user-space.  It is tempting to map those to
>> > "open-for-write", "write", "close", but I'm not sure that suspend_prepare
>> > would be appropriate if the app was about to write "disk" - it is a pity that
>> > both suspend and hibernate use the same sysfs file.
>> >
>> > So just fix the bug, and everyone will be happy :-)
>> >
>>
>> I already have, but everyone do not appear to be happy.
>>
>
> I don't think you have.  You have proposed a significant new feature: a
> suspend-as-soon-as-you-can request which *user*space*can*block*.  People
> don't like that because it seems like a poor second cousin to something that
> would be really useful (user-space setting more general latency
> requirements).
>

The ability to block the suspend request is necessary to avoid races
with wakeup events.

> I am suggesting that you stick with the feature we have, which is that
> user-space can request a suspend and the kernel/hardware can cause a
> subsequent (possibly immediate) resume in response to a wake-up event.

Since not all wakeup events propagate to user-space it is very useful
to allow the kernel to reenter suspend when there are no active
suspend blockers remaining.

> Argue that the current definition of "wake-up event" is too weak and does not
> allow the feature to be used safely.  Present an implementation (I suspect you
> have most of it already) where a wake-up event is tracked all the way from
> the hardware to user-space and is still a 'wake-up event' until userspace
> actually consumes it.
>
> Then user-space simply has to:
>  poll for event to be ready
>  request suspend-block in user-space
>  consume event
>  handle event
>  release suspend-block
>  loop

This you copy this from the example in the documentation file added by
this patch? This is exactly what we do for input events, but for this
to work the kernel also has to block suspend while an event is ready
but not consumed.

>
> No change to API.  No new concepts.  Simply a bug and a direct fix.
>
I don't know how you got to that conclusion.

> You probably want the "suspend" request to block until there are no pending
> events (and then immediately fail if there were) so that there is no risk of
> the suspend-daemon spinning asking of a suspend which appears to resume
> immediately (until some other process sees an event and blocks suspend.
>
> NeilBrown
>

-- 
Arve Hjønnevåg
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm