Re: [PATCH 1/8] PM: Add suspend block api.

Arve Hjønnevåg <arve@xxxxxxxxxxx> · Wed, 19 May 2010 17:35:42 -0700

On Wed, May 19, 2010 at 8:55 AM, Paul Walmsley <paul@xxxxxxxxx> wrote:
> Hi Arve,
>
> On Mon, 17 May 2010, Arve Hjønnevåg wrote:
>
>> On Mon, May 17, 2010 at 8:34 PM, Paul Walmsley <paul@xxxxxxxxx> wrote:
>> > On Mon, 17 May 2010, Arve Hjønnevåg wrote:
>> >> On Mon, May 17, 2010 at 7:17 PM, Paul Walmsley <paul@xxxxxxxxx> wrote:
>> >> > On Fri, 14 May 2010, Arve Hjønnevåg wrote:
>> >> >> On Thu, May 13, 2010 at 11:27 PM, Paul Walmsley <paul@xxxxxxxxx> wrote:
>> >> >> > On Thu, 13 May 2010, Arve Hjønnevåg wrote:
>> >> >> >
>> >> >> >> Adds /sys/power/policy that selects the behaviour of /sys/power/state.
>> >> >> >> After setting the policy to opportunistic, writes to /sys/power/state
>> >> >> >> become non-blocking requests that specify which suspend state to enter
>> >> >> >> when no suspend blockers are active. A special state, "on", stops the
>> >> >> >> process by activating the "main" suspend blocker.
>> >> >> >>
>> >> >> >> Signed-off-by: Arve Hjønnevåg <arve@xxxxxxxxxxx>
>> >> >> >
>> >> >> > Looking at the way that suspend-blocks are used in the current Android
>> >> >> > 'msm' kernel tree[1], they seem likely to cause layering violations,
>> >> >> > fragile kernel code with userspace dependencies, and code that consumes
>> >> >> > power unnecessarily.
>> >> >> >
>> >> >> > For example, in drivers/mmc/core/core.c:721, in mmc_rescan() [2], we find
>> >> >> > the following code:
>> >> >> >
>> >> >> >        /* give userspace some time to react */
>> >> >> >        wake_lock_timeout(&mmc_delayed_work_wake_lock, HZ / 2);
>> >> >> >
>> >> >> > This is a layering violation.  The MMC subsystem should have nothing to do
>> >> >> > with "giving userspace time to react."  That is the responsibility of the
>> >> >> > Linux scheduler.
>> >> >> >
>> >> >> > This code is also intrinsically fragile and use-case dependent.  What if
>> >> >> > userspace occasionally needs more than (HZ / 2) to react?  If the
>> >> >> > distributor is lucky enough to catch this before the product ships, then
>> >> >> > the distributor can patch in a new magic number.  But if the device makes
>> >> >> > it to the consumer, the result is an unstable device that unpredictably
>> >> >> > suspends.
>> >> >> >
>> >> >> > The above code will also waste power.  Even if userspace takes less than
>> >> >> > (HZ / 2) to react, the system will still be prevented from entering a
>> >> >> > system-wide low power state for the duration of the remaining time.  This
>> >> >> > is in contrast to an approach that uses the idle loop to enter a
>> >> >> > system-wide low power state.  The moment that the system has no further
>> >> >> > work to do, it can start saving power.
>> >> >> >
>> >> >>
>> >> >> Yes, preventing suspend for 1/2 second wastes some power, but are you
>> >> >> seriously suggesting that it wastes more power then never suspending?
>> >> >> Opportunitic suspend does not prevent entering low power modes from
>> >> >> idle.
>> >> >
>> >> > Sorry, my E-mail was unclear.  Here is my intended point:
>> >> >
>> >> > The current Android opportunistic suspend governor does not enter full
>> >> > system suspend until all suspend-blocks are released.  If the governor
>> >> > were modified to respect the scheduler, then it could enter suspend the
>> >> > moment that the system had no further work to do.
>> >> >
>> >>
>> >> On the hardware that shipped we enter the same power state from idle
>> >> and suspend, so the only power savings we get from suspend that we
>> >> don't get in idle is from not respecting the scheduler and timers.
>> >
>> > Other people
>> > say that their hardware XXX.  (This is probably likely for anything that
>> > supports ACPI.)
>
> Heh, looks like this part of my E-mail escaped a little too early.
>
>> >> > So during part of the (HZ / 2) time interval in the above code, the system
>> >> > is running some kernel or userspace code.  But after that work is done, an
>> >> > idle-loop-based opportunistic suspend governor could enter idle during
>> >> > the remaining portion of that (HZ / 2) time interval, while the current
>> >> > governor must keep the system out of suspend.
>> >> >
>> >> Triggering a full suspend (that disregards normal timers) from idle
>> >> does not work, since the system may temporarily enter idle while
>> >> processing the event.
>> >
>> > Could you provide an example?
>>
>> The code takes a pagefault. The system is idle while waiting for the
>> dma to finish.
>
> OK.
>
>> > In such a case, how can the kernel guarantee that the system will process
>> > the event within (HZ / 2) seconds?
>>
>> It can't, but in my opinion if the event did not start processing in
>> 1/2 second it was probably not time critical.
>
> Thanks for the correction regarding the unit of the time interval.
>
> To use the pagefault example, wouldn't that be partially dependent on
> factors outside the control of the task that handles the input event?
> (e.g., the time needed to page back in, or other code running on the
> system.)
>
Yes (we don't need any timeouts to handle input events though).

>> >> > Of course, to do this, a different method for freezing processes would be
>> >> > needed; one possible approach is described here[1]; others have proposed
>> >> > similar approaches.
>> >>
>> >> If you freeze processes you need something like a suspend blocker to
>> >> prevent processes from being frozen.
>> >
>> > (It occurs to me that "frozen" is the wrong terminology to use.
>> >  Perhaps "paused" would be a better term, along the lines of [1].)
>> >
>> > The full set of processes does not need to be paused.  For example, if
>> > "untrusted" processes are run in a cgroup, and only tasks in that cgroup
>> > are put into TASK_INTERRUPTIBLE, then "trusted" processes will not be
>> > paused.
>>
>> We want to freeze trusted processes.
>
> By "trusted," I mean processes that are run with a suspend block, as in
> your example to Tony[1].  Isn't it so that Android has processes that
> either run entirely under suspend-blocks, or which take suspend-blocks at
> some point in time during their execution?  Wouldn't those processes need
> to be prevented from a force-pause or force-freeze?
>
We don't freeze processes while any suspend blockers are active.

>> >> By having a process that never gets frozen you could implement this
>> >> entirely in user space, but you would have to funnel all wakeup events
>> >> though this process.
>> >
>> > The proposal[2] does require a kernel change.  No wakeup funnel process
>> > would be needed.  The proposal is to put processes into
>> > TASK_INTERRUPTIBLE, not TASK_STOPPED (unlike most "freezers").
>> >
>>
>> Why is the method used for pausing the processes relevant? If the
>> process is paused it cannot process the wakeup event.
>
> Tasks in TASK_INTERRUPTIBLE are woken back up when an event arrives,

So you are not actually preventing them from. A task that wakes up 60
times a second will still drain the battery, unless you happened to
pause it while it was active.

> unlike tasks in TASK_STOPPED, which are stopped until a SIGCONT arrives.
>
>> The only way to get the wake event to the process is to funnel it though
>> a process that never is paused.
>
>
> - Paul
>
> 1. http://lkml.org/lkml/2010/5/7/420
>
>

-- 
Arve Hjønnevåg
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm