On Mon, Aug 28, 2023 at 10:33:04AM -0700, Elliot Berman wrote: > After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"), > tasks that are in TASK_FREEZABLE state and end up getting frozen are TASK_FREEZABLE state and what? Pls check once. > always woken up. Prior to that commit, tasks could ask freezer to > consider them "frozen enough" via freezer_do_not_conut(). As described > in Peter's commit, the reason for this change is to prevent these tasks > from being woken before SMP is back. The commit introduced a > TASK_FREEZABLE state which allows freezer to immediately mark the task > as TASK_FROZEN without waking up the task. On the thaw path, the task is > woken up even if the task didn't need to wake up and goes back to its > TASK_(UN)INTERRUPTIBLE state. Although these tasks are capable of > handling of the wakeup, we can observe a power/perf impact from the > extra wakeup. > > We observed on Android many tasks wait in the TASK_FREEZABLE state > (particularly due to many of them being binder clients). We observed > nearly 4x the number of tasks and a corresponding (almost) linear increase in > latency and power consumption when thawing the system. The latency > increased from ~15ms to ~50ms. > > Save the state of TASK_FREEZABLE tasks and restore it after thawing the > task without waking the task up. If the task received a wake up for the > saved_state before thawing, then the task is still woken upon thawing. > > Re-use saved_state from RT sleeping spinlocks because freezer doesn't > consider TASK_RTLOCK_WAIT freezable. > > Reported-by: Prakash Viswalingam <quic_prakashv@xxxxxxxxxxx> > Signed-off-by: Elliot Berman <quic_eberman@xxxxxxxxxxx> > --- > For testing purposes, I use these commands can help see how many tasks were > woken during thawing: > > 1. Setup: > mkdir /sys/kernel/tracing/instances/freezer > cd /sys/kernel/tracing/instances/freezer > echo 0 > tracing_on ; echo > trace > echo power:suspend_resume > set_event > echo 'enable_event:sched:sched_wakeup if action == \"thaw_processes\" && start == 1' > events/power/suspend_resume/trigger > echo 'traceoff if action == \"thaw_processes\" && start == 0' > events/power/suspend_resume/trigger > echo 1 > tracing_on > > 2. Let kernel go to suspend > > 3. After kernel's back up: > cat /sys/kernel/tracing/instances/freezer/trace | grep sched_wakeup | grep -o "pid=[0-9]*" | sort -u | wc -l > --- > include/linux/sched.h | 4 ++-- > kernel/freezer.c | 15 +++++++++++++-- > kernel/sched/core.c | 21 +++++++++++++-------- > 3 files changed, 28 insertions(+), 12 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index eed5d65b8d1f..e4ade5a18df2 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -746,8 +746,8 @@ struct task_struct { > #endif > unsigned int __state; > > -#ifdef CONFIG_PREEMPT_RT > - /* saved state for "spinlock sleepers" */ > +#if IS_ENABLED(CONFIG_PREEMPT_RT) || IS_ENABLED(CONFIG_FREEZER) > + /* saved state for "spinlock sleepers" and freezer */ > unsigned int saved_state; > #endif > > diff --git a/kernel/freezer.c b/kernel/freezer.c > index 4fad0e6fca64..6222cbfd97ab 100644 > --- a/kernel/freezer.c > +++ b/kernel/freezer.c > @@ -71,7 +71,11 @@ bool __refrigerator(bool check_kthr_stop) > for (;;) { > bool freeze; > > + raw_spin_lock_irq(¤t->pi_lock); > set_current_state(TASK_FROZEN); > + /* unstale saved_state so that __thaw_task() will wake us up */ > + current->saved_state = TASK_RUNNING; > + raw_spin_unlock_irq(¤t->pi_lock); > > spin_lock_irq(&freezer_lock); > freeze = freezing(current) && !(check_kthr_stop && kthread_should_stop()); > @@ -129,6 +133,7 @@ static int __set_task_frozen(struct task_struct *p, void *arg) > WARN_ON_ONCE(debug_locks && p->lockdep_depth); > #endif > > + p->saved_state = p->__state; > WRITE_ONCE(p->__state, TASK_FROZEN); > return TASK_FROZEN; > } > @@ -174,10 +179,16 @@ bool freeze_task(struct task_struct *p) > * state in p->jobctl. If either of them got a wakeup that was missed because > * TASK_FROZEN, then their canonical state reflects that and the below will > * refuse to restore the special state and instead issue the wakeup. > + * > + * Otherwise, restore the saved_state before the task entered freezer. For > + * typical tasks in the __refrigerator(), saved_state == 0 so nothing happens > + * here. For tasks which were TASK_NORMAL | TASK_FREEZABLE, their initial state > + * is returned unless they got an expected wakeup. Then they will be woken up as > + * TASK_FROZEN back in __thaw_task(). > */ Thanks for the detailed comment. The change looks good to me. > static int __set_task_special(struct task_struct *p, void *arg) > { > - unsigned int state = 0; > + unsigned int state = p->saved_state; > > if (p->jobctl & JOBCTL_TRACED) > state = TASK_TRACED; > @@ -188,7 +199,7 @@ static int __set_task_special(struct task_struct *p, void *arg) > if (state) > WRITE_ONCE(p->__state, state); > > - return state; > + return state & ~TASK_FROZEN; > } void __thaw_task(struct task_struct *p) { ... if (lock_task_sighand(p, &flags2)) { /* TASK_FROZEN -> TASK_{STOPPED,TRACED} */ bool ret = task_call_func(p, __set_task_special, NULL); unlock_task_sighand(p, &flags2); if (ret) goto unlock; } wake_up_state(p, TASK_FROZEN); unlock: spin_unlock_irqrestore(&freezer_lock, flags); } The comment there about task change needs update. I feel the "ret" should be renamed approriately to indicate whether wakeup is needed or not. Now that we have saved_state capturing the previous and any state change while task is frozen, can that be used and remove the job control and associated locking here? for ex: if saved_state is running, we need to wakeup otherwise, simply restore the __state from saved_state. > > void __thaw_task(struct task_struct *p) > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index a68d1276bab0..815d955764a5 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3992,13 +3992,17 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags) > * The caller holds p::pi_lock if p != current or has preemption > * disabled when p == current. > * > - * The rules of PREEMPT_RT saved_state: > + * The rules of saved_state: > * > * The related locking code always holds p::pi_lock when updating > * p::saved_state, which means the code is fully serialized in both cases. > * > - * The lock wait and lock wakeups happen via TASK_RTLOCK_WAIT. No other > - * bits set. This allows to distinguish all wakeup scenarios. > + * For PREEMPT_RT, the lock wait and lock wakeups happen via TASK_RTLOCK_WAIT. > + * No other bits set. This allows to distinguish all wakeup scenarios. > + * > + * For FREEZER, the wakeup happens via TASK_FROZEN. No other bits set. This > + * allows us to prevent early wakeup of tasks before they can be run on > + * asymmetric ISA architectures (eg ARMv9). > */ > static __always_inline > bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success) > @@ -4013,13 +4017,14 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success) > return true; > } > > -#ifdef CONFIG_PREEMPT_RT > +#if IS_ENABLED(CONFIG_PREEMPT_RT) || IS_ENABLED(CONFIG_FREEZER) > /* > * Saved state preserves the task state across blocking on > - * an RT lock. If the state matches, set p::saved_state to > - * TASK_RUNNING, but do not wake the task because it waits > - * for a lock wakeup. Also indicate success because from > - * the regular waker's point of view this has succeeded. > + * an RT lock or TASK_FREEZABLE tasks. If the state matches, > + * set p::saved_state to TASK_RUNNING, but do not wake the task > + * because it waits for a lock wakeup or __thaw_task(). Also > + * indicate success because from the regular waker's point of > + * view this has succeeded. > * > * After acquiring the lock the task will restore p::__state > * from p::saved_state which ensures that the regular > > --- > base-commit: 6995e2de6891c724bfeb2db33d7b87775f913ad1 > change-id: 20230817-avoid-spurious-freezer-wakeups-9f8619680b3a Your patch seems based on v6.4. You might want to resend the patch on v6.5 to take the below commit into account. 1c06918788e8a ("sched: Consider task_struct::saved_state in wait_task_inactive()") Thanks, Pavan