After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"), tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN are always woken up on the thaw path. Prior to that commit, tasks could ask freezer to consider them "frozen enough" via freezer_do_not_count(). The commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which allows freezer to immediately mark the task as TASK_FROZEN without waking up the task. This is efficient for the suspend path, but on the thaw path, the task is always woken up even if the task didn't need to wake up and goes back to its TASK_(UN)INTERRUPTIBLE state. Although these tasks are capable of handling of the wakeup, we can observe a power/perf impact from the extra wakeup. We observed on Android many tasks wait in the TASK_FREEZABLE state (particularly due to many of them being binder clients). We observed nearly 4x the number of tasks and a corresponding linear increase in latency and power consumption when thawing the system. The latency increased from ~15ms to ~50ms. Avoid the spurious wakeups by saving the state of TASK_FREEZABLE tasks. If the task was running before entering TASK_FROZEN state (__refrigerator()) or if the task received a wake up for the saved state, then the task is woken on thaw. saved_state from PREEMPT_RT locks can be re-used because freezer would not stomp on the rtlock wait flow: TASK_RTLOCK_WAIT isn't considered freezable. For testing purposes, I use these commands can help see how many tasks were woken during thawing: 1. Setup: mkdir /sys/kernel/tracing/instances/freezer cd /sys/kernel/tracing/instances/freezer echo 0 > tracing_on ; echo > trace echo power:suspend_resume > set_event echo 'enable_event:sched:sched_wakeup if action == "thaw_processes" && start == 1' > events/power/suspend_resume/trigger echo 'traceoff if action == "thaw_processes" && start == 0' > events/power/suspend_resume/trigger echo 1 > tracing_on 2. Let kernel go to suspend 3. After kernel's back up: cat /sys/kernel/tracing/instances/freezer/trace | grep sched_wakeup | grep -o "pid=[0-9]*" | sort -u | wc -l Signed-off-by: Elliot Berman <quic_eberman@xxxxxxxxxxx> --- Changes in v4: - Split into 2 commits - Link to v3: https://lore.kernel.org/r/20230908-avoid-spurious-freezer-wakeups-v3-1-d49821fda04d@xxxxxxxxxxx Changes in v3: - Remove #ifdeferry for saved_state in kernel/sched/core.c - Link to v2: https://lore.kernel.org/r/20230830-avoid-spurious-freezer-wakeups-v2-1-8877245cdbdc@xxxxxxxxxxx Changes in v2: - Rebase to v6.5 which includes commit 1c06918788e8 ("sched: Consider task_struct::saved_state in wait_task_inactive()") This allows us to remove the special jobctl handling on thaw. - Link to v1: https://lore.kernel.org/r/20230828-avoid-spurious-freezer-wakeups-v1-1-8be8cf761472@xxxxxxxxxxx --- Elliot Berman (2): sched/core: Remove ifdeffery for saved_state freezer,sched: Use saved_state to reduce some spurious wakeups include/linux/sched.h | 2 -- kernel/freezer.c | 41 +++++++++++++++++++---------------------- kernel/sched/core.c | 40 +++++++++++++++++----------------------- 3 files changed, 36 insertions(+), 47 deletions(-) --- base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c change-id: 20230817-avoid-spurious-freezer-wakeups-9f8619680b3a Best regards, -- Elliot Berman <quic_eberman@xxxxxxxxxxx>