On 2022/10/28 23:58, Suren Baghdasaryan wrote: > On Thu, Oct 27, 2022 at 11:50 PM Chengming Zhou > <zhouchengming@xxxxxxxxxxxxx> wrote: >> >> Hello, >> >> Thanks for picking this up. There is a newer version which has been acked: >> https://lore.kernel.org/all/20221014110551.22695-1-zhouchengming@xxxxxxxxxxxxx/ > > Hmm. Indeed this seems to be an older version and not the one I asked > Peter to pick up in > https://lore.kernel.org/all/CAJuCfpHeJuZBbv-q+WXjgNHwt_caMomFPL3L9rxosXOrZz3fBw@xxxxxxxxxxxxxx/. > Not sure what went wrong. Peter, could you please replace this one > with https://lore.kernel.org/all/20221014110551.22695-1-zhouchengming@xxxxxxxxxxxxx/? Oh, I didn't notice that email. > > Chengming, please do not top-post next time. Would be better if you > posted your note under the "Link:" field in this email. Got it, I will do next time. Thanks! > Thanks! > >> >> As well another PSI patch that has been acked by Johannes: >> https://lore.kernel.org/all/20220926081931.45420-1-zhouchengming@xxxxxxxxxxxxx/ >> >> Thanks! >> >> >> On 2022/10/28 14:42, tip-bot2 for Chengming Zhou wrote: >>> The following commit has been merged into the sched/core branch of tip: >>> >>> Commit-ID: 7d89d7bb921c5ae5a428df282e64ee5692e26fe0 >>> Gitweb: https://git.kernel.org/tip/7d89d7bb921c5ae5a428df282e64ee5692e26fe0 >>> Author: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> >>> AuthorDate: Mon, 10 Oct 2022 18:42:06 +08:00 >>> Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx> >>> CommitterDate: Thu, 27 Oct 2022 11:01:23 +02:00 >>> >>> sched/psi: Fix avgs_work re-arm in psi_avgs_work() >>> >>> Pavan reported a problem that PSI avgs_work idle shutoff is not >>> working at all. Because PSI_NONIDLE condition would be observed in >>> psi_avgs_work()->collect_percpu_times()->get_recent_times() even if >>> only the kworker running avgs_work on the CPU. >>> >>> Although commit 1b69ac6b40eb ("psi: fix aggregation idle shut-off") >>> avoided the ping-pong wake problem when the worker sleep, psi_avgs_work() >>> still will always re-arm the avgs_work, so shutoff is not working. >>> >>> This patch changes to consider current CPU groupc as IDLE if the >>> kworker running avgs_work is the only task running and no IOWAIT >>> or MEMSTALL sleep tasks, in which case we will shut off the avgs_work >>> if other CPUs' groupc are also IDLE. >>> >>> One potential problem is that the brief period of non-idle time >>> incurred between the aggregation run and the kworker's dequeue will >>> be stranded in the per-cpu buckets until avgs_work run next time. >>> The buckets can hold 4s worth of time, and future activity will wake >>> the avgs_work with a 2s delay, giving us 2s worth of data we can leave >>> behind when shut off the avgs_work. If the kworker run other works after >>> avgs_work shut off and doesn't have any scheduler activities for 2s, >>> this maybe a problem. >>> >>> Reported-by: Pavan Kondeti <quic_pkondeti@xxxxxxxxxxx> >>> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> >>> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx> >>> Link: https://lore.kernel.org/r/20221010104206.12184-1-zhouchengming@xxxxxxxxxxxxx >>> --- >>> kernel/sched/psi.c | 15 +++++++++++++++ >>> 1 file changed, 15 insertions(+) >>> >>> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c >>> index ee2ecc0..f4cdf6f 100644 >>> --- a/kernel/sched/psi.c >>> +++ b/kernel/sched/psi.c >>> @@ -242,6 +242,8 @@ static void get_recent_times(struct psi_group *group, int cpu, >>> u32 *pchanged_states) >>> { >>> struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu); >>> + int current_cpu = raw_smp_processor_id(); >>> + bool only_avgs_work = false; >>> u64 now, state_start; >>> enum psi_states s; >>> unsigned int seq; >>> @@ -256,6 +258,15 @@ static void get_recent_times(struct psi_group *group, int cpu, >>> memcpy(times, groupc->times, sizeof(groupc->times)); >>> state_mask = groupc->state_mask; >>> state_start = groupc->state_start; >>> + /* >>> + * This CPU has only avgs_work kworker running, snapshot the >>> + * newest times then don't need to re-arm for this groupc. >>> + * Normally this kworker will sleep soon and won't wake >>> + * avgs_work back up in psi_group_change(). >>> + */ >>> + if (current_cpu == cpu && groupc->tasks[NR_RUNNING] == 1 && >>> + !groupc->tasks[NR_IOWAIT] && !groupc->tasks[NR_MEMSTALL]) >>> + only_avgs_work = true; >>> } while (read_seqcount_retry(&groupc->seq, seq)); >>> >>> /* Calculate state time deltas against the previous snapshot */ >>> @@ -280,6 +291,10 @@ static void get_recent_times(struct psi_group *group, int cpu, >>> if (delta) >>> *pchanged_states |= (1 << s); >>> } >>> + >>> + /* Clear PSI_NONIDLE so avgs_work won't be re-armed for this groupc */ >>> + if (only_avgs_work) >>> + *pchanged_states &= ~(1 << PSI_NONIDLE); >>> } >>> >>> static void calc_avgs(unsigned long avg[3], int missed_periods,