On 08/11/2021 10:20, Zhaoyang Huang wrote: > On Mon, Nov 8, 2021 at 4:49 PM Xuewen Yan <xuewen.yan94@xxxxxxxxx> wrote: >> >> Hi Dietmar >> >> On Sat, Nov 6, 2021 at 1:20 AM Dietmar Eggemann >> <dietmar.eggemann@xxxxxxx> wrote: >>> >>> On 05/11/2021 06:58, Zhaoyang Huang wrote: [...] >>>>> This will let the idle task (swapper) pass. Is this indented? Or do you >>>>> want to only let CFS tasks (including SCHED_IDLE) pass? >>>> idle tasks will NOT call psi_memstall_xxx. We just want CFS tasks to >>>> scale the STALL time. >>> >>> Not sure I get this. >>> >>> __schedule() -> psi_sched_switch() -> psi_task_change() -> >>> psi_group_change() -> record_times() -> psi_memtime_fixup() >>> >>> is something else than calling psi_memstall_enter() or _leave()? >>> >>> IMHO, at least record_times() can be called with current equal >>> swapper/X. Or is it that PSI_MEM_SOME is never set for the idle task in >>> this callstack? I don't know the PSI internals. > According to my understanding, PSI_MEM_SOME represents the CORE's > state within which there is at least one task trapped in memstall > path(only counted in by calling PSI_MEMSTALL_ENTER). record_times is > responsible for collecting the delta time of the CORE since it start. > What we are doing is to make the delta time more precise. So idle task > is irrelevant for these. Coming back to the original snippet of the patch. static unsigned long psi_memtime_fixup(u32 growth) { if (!(current->policy == SCHED_NORMAL || current->policy == SCHED_BATCH)) return growth_fixed; With this condition: (1) you're not bailing when current is the idle task. It has policy equal 0 (SCHED_NORMAL) (2) But you're bailing for a SCHED_IDLE (CFS) task. I'm not sure that this is indented here? Since you want to do the scaling later based on whats left for CFS tasks from the CPU capacity my hunch is that you want to rather do: if (current->sched_class != &fair_sched_class) return growth_fixed; What's the possible sched classes of current in psi_memtime_fixup?