On 23/05/2023 11:23, Vincent Guittot wrote:
> On Thu, 18 May 2023 at 14:42, Hongyan Xia <hongyan.xia2@xxxxxxx> wrote:
>>
>> Hi Qais,
>>
>> On 2023-05-18 12:30, Qais Yousef wrote:
>>> Please CC sched maintainers (Ingo + Peter) next time as they should pick this
>>> up ultimately and they won't see it from the list only.
>>
>> Will do. I was using the get_maintainers script and I thought that gave
>> me all the CCs.
>>
>>> On 05/05/23 16:24, Hongyan Xia wrote:

[...]

>>>> diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
>>>> index 74d5b7c6431d..524df07bceba 100644
>>>> --- a/Documentation/scheduler/sched-util-clamp.rst
>>>> +++ b/Documentation/scheduler/sched-util-clamp.rst
>>>> @@ -669,6 +669,19 @@ but not proportional to Fmax/Fmin.
>>>>
>>>>         p0->util_avg = 300 + small_error
>>>>
>>>> +The reason why util_avg is around 300 even though it runs for 900 at Fmin is:
>
> What does it mean running for 900 at Fmin ? util_avg is a ratio in the
> range [0:1024] without time unit
>
>>>> +Although running at Fmin reduces the rate of rq_clock_pelt() to 1/3 thus
>>>> +accumulates util_sum at 1/3 of the rate at Fmax, the clock period
>>>> +(rq_clock_pelt() now minus previous rq_clock_pelt()) in:
>>>> +
>>>> +::
>>>> +
>>>> +    util_sum / clock period = util_avg
>
> I don't get the meaning of the formula above ? There is no "clock
> period" (although I'm not sure what it means here) involved when
> computing util_avg

I also didn't get this one. IMHO, the relation between util_avg and
util_sum is `util_avg = util_sum / divider` with
`divider = LOAD_AVG_MAX - 1024 + avg->period_contrib`. But I can't see
how this matters here.

The crucial point here is IMHO that as long as we have idle time
(p->util_avg < CPU (current) capacity) the util_avg will not rise to
1024, since at wakeup util_avg will only be decayed (since the task was
sleeping, i.e. !!se->on_rq = 0).
And we are scale invariant thanks to the functionality in
update_rq_clock_pelt() (which is executed when p is running).

The pelt clock update at this moment (wakeup) is setting clock_pelt to
clock_task since rq->curr is the idle task, but IMHO that is not the
reason why p->util_avg behaves like this.

The moment `p->util_avg >= CPU (current) capacity` there is no idle time
left, i.e. no such `only decay` updates happen for p anymore (only
`accrue/decay` updates in tick) and the result is that p->util_avg goes
to 1024.

> Also, there is no linear relation between util_avg and Fmin/Fmax
> ratio. Fmin/Fmax ratio is meaningful in regards to the ratio between
> running time and period time of a periodic task. I understand the
> reference of pelt in this document as a quite simplified description
> of PELT so I'm not sure that adding a partial explanation will help.
> It will probably cause more confusion to people. The only thing that
> is sure, is that PELT expects some idle time to stay fully invariant
> for periodic task

+1 ... we have to be able to understand the code.

BTW, schedutil.rst also has paragraphs about PELT and `Frequency / CPU
Invariance` and also refers to
kernel/sched/pelt.h:update_rq_clock_pelt() for details.

[...]
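
The behaviour described above can be illustrated with a toy userspace
model. This is only a sketch under stated assumptions, not the kernel
implementation: it uses floating point instead of the kernel's integer
math and 1024us periods, assumes a single periodic task alone on the
CPU, and the names `pelt_step()` and `simulate()` as well as the
16 ms period and 300/1024 duty cycle are made up for illustration.
Only the two clock rules are taken from the discussion: clock_pelt
advances at the scaled rate while the task runs, and it is synced to
clock_task once the CPU is idle.

```python
HALF_LIFE_MS = 32.0                 # a contribution halves every ~32 ms
Y = 0.5 ** (1.0 / HALF_LIFE_MS)     # per-millisecond decay factor
MAX_UTIL = 1024.0


def pelt_step(util, delta_ms, running):
    """Decay util over delta_ms of PELT time; accrue towards 1024 if running."""
    decay = Y ** delta_ms
    target = MAX_UTIL if running else 0.0
    return util * decay + target * (1.0 - decay)


def simulate(work_ms, period_ms, freq_scale, periods=200):
    """Return (util at wakeup, util at sleep) for the last simulated period.

    work_ms is the runtime the task needs per period at Fmax (assumption);
    at freq_scale it needs work_ms / freq_scale of wall-clock time.
    """
    clock_task = 0.0     # wall-clock time of the runqueue
    clock_pelt = 0.0     # scaled clock that PELT updates are based on
    last_update = 0.0    # task's last PELT update, in clock_pelt time
    util = 0.0
    util_wake = util_sleep = 0.0
    run_wall = work_ms / freq_scale

    for _ in range(periods):
        # Wakeup: the task was sleeping, so this update only decays util.
        util = pelt_step(util, clock_pelt - last_update, running=False)
        last_update = clock_pelt
        util_wake = util

        # Run phase: clock_pelt advances slower than clock_task.
        ran = min(run_wall, period_ms)
        clock_task += ran
        clock_pelt += ran * freq_scale
        util = pelt_step(util, clock_pelt - last_update, running=True)
        last_update = clock_pelt
        util_sleep = util

        # Idle phase, if there is one: clock_pelt catches up with clock_task.
        idle = period_ms - ran
        if idle > 0:
            clock_task += idle
            clock_pelt = clock_task

    return util_wake, util_sleep


if __name__ == "__main__":
    work = 16.0 * 300.0 / 1024.0    # duty cycle of 300/1024 at Fmax
    for scale in (1.0, 1.0 / 3.0, 0.25):
        wake, sleep = simulate(work, 16.0, scale)
        print(f"freq_scale={scale:.3f} wake={wake:.1f} sleep={sleep:.1f}")
```

In this model, a frequency scale of 1 and of 1/3 give the same util
values, oscillating around 300, because the task still gets idle time
in both cases. At a scale of 0.25 the work no longer fits into the
period, no `only decay` update happens anymore, and util converges
towards 1024.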