On 2/25/25 09:46, Juri Lelli wrote:
> On 25/02/25 00:02, Qais Yousef wrote:
>> On 02/24/25 10:27, Juri Lelli wrote:
>>
>>>> Okay I see. The issue though is that for a DL system with power management
>>>> features on that warrant to wake up a sugov thread to update the frequency is
>>>> sort of half broken by design. I don't see the benefit over using RT in this
>>>> case. But I appreciate I could be misguided. So take it easy on me if it is
>>>> obviously wrong understanding :) I know in Android usage of DL has been
>>>> difficult, but many systems ship with slow switch hardware.
>>>>
>>>> How does DL handle the long softirqs from block and network layers by the way?
>>>> This has been in practice a problem for RT tasks so they should be to DL.
>>>> sugov done in stopper should be handled similarly IMHO. I *think* it would be
>>>> simpler to masquerade sugov thread as irq pressure.
>>>
>>> Kind of a trick question :), as DL doesn't handle this kind of
>>
>> :-)
>>
>>> load/pressure explicitly. It is essentially agnostic about it. From a
>>> system design point of view though, I would say that one should take
>>> that into account and maybe convert sensible kthreads to DL, so that the
>>> overall bandwidth can be explicitly evaluated. If one doesn't do that
>>> probably a less sound approach is to treat anything not explicitly
>>> scheduled by DL, but still required from a system perspective, as
>>> overload and be more conservative when assigning bandwidth to DL tasks
>>> (i.e. reduce the maximum amount of available bandwidth, so that the
>>> system doesn't get saturated).
>>
>> Maybe I didn't understand your initial answer properly. But what I got is that
>> we set as DL to do what you just suggested of converting the kthread to DL to
>> take its bandwidth into account. But we have been lying about bandwidth so far
>> and it was ignored? (I saw early bailouts if SCHED_FLAG_SUGOV was set in
>> bandwidth related operations)
>
> Ignored as to have something 'that works'. :)
>
> But, it's definitely far from being good.
>
>>>> You can use the rate_limit_us as a potential guide for how much bandwidth sugov
>>>> needs if moving it to another class really doesn't make sense instead?
>>>
>>> Or maybe try to estimate/measure how much utilization sugov threads are
>>> effectively using while running some kind of workload of interest and
>>> use that as an indication for DL runtime/period.
>>
>> I don't want to side track this thread. So maybe I should start a new thread to
>> discuss this. You might have seen my other series on consolidating cpufreq
>> updates. I'm not sure sugov can have a predictable period. Maybe runtime, but
>> it could run repeatedly, or it could be quiet for a long time.
>
> Doesn't need to have a predictable period. Sporadic (activations are not
> periodic) tasks work well with DEADLINE if one is able to come up with a
> sensible bandwidth allocation for them. So for sugov (and other
> kthreads) the system designer should be thinking about the amount of CPU
> to give to each kthread (runtime/period) and the granularity of such
> allocation (period).

The only really sensible choice I see is
rate_limit * some_constant_approximated_runtime, and on many systems that
may yield >100% of the capacity.
Qais' proposed changes would even remove the theoretical rate_limit cap here.
A lot of complexity for something that is essentially a non-issue in
practice AFAICS...
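
To put rough numbers on the >100% remark, here is a back-of-the-envelope
sketch (not kernel code). It assumes the worst case is one sugov activation
per rate_limit_us window, so the DL period is taken to be rate_limit_us and
the bandwidth is runtime / rate_limit_us. The function names and the
per-platform runtimes are hypothetical and only for illustration; the 95%
cap is the default sched_rt_runtime_us / sched_rt_period_us ratio.

```python
# Back-of-the-envelope DL bandwidth for a sugov kthread.
# Assumption: at most one activation per rate_limit_us window, so the
# DL period is taken to be rate_limit_us. All runtimes below are made up.

# Default admission limit: sched_rt_runtime_us / sched_rt_period_us.
DL_ADMISSION_CAP = 950_000 / 1_000_000


def sugov_bandwidth(runtime_us: float, rate_limit_us: float) -> float:
    """Fraction of one CPU reserved by a (runtime, period) pair."""
    if rate_limit_us <= 0:
        # Without a rate limit there is no lower bound on the period.
        raise ValueError("rate_limit_us must be positive")
    return runtime_us / rate_limit_us


def fits(runtime_us: float, rate_limit_us: float,
         cap: float = DL_ADMISSION_CAP) -> bool:
    """True if the reservation alone stays within the admission cap."""
    return sugov_bandwidth(runtime_us, rate_limit_us) <= cap


# Hypothetical platforms: (name, approx. runtime per update, rate_limit_us)
examples = [
    ("platform A", 50, 1000),
    ("platform B", 1500, 2000),
    ("platform C", 3000, 2000),
]

for name, runtime_us, rate_limit_us in examples:
    bw = sugov_bandwidth(runtime_us, rate_limit_us)
    ok = fits(runtime_us, rate_limit_us)
    print(f"{name}: {bw:.0%} of one CPU, admissible={ok}")
```

With a frequency switch that takes longer than the rate limit (platform C
above), the reservation cannot be admitted at all, and with rate_limit_us
removed there is no period left to divide by.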