On 25/02/25 00:02, Qais Yousef wrote:
> On 02/24/25 10:27, Juri Lelli wrote:
>
> > > Okay I see. The issue though is that for a DL system with power management
> > > features on that warrant to wake up a sugov thread to update the frequency is
> > > sort of half broken by design. I don't see the benefit over using RT in this
> > > case. But I appreciate I could be misguided. So take it easy on me if it is
> > > obviously wrong understanding :) I know in Android usage of DL has been
> > > difficult, but many systems ship with slow switch hardware.
> > >
> > > How does DL handle the long softirqs from block and network layers by the way?
> > > This has been in a practice a problem for RT tasks so they should be to DL.
> > > sugov done in stopper should be handled similarly IMHO. I *think* it would be
> > > simpler to masquerade sugov thread as irq pressure.
> >
> > Kind of a trick question :), as DL doesn't handle this kind of
>
> :-)
>
> > load/pressure explicitly. It is essentially agnostic about it. From a
> > system design point of view though, I would say that one should take
> > that into account and maybe convert sensible kthreads to DL, so that the
> > overall bandwidth can be explicitly evaluated. If one doesn't do that
> > probably a less sound approach is to treat anything not explicitly
> > scheduled by DL, but still required from a system perspective, as
> > overload and be more conservative when assigning bandwidth to DL tasks
> > (i.e. reduce the maximum amount of available bandwidth, so that the
> > system doesn't get saturated).
>
> Maybe I didn't understand your initial answer properly. But what I got is that
> we set as DL to do what you just suggested of converting it kthread to DL to
> take its bandwidth into account. But we have been lying about bandwidth so far
> and it was ignored? (I saw early bailouts of SCHED_FLAG_SUGOV was set in
> bandwidth related operations)

Ignored, so as to have something 'that works'. :) But it's definitely
far from good.

> > > You can use the rate_limit_us as a potential guide for how much bandwidth sugov
> > > needs if moving it to another class really doesn't make sense instead?
> >
> > Or maybe try to estimate/measure how much utilization sugov threads are
> > effectively using while running some kind of workload of interest and
> > use that as an indication for DL runtime/period.
>
> I don't want to side track this thread. So maybe I should start a new thread to
> discuss this. You might have seen my other series on consolidating cpufreq
> updates. I'm not sure sugov can have a predictable period. Maybe runtime, but
> it could run repeatedly, or it could be quite for a long time.

It doesn't need to have a predictable period. Sporadic tasks (whose
activations are not periodic) work well with DEADLINE, provided one can
come up with a sensible bandwidth allocation for them. So, for sugov (and
other kthreads), the system designer should think about the amount of CPU
to give each kthread (runtime/period) and the granularity of that
allocation (period).

> TBH I always though we use DL because it was the highest sched_class that is
> not a stopper.
>
> Anyway. Happy to take this discussion into another thread if this is better.
> I didn't mean to distract from debugging the reported issue.

No worries! But a separate thread might help get more eyes on this, I
agree.

Best,
Juri
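The runtime/period reasoning in this thread can be made concrete with a small C sketch of a simplified DEADLINE admission check. It assumes a single global cap of ncpus × (sched_rt_runtime_us / sched_rt_period_us), with the default 950000/1000000 giving 95% per CPU, and it represents ratios in 20-bit fixed point. The struct `dl_resv`, the helpers `dl_bw()`/`dl_admit()`, the array `example_set` and the sugov reservation of 1 ms every 10 ms are all hypothetical illustrations. The kernel's real check is done per root domain and is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point shift for bandwidth ratios (assumed: 1.0 == 1 << 20). */
#define BW_SHIFT 20

/* Hypothetical DEADLINE reservation; all times in microseconds. */
struct dl_resv {
	const char *name;
	uint64_t runtime_us; /* CPU time granted per period */
	uint64_t period_us;  /* granularity of the allocation */
};

/* Bandwidth of a reservation, runtime / period, in BW_SHIFT fixed point. */
static uint64_t dl_bw(uint64_t runtime_us, uint64_t period_us)
{
	assert(period_us > 0 && runtime_us <= period_us);
	return (runtime_us << BW_SHIFT) / period_us;
}

/*
 * Simplified admission test: the summed bandwidth of the first n
 * reservations must not exceed ncpus * (limit_runtime / limit_period).
 * Passing 950000/1000000 models the default 95% cap; passing a smaller
 * runtime models keeping headroom for work not scheduled by DEADLINE.
 * Returns 1 if the set fits, 0 otherwise.
 */
static int dl_admit(const struct dl_resv *set, size_t n, unsigned int ncpus,
		    uint64_t limit_runtime_us, uint64_t limit_period_us)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += dl_bw(set[i].runtime_us, set[i].period_us);

	return total <= ncpus * dl_bw(limit_runtime_us, limit_period_us);
}

/*
 * Two application tasks (60% and 30%) plus a hypothetical sugov
 * reservation of 1 ms every 10 ms (10%), e.g. sized from measured
 * utilization. The sugov period is the allocation granularity, not a
 * claim that sugov activations are periodic.
 */
static const struct dl_resv example_set[] = {
	{ "app-a", 60000, 100000 },
	{ "app-b", 30000, 100000 },
	{ "sugov", 1000, 10000 },
};
```

On one CPU, the two application reservations (90%) fit under the 95% cap, but adding the sugov reservation (10%) pushes the total past it. Lowering the cap to 85% to leave headroom for work that DEADLINE does not account for rejects the application set instead. That is the trade-off between reserving kthread bandwidth explicitly and being more conservative about what DL tasks may claim.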