Re: [PATCH v2 3/2] sched/deadline: Check bandwidth overflow earlier for hotplug

Qais Yousef <qyousef@xxxxxxxxxxx> · Tue, 25 Feb 2025 00:02:37 +0000

On 02/24/25 10:27, Juri Lelli wrote:

> > Okay I see. The issue, though, is that a DL system whose power management
> > features require waking up a sugov thread to update the frequency is sort
> > of half broken by design. I don't see the benefit over using RT in this
> > case. But I appreciate I could be misguided, so take it easy on me if my
> > understanding is obviously wrong :) I know that using DL on Android has
> > been difficult, but many systems ship with slow-switching hardware.
> > 
> > How does DL handle the long softirqs from the block and network layers, by
> > the way? In practice this has been a problem for RT tasks, so it should be
> > for DL too. sugov done in the stopper should be handled similarly IMHO. I
> > *think* it would be simpler to masquerade the sugov thread as irq pressure.
> 
> Kind of a trick question :), as DL doesn't handle this kind of

:-)

> load/pressure explicitly. It is essentially agnostic about it. From a
> system design point of view though, I would say that one should take
> that into account and maybe convert sensible kthreads to DL, so that the
> overall bandwidth can be explicitly evaluated. If one doesn't do that
> probably a less sound approach is to treat anything not explicitly
> scheduled by DL, but still required from a system perspective, as
> overload and be more conservative when assigning bandwidth to DL tasks
> (i.e. reduce the maximum amount of available bandwidth, so that the
> system doesn't get saturated).

Maybe I didn't understand your initial answer properly. What I got is that we
set sugov as DL to do exactly what you just suggested, i.e. convert the kthread
to DL so that its bandwidth is taken into account. But haven't we been lying
about its bandwidth so far, with it simply being ignored? (I saw early bailouts
if SCHED_FLAG_SUGOV was set in bandwidth-related operations.)
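To make sure we are talking about the same thing, here is a rough model of
what I mean. This is NOT the kernel code: bw_ratio(), dl_admit() and the
`special` flag are made-up stand-ins (the flag plays the role of
SCHED_FLAG_SUGOV), and the fixed-point format only loosely mirrors what the
kernel does. The point is that an exempt task's bandwidth never reaches the
admission sum, and lowering the cap is the "be more conservative" knob Juri
describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Rough model of DL admission control, NOT the kernel implementation.
 * Bandwidth is a fixed-point fraction of one CPU with BW_SHIFT
 * fractional bits.
 */
#define BW_SHIFT 20

struct dl_task {
	uint64_t runtime_ns;
	uint64_t period_ns;
	bool special;	/* hypothetical stand-in for a SCHED_FLAG_SUGOV task */
};

/* runtime/period as a fixed-point fraction of one CPU (truncated) */
static uint64_t bw_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	assert(period_ns != 0);
	return (runtime_ns << BW_SHIFT) / period_ns;
}

/*
 * Admit the task set if its summed bandwidth fits under cap_bw per CPU.
 * "special" tasks bail out early, so their bandwidth is never counted.
 */
static bool dl_admit(const struct dl_task *tasks, int n,
		     uint64_t cap_bw, unsigned int nr_cpus)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++) {
		if (tasks[i].special)
			continue;
		total += bw_ratio(tasks[i].period_ns, tasks[i].runtime_ns);
	}
	return total <= cap_bw * nr_cpus;
}
```

With a 3ms/10ms task, a 5ms/10ms task and a 1ms/10ms "special" task on one
CPU, an 85% cap still admits the set only because the special task is
skipped; counting it would push the total over. The default 95% cap
corresponds to the 950000/1000000 us sched_rt_{runtime,period}_us knobs.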

> 
> > Could you instead use rate_limit_us as a potential guide for how much
> > bandwidth sugov needs, if moving it to another class really doesn't make sense?
> 
> Or maybe try to estimate/measure how much utilization sugov threads are
> effectively using while running some kind of workload of interest and
> use that as an indication for DL runtime/period.

I don't want to sidetrack this thread, so maybe I should start a new one to
discuss this. You might have seen my other series on consolidating cpufreq
updates. I'm not sure sugov can have a predictable period. Maybe a predictable
runtime, but it could run repeatedly, or it could be quiet for a long time.
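If we did go the measuring route anyway, this is a crude sketch of turning a
measured busy time into a runtime, with the period picked from rate_limit_us.
The helper name, the headroom percentage and the choice of period are all
assumptions for illustration. An average over a window hides exactly the
bursty behaviour I'm worried about, so treat the result as a lower bound at
best.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given a kthread's measured busy time over an
 * observation window, suggest a DL runtime for a chosen period (e.g.
 * rate_limit_us converted to ns). Average utilization is scaled to one
 * period and rounded up, then padded by headroom_pct and clamped to the
 * period. busy_ns * period_ns must not overflow 64 bits.
 */
static uint64_t suggest_dl_runtime_ns(uint64_t busy_ns, uint64_t window_ns,
				      uint64_t period_ns,
				      unsigned int headroom_pct)
{
	uint64_t runtime;

	assert(window_ns != 0);
	/* average utilization over the window, scaled to one period */
	runtime = (busy_ns * period_ns + window_ns - 1) / window_ns;
	/* pad by headroom_pct percent, rounding up */
	runtime = (runtime * (100 + headroom_pct) + 99) / 100;
	/* a DL runtime cannot exceed its period */
	if (runtime > period_ns)
		runtime = period_ns;
	return runtime;
}
```

For example, 20ms of busy time over a 1s window with a 2ms period and 25%
headroom gives a 50us runtime.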

TBH I always thought we used DL because it was the highest sched_class that is
not a stopper.

Anyway, happy to take this discussion to another thread if that's better.
I didn't mean to distract from debugging the reported issue.

Thanks!

--
Qais Yousef