Hi Vineeth,

Nice work. ;-) A few quick comments below:

On Wed, May 10, 2023 at 6:46 PM Vineeth Pillai <vineeth@xxxxxxxxxxxxxxx> wrote:
>
> The current reclaim calculation for GRUB is a bit inaccurate, and the
> inaccuracy gets larger as the bandwidth of tasks becomes smaller.
> I have a test program to show the issue - it runs one or more
> deadline threads and observes the utilization. The following tests
> were run on an isolated cpu (isolcpus=3) in a 4 cpu system, with the
> results shown below:
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
> TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.29
>
> RUN 2: runtime=2ms, deadline=period=10ms, RT capacity = 95%
> TID[704]: RECLAIM=1, (r=2ms, d=10ms, p=10ms), Util: 79.96
> TID[704]: RECLAIM=1, (r=2ms, d=10ms, p=10ms), Util: 80.06
> TID[704]: RECLAIM=1, (r=2ms, d=10ms, p=10ms), Util: 80.00
>
> RUN 3: runtime=1ms, deadline=period=100ms, RT capacity = 95%
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
> TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.70
>
> When running multiple tasks, the reclaimed bandwidth is divided
> proportionately, but is not reclaimed up to the max allowable limit:
>
> RUN 4: 2 SCHED_FLAG_RECLAIM tasks, 1 normal task
>        Task 1: runtime=1ms, deadline=period=10ms
>        Task 2: runtime=1ms, deadline=period=10ms
>        Task 3: runtime=5ms, deadline=period=20ms (normal)
> TID[624]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.10
> TID[625]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.10
> TID[626]: RECLAIM=0, (r=5ms, d=20ms, p=20ms), Util: 25.07
> TID[624]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.06
> TID[625]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 20.13
> TID[626]: RECLAIM=0, (r=5ms, d=20ms, p=20ms), Util: 25.12
> TID[624]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 19.95
> TID[625]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 19.93
> TID[626]: RECLAIM=0, (r=5ms, d=20ms, p=20ms), Util: 25.04
>
> I have also tested multiple tasks on all cpus, allowing tasks to
> migrate, and I see the same issue there as well. Running 10 tasks on 3
> cpus with 6 SCHED_FLAG_RECLAIM and 4 normal tasks, top shows:
> %Cpu0 : 70.1 us, 0.3 sy, 0.0 ni, 29.3 id, 0.0 wa
> %Cpu1 : 69.1 us, 0.3 sy, 0.0 ni, 30.3 id, 0.3 wa
> %Cpu2 : 70.5 us, 0.3 sy, 0.0 ni, 29.2 id, 0.0 wa
>
> The max{} logic in the existing implementation does not seem to fully
> capture the GRUB algorithm.
>
> This patch fixes the issue by appropriately capping the max allowed
> utilization and also slightly adjusting the GRUB algorithm to account
> for a mix of normal deadline and SCHED_FLAG_RECLAIM tasks.

I looked at the patch quickly as I am due for bedtime ;-). I feel like it
does more than should be done in one patch, so you should consider
splitting it IMHO.

> According to the GRUB rule, the runtime is depreciated as a factor
> of active bandwidth of the runqueue: "dq = -dt", where U is the

Saying "dq = -dt" where U does not make sense, because U is not in the
equation ;-). I suggest rephrasing (the GRUB rule is usually written as
"dq = -U dt").

> active bandwidth. Also, we do not allocate the full bandwidth of a
> cpu to deadline tasks, but only a portion (Umax) of it, so as to avoid
> deadline tasks starving lower class tasks. The equation could be
> re-written as "dq = -(U / Umax) * dt"

Isn't the equation in the code right now:

  dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt

? That's what the kernel docs say [1]. So what do you mean by "could be
re-written"?
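For what it's worth, the intended effect of "dq = -(U / Umax) * dt" is easy to sanity-check for a lone reclaiming task: its budget drains at rate U/Umax, so it lasts runtime * (Umax/U) of wall time each period and the achieved share approaches Umax. A quick sketch (the helper name and the closed-form step are mine, not from the patch):

```python
# Sanity check for dq = -(U / Umax) * dt with a single reclaiming task.
# The budget `runtime` drains at rate U/Umax per unit of wall time, so
# it lasts runtime * (Umax / U) of wall time, capped by the period.
def grub_utilization(runtime_ms, period_ms, u_max):
    u = runtime_ms / period_ms            # the task's own bandwidth U
    wall_ms = runtime_ms * (u_max / u)    # wall time to drain the budget
    return min(wall_ms, period_ms) / period_ms

# The failing cases above should all reclaim up to Umax = 95%:
print(round(grub_utilization(7, 10, 0.95), 4))     # 0.95 (RUN 1)
print(round(grub_utilization(2, 10, 0.95), 4))     # 0.95 (RUN 2)
print(round(grub_utilization(1, 100, 0.95), 4))    # 0.95 (RUN 3)
```

That matches the intent, anyway; whether the kernel's fixed-point arithmetic lands exactly there is a separate question.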
[1] https://docs.kernel.org/scheduler/sched-deadline.html

> Since both normal deadline and SCHED_FLAG_RECLAIM tasks can share a
> cpu, we need to consider the bandwidth of only SCHED_FLAG_RECLAIM
> tasks in the equation:
>
>   "dq = -(Ureclaim / Umax_reclaim) * dt"

This makes sense: we deplete the budget only as a fraction of what's
reclaimable, not of the whole damn thing. You tested a mix of reclaim and
non-reclaim tasks, which is good, and since Luca approves, that's great. ;-)

> Following are the results with this patch:
>
> RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 94.98
> TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.04
> TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.01

These all look 100% correct to me. Do these tasks start at the same time,
or are their activations shifted relative to each other? I just want to be
sure it behaves the same either way...

[...]

> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3e8df6d31c1e..13d85af0f42b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -257,6 +257,11 @@ static inline bool dl_entity_is_special(const struct sched_dl_entity *dl_se)
>  #endif
>  }
>
> +static inline bool dl_entity_is_reclaim(const struct sched_dl_entity *dl_se)
> +{
> +        return dl_se->flags & SCHED_FLAG_RECLAIM;
> +}
> +

Can this helper addition be split out into a separate patch?

>  /*
>   * Tells if entity @a should preempt entity @b.
>   */
> @@ -754,10 +759,20 @@ struct dl_rq {
>          u64 extra_bw;
>
>          /*
> -         * Inverse of the fraction of CPU utilization that can be reclaimed
> -         * by the GRUB algorithm.
> +         * Maximum available bandwidth for this runqueue. This is used to
> +         * calculate reclaimable bandwidth for SCHED_FLAG_RECLAIM tasks.
> +         * By restricting maximum usable bandwidth, we aim to give other
> +         * tasks on lower classes a chance to run, when competing with
> +         * SCHED_FLAG_RECLAIM tasks.
> +         */
> -        u64 bw_ratio;
> +        u64 max_bw;
> +
> +        /*
> +         * Active bandwidth of SCHED_FLAG_RECLAIM tasks on this rq.
> +         * This will be a subset of running_bw.
> +         */
> +        u64 reclaim_bw;
> +

And perhaps the addition and use of these new fields could be split out as
well, if that makes sense. I will take a closer look at your patches later,
or after v2.

 - Joel

>  };
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> --
> 2.40.1
>
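P.S. I also worked through the RUN 4 parameters against the new
"dq = -(Ureclaim / Umax_reclaim) * dt" rule to convince myself of the mixed
case. This assumes Umax_reclaim is whatever Umax leaves after the
non-reclaim bandwidth, which is my reading of the patch, not something I
verified in the code:

```python
# Mixed reclaim / non-reclaim arithmetic for RUN 4's parameters under
# dq = -(Ureclaim / Umax_reclaim) * dt. Helper names are illustrative.
def reclaim_share(task_bw, u_reclaim, u_max_reclaim, period_ms):
    rate = u_reclaim / u_max_reclaim        # budget depletion rate
    wall_ms = (task_bw * period_ms) / rate  # wall time to drain the budget
    return min(wall_ms, period_ms) / period_ms

u_max = 0.95
u_normal = 0.25                   # one non-reclaim task (5ms / 20ms)
u_reclaim = 0.10 + 0.10           # two reclaim tasks (1ms / 10ms each)
u_max_reclaim = u_max - u_normal  # assumed: cap left for reclaim tasks

each = reclaim_share(0.10, u_reclaim, u_max_reclaim, 10)
print(round(each, 4))                 # 0.35 per reclaim task
print(round(2 * each + u_normal, 4))  # 0.95 total, i.e. exactly Umax
```

So each reclaim task should end up around 35%, with the whole rq pinned at
Umax. If that matches what you measure for RUN 4 with the patch applied, I
am fairly convinced the equation is right.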