On Fri, May 26, 2023 at 10:55 AM Vineeth Pillai <vineeth@xxxxxxxxxxxxxxx> wrote: > > According to the GRUB[1] rule, the runtime is depreciated as: > "dq = -max{u, (1 - Uinact - Uextra)} dt" (1) > > To guarantee that deadline tasks doesn't starve lower class tasks, > we do not allocate the full bandwidth of the cpu to deadline tasks. > Maximum bandwidth usable by deadline tasks is denoted by "Umax". > Considering Umax, equation (1) becomes: > "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt" (2) Makes sense, your patch fixes the issue where we need the depreciation ratio to factor in the throttling limits as well. The code looks sane to me too. Acked-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx> thanks, - Joel > > Current implementation has a minor bug in equation (2), which this > patch fixes. > > The reclamation logic is verified by a sample program which creates > multiple deadline threads and observing their utilization. The tests > were run on an isolated cpu(isolcpus=3) on a 4 cpu system. > > Tests on 6.3.0 > ============== > > RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95% > TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33 > TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35 > > RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95% > TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69 > TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69 > > RUN 3: 2 tasks > Task 1: runtime=1ms, deadline=period=10ms > Task 2: runtime=1ms, deadline=period=100ms > TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67 > TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37 > TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38 > TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23 > > As seen above, the reclamation doesn't reclaim the maximum allowed > bandwidth and as the bandwidth of tasks gets smaller, the reclaimed > bandwidth also comes down. > > Tests with this patch applied > ============================= > > RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95% > TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.19 > TID[608]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.16 > > RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95% > TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.27 > TID[616]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.21 > > RUN 3: 2 tasks > Task 1: runtime=1ms, deadline=period=10ms > Task 2: runtime=1ms, deadline=period=100ms > TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.64 > TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.66 > TID[620]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.45 > TID[621]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.73 > > Running tasks on all cpus allowing for migration also showed that > the utilization is reclaimed to the maximum. Running 10 tasks on > 3 cpus SCHED_FLAG_RECLAIM - top shows: > %Cpu0 : 94.6 us, 0.0 sy, 0.0 ni, 5.4 id, 0.0 wa > %Cpu1 : 95.2 us, 0.0 sy, 0.0 ni, 4.8 id, 0.0 wa > %Cpu2 : 95.8 us, 0.0 sy, 0.0 ni, 4.2 id, 0.0 wa > > [1]: Abeni, Luca & Lipari, Giuseppe & Parri, Andrea & Sun, Youcheng. > (2015). Parallel and sequential reclaiming in multicore > real-time global scheduling. > > Signed-off-by: Vineeth Pillai (Google) <vineeth@xxxxxxxxxxxxxxx> > --- > kernel/sched/deadline.c | 56 +++++++++++++++++++---------------------- > kernel/sched/sched.h | 6 +++++ > 2 files changed, 32 insertions(+), 30 deletions(-) > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c > index 71b24371a6f7..ea5db080f008 100644 > --- a/kernel/sched/deadline.c > +++ b/kernel/sched/deadline.c > @@ -1260,43 +1260,39 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se) > } > > /* > - * This function implements the GRUB accounting rule: > - * according to the GRUB reclaiming algorithm, the runtime is > - * not decreased as "dq = -dt", but as > - * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt", > + * This function implements the GRUB accounting rule. According to the > + * GRUB reclaiming algorithm, the runtime is not decreased as "dq = -dt", > + * but as "dq = -(max{u, (1 - Uinact - Uextra)} / Umax) dt", > * where u is the utilization of the task, Umax is the maximum reclaimable > - * utilization, Uinact is the (per-runqueue) inactive utilization, computed > - * as the difference between the "total runqueue utilization" and the > - * runqueue active utilization, and Uextra is the (per runqueue) extra > - * reclaimable utilization. > - * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations > - * multiplied by 2^BW_SHIFT, the result has to be shifted right by > - * BW_SHIFT. > - * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, > - * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT. > - * Since delta is a 64 bit variable, to have an overflow its value > - * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds. > - * So, overflow is not an issue here. > + * bandwidth, Uinact is the (per-runqueue) inactive utilization, computed as > + * the difference between the "total runqueue utilization" and the runqueue > + * active utilization, and Uextra is the (per runqueue) extra reclaimable > + * utilization. > + * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations multiplied > + * by 2^BW_SHIFT, the result has to be shifted right by BW_SHIFT. > + * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT, dl_bw > + * is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.i > + * Since delta is a 64 bit variable, to have an overflow its value should be > + * larger than 2^(64 - 20 - 8), which is more than 64 seconds. So, overflow is > + * not an issue here. > */ > static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se) > { > - u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */ > u64 u_act; > - u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT; > + u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */ > > /* > - * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)}, > - * we compare u_inact + rq->dl.extra_bw with > - * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because > - * u_inact + rq->dl.extra_bw can be larger than > - * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative > - * leading to wrong results) > + * Instead of computing max{u, (u_max - u_inact - u_extra)}, we > + * compare u_inact + u_extra with u_max - u, because u_inact + u_extra > + * can be larger than u_max. So, u_max - u_inact - u_extra would be > + * negative leading to wrong results. > */ > - if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min) > - u_act = u_act_min; > + if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw) > + u_act = dl_se->dl_bw; > else > - u_act = BW_UNIT - u_inact - rq->dl.extra_bw; > + u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw; > > + u_act = (u_act * rq->dl.bw_ratio) >> RATIO_SHIFT; > return (delta * u_act) >> BW_SHIFT; > } > > @@ -2784,12 +2780,12 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq) > { > if (global_rt_runtime() == RUNTIME_INF) { > dl_rq->bw_ratio = 1 << RATIO_SHIFT; > - dl_rq->extra_bw = 1 << BW_SHIFT; > + dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT; > } else { > dl_rq->bw_ratio = to_ratio(global_rt_runtime(), > global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT); > - dl_rq->extra_bw = to_ratio(global_rt_period(), > - global_rt_runtime()); > + dl_rq->max_bw = dl_rq->extra_bw = > + to_ratio(global_rt_period(), global_rt_runtime()); > } > } > > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h > index 3e8df6d31c1e..73027c2806dc 100644 > --- a/kernel/sched/sched.h > +++ b/kernel/sched/sched.h > @@ -753,6 +753,12 @@ struct dl_rq { > u64 this_bw; > u64 extra_bw; > > + /* > + * Maximum available bandwidth for reclaiming by SCHED_FLAG_RECLAIM > + * tasks of this rq. Used in calculation of reclaimable bandwidth(GRUB). > + */ > + u64 max_bw; > + > /* > * Inverse of the fraction of CPU utilization that can be reclaimed > * by the GRUB algorithm. > -- > 2.40.1 >