In a multi-processor system, bandwidth usage is divided equally among
all CPUs. This causes issues with reclaiming free bandwidth on a CPU:
"Uextra" is the same on all CPUs in a root domain, while running_bw
differs based on the reserved bandwidth of the tasks running on each
CPU. The result is disproportionate reclaiming: a task with a smaller
bandwidth reclaims less, even if it is the only task running on that
CPU.

The following is a small test with three tasks with reservations
(8,10), (1,10) and (1,100). The three tasks run on different CPUs, but
since the reclamation logic calculates the available bandwidth as a
factor of the globally available bandwidth, tasks with a smaller
bandwidth reclaim only a little compared to tasks with a higher
bandwidth, even if their CPU has free bandwidth available to be
reclaimed.

TID[730]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.05
TID[731]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 31.34
TID[732]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 3.16

Fix: use the available bandwidth on each CPU to calculate the
reclaimable bandwidth. Admission control takes care of the total
bandwidth, and hence using the available bandwidth on a specific CPU
would not break the deadline guarantees.

With this fix, the above test behaves as follows:

TID[586]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.24
TID[585]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 95.01
TID[584]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.01

Signed-off-by: Vineeth Pillai (Google) <vineeth@xxxxxxxxxxxxxxx>
---
 kernel/sched/deadline.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 91451c1c7e52..85902c4c484b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1272,7 +1272,7 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
  *	Umax:	Max usable bandwidth for DL. Currently
  *		= sched_rt_runtime_us / sched_rt_period_us
  *	Uextra:	Extra bandwidth not reserved:
- *		= Umax - \Sum(u_i / #cpus in the root domain)
+ *		= Umax - this_bw
  *	u_i:	Bandwidth of an admitted dl task in the
  *		root domain.
  *
@@ -1286,22 +1286,14 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	u64 u_act;
-	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
-
 	/*
-	 * Instead of computing max{u, (rq->dl.max_bw - u_inact - u_extra)},
-	 * we compare u_inact + rq->dl.extra_bw with
-	 * rq->dl.max_bw - u, because u_inact + rq->dl.extra_bw can be larger
-	 * than rq->dl.max_bw (so, rq->dl.max_bw - u_inact - rq->dl.extra_bw
-	 * would be negative leading to wrong results)
+	 * max{u, Umax - Uinact - Uextra}
+	 * = max{u, max_bw - (this_bw - running_bw) - (max_bw - this_bw)}
+	 * = max{u, running_bw} = running_bw
+	 * So dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt
+	 *       = -(running_bw / max_bw) dt
 	 */
-	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
-		u_act = dl_se->dl_bw;
-	else
-		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
-
-	return div64_u64(delta * u_act, rq->dl.max_bw);
+	return div64_u64(delta * rq->dl.running_bw, rq->dl.max_bw);
 }
 
 /*
-- 
2.40.1