Hi Luca,

On Fri, May 19, 2023 at 6:18 AM luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, 19 May 2023 11:56:21 +0200
> luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
> [...]
>
> OK, sorry again... I found my error immediately after sending the email.
> Uextra is computed as "Umax - ...", not "1 - ...".
> So, I now understand where the 35% comes from.
>
Thanks for debugging this, it makes sense now!

> I now _suspect_ the correct equation should be
>         dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt
> but I want to test it before wasting your time again; I'll write more
> after performing some more tests.
>
I tried this equation and it fixes the above issue. But I am a little
confused as to why we should not be limiting the second term to Umax.
In my testing, the issue was also solved with this equation:
        "dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt"

Neither of these equations solves a couple of other issues we had
discussed before:
- tasks with different bandwidths reclaim differently even when #tasks
  is less than #cpus.
- cpu util may go to 100% when we have tasks with large bandwidth close
  to Umax.

As an example of issue 1, three tasks - (7,10), (3,10) and (1,10):
TID[590]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.20
TID[591]: RECLAIM=1, (r=3ms, d=10ms, p=10ms), Util: 81.94
TID[592]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 27.19

Re. issue 2, with four tasks with the same reservation (7,10), the tasks
try to reclaim, driving cpu usage to 100% on all three cpus and hanging
the system.

I was trying to understand the issue and it looks like the static values
of Uextra and Umax are causing inaccuracies. Uextra is calculated based
on global bandwidth. But based on the local state of the cpu, we could
reclaim more or less than (u_inact + rq->dl.extra_bw). Similarly, Umax is
a local max for each cpu and we should not be reclaiming up to Umax
unconditionally. If the global load is high, reclaiming up to Umax would
cause problems, as a migration can land on this cpu.

I was trying an idea to dynamically decide Uextra and Umax based on
global and local load. The crux of the idea is as below:

+	if (rq->dl.running_bw > rq->dl.max_bw)
+		return delta;
+
+	max_bw = rq->dl.this_bw + rq->dl.extra_bw;
+	extra_bw = rq->dl.extra_bw;
+	if (rq->dl.this_bw < rq->dl.extra_bw || max_bw > rq->dl.max_bw) {
+		extra_bw = rq->dl.max_bw - rq->dl.this_bw;
+		max_bw = rq->dl.max_bw;
+	}
+

And use this max_bw and extra_bw in the equation:
        "dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt"

The reasoning for the above changes is:
- running_bw can be greater than max_bw in SMP and we should not be
  reclaiming if that's the case.
- Initially we assume max_bw and extra_bw based on global load. If
  this_bw < extra_bw, it means that the cpus are not heavily loaded and
  incoming migrations can be satisfied with the local cpu's available
  bandwidth, so we could reclaim up to rq->dl.max_bw and use extra_bw as
  "rq->dl.max_bw - rq->dl.this_bw". Also, we should limit max_bw for any
  cpu to a maximum of rq->dl.max_bw.

This does not consider a mix of normal and SCHED_FLAG_RECLAIM tasks, and
minor changes on top of this are needed to handle that scenario. This
seems to fix all the issues that I have encountered.

The above fix doesn't work with the equation:
        "dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt"

cpu still spikes to 100% with 4 tasks of (7,10). I am not sure, but I
guess it might be because we are not limiting the second term to Umax.

Please let me know your thoughts on this.

Thanks again,
Vineeth
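
For reference, here is a minimal userspace sketch of the dynamic
max_bw/extra_bw selection combined with the second equation. It is only
an illustration, not the actual patch: struct dl_bw_model and
scaled_delta() are hypothetical names, bandwidths are plain doubles
(fractions of one cpu) rather than the kernel's fixed-point values, and
Uinact is assumed to be this_bw - running_bw.

```c
/*
 * Hypothetical userspace model of the per-runqueue bandwidth state.
 * Field names mirror the snippet above; values are fractions of one cpu.
 */
struct dl_bw_model {
	double this_bw;    /* bandwidth of all deadline tasks on this cpu */
	double running_bw; /* bandwidth of the currently active tasks */
	double extra_bw;   /* reclaimable share derived from global load */
	double max_bw;     /* Umax: cap on deadline bandwidth per cpu */
};

/*
 * Runtime to charge for 'delta' of execution by a task with utilization
 * u_i, following
 *   dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt
 * with Umax and Uextra picked dynamically as in the snippet.
 */
static double scaled_delta(const struct dl_bw_model *dl, double u_i,
			   double delta)
{
	double max_bw, extra_bw, u_inact, u_act;

	/* Overloaded cpu: charge the full delta, i.e. do not reclaim. */
	if (dl->running_bw > dl->max_bw)
		return delta;

	/* Start from the globally derived values ... */
	max_bw = dl->this_bw + dl->extra_bw;
	extra_bw = dl->extra_bw;

	/* ... and fall back to the local cap when load is light or the
	 * sum would exceed the per-cpu maximum. */
	if (dl->this_bw < dl->extra_bw || max_bw > dl->max_bw) {
		extra_bw = dl->max_bw - dl->this_bw;
		max_bw = dl->max_bw;
	}

	/* Assumption: inactive bandwidth is this_bw - running_bw. */
	u_inact = dl->this_bw - dl->running_bw;

	u_act = max_bw - u_inact - extra_bw;
	if (u_act < u_i)
		u_act = u_i;

	return delta * u_act / max_bw;
}
```

As a hand-worked check with made-up numbers: for this_bw = running_bw =
0.7, extra_bw = 0.5 and max_bw = 0.95, the sketch charges 0.7/0.95 of
each unit of execution time, so a (7,10) task would use up its 7ms of
runtime after about 9.5ms, i.e. roughly 95% utilization.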