Hi Luca,

I'm merging the last two mails into this reply :-)

> So, we are wasting 181.3 - 95 = 86.3% of CPU time, which 590 cannot
> reclaim (because it cannot execute simultaneously on 2 CPUs).
>
Correct. Thanks for explaining it in detail. I traced the scheduler and verified the pattern you described.

> Now that the problem is more clear to me, I am trying to understand a
> possible solution (as you mention, moving some extra bandwidth from the
> 590's CPU will fix this problem... But I am not sure if this dynamic
> extra bandwidth migration is feasible in practice without introducing
> too much overhead)
>
> I'll look better at your new proposal.
>
The idea that I mentioned tries to solve this problem in a best-effort way: if the global load is high, use the global values "Uextra = rq->dl.extra_bw" and "Umax = rq->dl.this_bw + rq->dl.extra_bw"; otherwise, use the local values "Umax = rq->dl.max_bw" and "Uextra = rq->dl.max_bw - rq->dl.this_bw". This is still not perfect, but it tries to reclaim bandwidth very close to the maximum allowed limit almost all the time. Please have a look when you get a chance :-).

> I just tried to repeat this test on a VM with 3 CPUs, and I can
> reproduce the stall (100% of CPU time reclaimed by SCHED_DEADLINE
> tasks, with no possibility for the other tasks to execute) when I use
> dq = -(max{u_i / Umax, (Umax - Uinact - Uextra)}) * dt
>
> But when I use
> dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt
> everything works as expected, the 4 tasks reclaim 95% of the CPU
> time and my shell is still active...
> (so, I cannot reproduce the starvation issue with this equation)
>
Sorry about the confusion; yes, you are right, there is no stall with this equation. The only remaining issue is that less bandwidth is reclaimed when the load is low and the tasks have different bandwidth requirements.

> So, I now think the second one is the correct equation to be used.
>
Thanks for confirming.
I think it makes sense to get the fix for the equation in first, and then investigate the second issue (less reclaiming under low load when tasks have different bandwidths) further and fix it separately. What do you think? If it's okay with you, I shall send the next iteration with only the equation fix.

Thanks,
Vineeth