Hi Luca, On Wed, May 10, 2023 at 3:07 AM luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote: > > > I was thinking more about this and was doing some more digging into > > this. I was also wrong about min{}. Giving it some more thought, I > > think (U/Umax) is indeed the only equation we need and it will take > > care of caping the reclaiming at Umax. > > Uhm... I am not sure about this: for 1 single task on 1 CPU, yes, > u/Umax does the right thing. But when there are multiple tasks on the > CPU, I think it can cause issues (because every task ends up trying to > execute for Umax). > Ah ok, I understand now. I was going with the single CPU equation of U/Umax where this would not have been an issue as U(running_bw) takes care of the adjustment and does not allow a single task to use whole of Umax. I read the SMP GRUB paper you shared and see that it uses u_i/Umax and that might cause issues as you mentioned. > The "max{}" comes from the original multi-processor GRUB algorithm: > https://arxiv.org/pdf/1512.01984.pdf (see Equation (13) - in that > equation, the part we call u_extra is included in Uinact[p]) > Thanks for sharing. I read the paper and got an overall idea an background of existing implementation. > the "1 - u_inact - u_extra" part is needed to make sure that the > real-time guarantees are not broken by the reclaiming mechanism... But > it can end up with a task trying to consume too much time on a single > CPU, hence the "u/Umax" term in the "max{}" is needed to make sure that > a task will not consume more than Umax of a CPU. > > Now, if we have one single task on a CPU u/Umax will always be larger > than the other term... But when we have multiple tasks the other term > is needed too. > Understood, thanks for explaining. > (BTW, when considering multiple tasks on multiple CPUs, another > potential problem is given by u_extra... Now that I remember all the > details, u_extra is not "Umax - this_bw" - this is true when we consider > only one CPU, but is is "Umax - sum(u_i)/m" (where "sum(u_i)" is the > sum of the bandwidths of all the SCHED_DEADLINE tasks in the root > domain, and "m" is the number of CPUs in the root domain)... So, the > reclaimable CPU time is distributed uniformly on all the CPUs and this > could create some issues. But let's see what happens after the div64 > fix and the SCHED_FLAG_RECLAIM fix) > This makes sense. This also means that we wouldn't be able to replace "Uextra + Uinact" with "Umax - running_bw" and I was seeing problems with SMP testing. So I shall revert to "Uextra + Uinact" in v2. And I think the potential issue with Uextra would be avoided by the check for Uextra + Uinact > Umax to make sure that we don't reclaim more than Umax for a single cpu. I have tested the patch with SMP using the stressor mentioned in the commit message and running cyclicdeadline in parallel. The results are similar to upstream and GRUB able to reclaim upto Umax now. I shall send the v2 soon after a bit more testing.. Thanks a lot for all the valuable inputs and detailed explanation :-) Thanks, Vineeth