Hi Luca,

On Tue, May 16, 2023 at 12:19 PM luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
>
> > I was thinking it should probably
> > be okay for tasks to reclaim differently based on what free bw is
> > left on the cpu it is running. For eg: if cpu 1 has two tasks of bw
> > .3 each, each task can reclaim "(.95 - .6) / 2" and another cpu with
> > only one task(.3 bandwidth) reclaims (.95 - .3). So both cpus
> > utilization is .95 and tasks reclaim what is available on the cpu.
>
> I suspect (but I am not sure) this only works if tasks do not migrate.
>
From what I am seeing, this works as long as the reserved bandwidth of all
tasks on a cpu is less than Umax. Even with migration, if the task lands on
another cpu where the new running_bw < Umax, it runs and reclaims the free
bandwidth there. But this breaks if running_bw > Umax, which can happen when
total_bw is within limits but a single cpu is overloaded. For example, take
four tasks with reservation (7, 10) on a three-cpu system: two cpus will have
running_bw = .7, but the third cpu will have 1.4, even though
total_bw = 2.80 is less than the limit of 2.85 (3 * .95).

>
> > With "1 - Uinact", where Uinact accounts for a portion of global free
> > bandwidth, tasks reclaim proportionately to the global free bandwidth
> > and this causes tasks with lesser bandwidth to reclaim lesser when
> > compared to higher bandwidth tasks even if they don't share the cpu.
> > This is what I was seeing in practice.
>
> Just to be sure: is this with the "original" Uextra setting, or with
> your new "Uextra = Umax - this_bw" setting?
> (I am not sure, but I suspect that "1 - Uinact - Uextra" with your new
> definition of Uextra should work well...)
>
I am seeing this with the original Uextra setting, where the global bandwidth
is accounted. With "Uextra = Umax - this_bw", reclaiming seems to be correct,
and I think that is because it considers only the local bandwidth.
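To make the per-cpu arithmetic above concrete, here is a rough Python model
(not kernel code) of the proposed scheme: each cpu's free bandwidth
Umax - running_bw is split equally among the tasks on that cpu. The function
names, the equal split and the fixed Umax = 0.95 are assumptions made only for
this sketch; it ignores Uinact, migration timing and the real GRUB accounting.

```python
UMAX = 0.95  # assumed per-cpu reclaim limit (Umax), as in the example above


def per_cpu_reclaim(cpus, umax=UMAX):
    """Hypothetical model of the proposed per-cpu reclaiming.

    cpus: one list per cpu holding the reserved bandwidths (runtime/period)
    of the tasks currently on that cpu.
    Returns one (running_bw, extra_per_task) pair per cpu, where the free
    bandwidth umax - running_bw is split equally among that cpu's tasks.
    A cpu with running_bw >= umax has nothing left to reclaim.
    """
    result = []
    for tasks in cpus:
        running_bw = sum(tasks)
        free = umax - running_bw
        extra = free / len(tasks) if tasks and free > 0 else 0.0
        result.append((running_bw, extra))
    return result


def admitted(cpus, umax=UMAX):
    """Global admission check: total_bw <= number of cpus * umax."""
    total_bw = sum(sum(tasks) for tasks in cpus)
    return total_bw <= len(cpus) * umax


if __name__ == "__main__":
    # Two .3 tasks on one cpu, a single .3 task on another.
    for cpu, (bw, extra) in enumerate(per_cpu_reclaim([[0.3, 0.3], [0.3]])):
        print(f"cpu{cpu}: running_bw={bw:.2f} extra/task={extra:.3f}")

    # Four (7, 10) tasks on three cpus: admitted globally, yet one cpu is at 1.4.
    overloaded = [[0.7], [0.7], [0.7, 0.7]]
    print("admitted:", admitted(overloaded))
    for cpu, (bw, extra) in enumerate(per_cpu_reclaim(overloaded)):
        print(f"cpu{cpu}: running_bw={bw:.2f} extra/task={extra:.3f}")
```

With [[0.3, 0.3], [0.3]] each task on the first cpu gets (.95 - .6) / 2 = .175
and the lone task on the second cpu gets .65, so both cpus end up at .95. With
the four (7, 10) tasks placed as [[0.7], [0.7], [0.7, 0.7]], admitted() still
returns True (2.8 <= 2.85), but the third cpu has running_bw = 1.4 and nothing
left to reclaim, which is the overload case described above.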

>
> > With dq = -(max{u_i, (Umax - Uinact - Uextra)} / Umax) * dt (1)
> > TID[636]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.08
> > TID[635]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.07
> > TID[637]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 95.06
> >
> > With dq = -(max{u_i, (1 - Uinact - Uextra)} / Umax) * dt (2)
> > TID[601]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> > TID[600]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
> > TID[602]: RECLAIM=1, (r=3ms, d=100ms, p=100ms), Util: 35.65
>
> Maybe I am missing something and I am misunderstanding the situation,
> but my impression was that this is the effect of setting
> Umax - \Sum(u_i / #cpus in the root domain)
> I was hoping that with your new Umax setting this problem could be
> fixed... I am going to double-check my reasoning.
>
Even with the Umax_reclaim changes, equation (1) is the one that reclaims up
to 95% when the number of tasks is less than the number of cpus. With more
tasks than cpus, eq (1) still reclaims more than eq (2), and cpu utilization
caps at 95%. I also need to dig more to understand the reason behind this.

Thanks for looking into this. I will also study this further and keep you
posted.

Thanks,
Vineeth
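A back-of-the-envelope way to compare the two quoted depletion rules is to
assume a task that is always runnable and alone on its cpu, so Uinact = 0, and
to take the local Uextra = Umax - u_i. These are hypothetical inputs, not the
measured configuration, and the helper names below are made up for this
sketch. It also assumes the depletion rate stays constant over the period.

```python
def depletion_rate(u_i, u_inact, u_extra, umax, eq=1):
    """Runtime depletion rate |dq/dt| for the two quoted GRUB variants.

    eq=1: max{u_i, Umax - Uinact - Uextra} / Umax
    eq=2: max{u_i, 1 - Uinact - Uextra} / Umax
    """
    if eq not in (1, 2):
        raise ValueError("eq must be 1 or 2")
    base = umax if eq == 1 else 1.0
    return max(u_i, base - u_inact - u_extra) / umax


def runnable_fraction(runtime, period, rate):
    """Fraction of each period an always-runnable task executes before its
    runtime is exhausted, assuming a constant depletion rate (capped at 1)."""
    return min(1.0, (runtime / rate) / period)


if __name__ == "__main__":
    umax = 0.95
    runtime, period = 3.0, 100.0  # r=3ms, p=100ms as in the quoted test
    u_i = runtime / period
    u_extra = umax - u_i          # assumed local "Umax - this_bw" value
    for eq in (1, 2):
        rate = depletion_rate(u_i, 0.0, u_extra, umax, eq)
        util = 100 * runnable_fraction(runtime, period, rate)
        print(f"eq ({eq}): util = {util:.2f}%")
```

For r = 3ms, p = 100ms and Umax = 0.95, eq (1) depletes at about .03/.95 and
lets the task run about 95ms per period, while eq (2) depletes at .08/.95 and
gives about 35.6ms. That is in the same range as the numbers quoted above, but
the model ignores how Uinact and Uextra evolve at runtime, so by itself it
does not explain the behaviour.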