Hi Luca,

Thanks for reviewing the changes.

On Tue, May 9, 2023 at 7:25 AM luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> if I understand well, your patch addresses 2 separate issues:
> 1) The current reclaiming code uses an approximation to avoid using
>    div64_u64(), which might introduce too much overhead (at least, as
>    far as I remember :). Your patch changes it to use the exact,
>    non-approximated, equation
> 2) Currently, the reclaimable CPU time is divided as if all the
>    SCHED_DEADLINE tasks (and not only the SCHED_FLAG_RECLAIM tasks)
>    could use it; your patch changes the code to distribute the
>    reclaimable CPU time only to tasks having the SCHED_FLAG_RECLAIM
>    flag set
>
> Is this understanding correct?
Yes, the above two details are correct.

In addition to that, I think the existing equation had a small bug.
The GRUB paper says the runtime is decreased as
    dq = -U dt
where U is running_bw. This assumes the whole CPU bandwidth can be
reclaimed. But in our case we cap the reclaimable bandwidth at Umax,
so the equation should be
    dq = -(U/Umax) dt
On top of that, we have an upper limit of (1 - Uextra - Uinact). I
feel we should take the minimum of these two values to make sure we
do not cross the upper bound. So I think the equation should be
    dq = -min{U/Umax, (1 - Uextra - Uinact)} dt
but the current implementation is
    dq = -max{u/Umax, (1 - Uextra - Uinact)} dt
where u = dl_se->dl_bw.

After fixing the above equation, the reclaim logic works well, but
only when SCHED_FLAG_RECLAIM tasks alone are running. When we have a
mix of normal deadline tasks and SCHED_FLAG_RECLAIM tasks, the
reclaim logic breaks. As you pointed out, the second part of the fix
addresses that.

> If using div64_u64() does not introduce too much overhead, then I agree
> with the first change.
In my testing, I did not see a change in the performance of the
grub_reclaim function: both the old and the new implementations take
10 to 20 nanoseconds on average. But my observation might not be
accurate. With this change it is difficult to avoid the division,
because the denominator is a variable and we cannot pre-calculate an
inverse. We could probably calculate the inverse during
{__add/__sub}_running_bw so as to reduce the frequency of
div64_u64(). I shall try this for v2.

> The second change also looks good to me.
>
> I have no comments on the code, but there is one thing in the comments
> that looks misleading to me (or I am misunderstanding the code or the
> comments):
>
> > + * We can calculate Umax_reclaim as:
> > + *    Umax_reclaim: this_bw + Uinact + Ureclaim
>
> I think this looks like a typo (summing this_bw to Uinact
> looks wrong). Should "this_bw" be Uextra?
>
Thanks a lot for pointing it out. Yes, you are right, I messed up the
comment: it should be Uextra, and I shall fix it in v2.

> > + *    dq = -(Ureclaim / Umax_reclaim) * dt
> > + *       = -(Ureclaim / (Ureclaim + Uextra + Uinact)) * dt
>
> I think this should be the correct equation. BTW, since you are summing
> Uextra and Uinact, maybe you could just use "Umax - running_bw"?
>
Makes sense, it will avoid the extra variable Uinact. I shall modify
this in v2.

Thanks,
Vineeth
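
PS: to make the min-vs-max difference above concrete, below is a
small user-space sketch (not the kernel code: the charge_cur() and
charge_new() helpers, the Q20 fixed-point layout and the example
utilizations are all made up for illustration). It charges 1 ms of
actual runtime once with the current max-based equation, which uses
u = dl_se->dl_bw, and once with the min-based equation proposed
above, which uses U = running_bw:

/*
 * Standalone illustration only -- not the kernel implementation.
 * Helper names and example numbers are invented for this sketch.
 */
#include <stdio.h>
#include <stdint.h>

#define BW_SHIFT	20
#define BW_UNIT		(1ULL << BW_SHIFT)

/* Convert a utilization in [0, 1] to a Q20 fixed-point bandwidth. */
static uint64_t to_bw(double u)
{
	return (uint64_t)(u * BW_UNIT);
}

/* Current form: dq = -max{u/Umax, (1 - Uextra - Uinact)} dt, u = dl_se->dl_bw */
static uint64_t charge_cur(uint64_t dt, uint64_t task_bw, uint64_t u_max,
			   uint64_t u_inact, uint64_t u_extra)
{
	uint64_t a = (task_bw << BW_SHIFT) / u_max;	/* u / Umax            */
	uint64_t b = BW_UNIT - u_extra - u_inact;	/* 1 - Uextra - Uinact */
	uint64_t u_act = a > b ? a : b;

	return (dt * u_act) >> BW_SHIFT;
}

/* Proposed form: dq = -min{U/Umax, (1 - Uextra - Uinact)} dt, U = running_bw */
static uint64_t charge_new(uint64_t dt, uint64_t run_bw, uint64_t u_max,
			   uint64_t u_inact, uint64_t u_extra)
{
	uint64_t a = (run_bw << BW_SHIFT) / u_max;	/* U / Umax            */
	uint64_t b = BW_UNIT - u_extra - u_inact;	/* 1 - Uextra - Uinact */
	uint64_t u_act = a < b ? a : b;

	return (dt * u_act) >> BW_SHIFT;
}

int main(void)
{
	uint64_t dt = 1000000;			/* 1 ms of actual runtime, in ns        */
	uint64_t task_bw = to_bw(0.30);		/* bandwidth of the task being charged  */
	uint64_t run_bw = to_bw(0.40);		/* running_bw: this task plus one more  */
	uint64_t u_max = to_bw(0.95);		/* reclaimable cap (Umax)               */
	uint64_t u_inact = to_bw(0.20);		/* this_bw - running_bw                 */
	uint64_t u_extra = to_bw(0.35);		/* Umax - this_bw                       */

	printf("max-based form charges %llu ns, min-based form charges %llu ns\n",
	       (unsigned long long)charge_cur(dt, task_bw, u_max, u_inact, u_extra),
	       (unsigned long long)charge_new(dt, run_bw, u_max, u_inact, u_extra));
	return 0;
}

With these made-up numbers the max-based form charges about 450 us
for the 1 ms slice while the min-based form charges about 421 us, so
the two equations really do account differently. In the patch itself
the denominator (Ureclaim + Uextra + Uinact) changes whenever the
bandwidths change, which is where the div64_u64() question above
comes from; caching a reciprocal in {__add/__sub}_running_bw, as
mentioned above, is one possible way to reduce how often the divide
runs.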