Re: [PATCH 1/2] sched/deadline: accurate reclaim bandwidth for GRUB

luca abeni <luca.abeni@xxxxxxxxxxxxxxx> · Tue, 9 May 2023 22:48:29 +0200

Hi,

On Tue, 9 May 2023 15:29:21 -0400
Vineeth Remanan Pillai <vineeth@xxxxxxxxxxxxxxx> wrote:
[...]
> > Is this understanding correct?  
> Yes, the above two details are correct. In addition to that, I think
> the existing equation had a small bug:
> GRUB paper says, running time is depreciated as
>    "dq = -U dt" where U is running_bw.
> This is assuming that the whole cpu bandwidth could be reclaimed. But
> in our case, we cap at Umax. So the equation should be
>    "dq = -(U/Umax) dt"

Yes, this is the approximation I was mentioning... Instead of using a
division, I approximated it with a different equation using a sum.

> And then we have an upper limit of (1 - Uextra - Uinact). I feel we
> should be taking the minimum of these values to make sure that we
> don't cross the upper bound. I think the equation should be:
>    "dq = -min{U/Umax, (1 - Uextra - Uinact)} dt"
> 
> But the current implementation is
>    "dq = -max{u/Umax, (1 - Uextra - Uinact)} dt"
>    Where u = dl_se->dl_bw.

Well, here I think we should really use a "max{}", not a "min{}",
otherwise we risk to subtract an amount of time which is too small (the
"min{}" should be on the reclaimed bandwidth - so that we do not
reclaim too much - but this expression is computing the runtime
decrement - so I think this should be a "max{}").

Or am I misunderstanding something?

Did you try using u/Umax, but without changing the "max{}" into "min{}"?

> After fixing the above equation, reclaim logic works well but when
> only SCHED_FLAG_RECLAIM tasks are running. When we have a mix of both
> normal deadline and SCHED_FLAG_RECLAIM, it breaks the reclaim logic.
> As you pointed out, the second part of the fix is for that.

OK

> > If using div64_u64() does not introduce too much overhead, then I
> > agree with the first change.  
> In my testing, I did not see a change in the performance of the
> grub_reclaim function. Both old and new implementations take 10 to
> 20 nanoseconds on average. But my observation might not be accurate.

Or maybe my assumption that div64 is bad was wrong :)
Let's see what other people think about this.

			Thanks,
				Luca

> With this change, it is difficult to avoid division as the denominator
> is a variable and we would not be able to pre-calculate an inverse. We
> could probably calculate inverse during {__add/__sub}_running_bw so as
> to reduce the frequency of div64_u64. I shall try this for v2.
> 
> > The second change also looks good to me.
> >
> > I have no comments on the code, but there is one thing in the
> > comments that looks misleading to me (or I am misunderstanding the
> > code or the comments):
> >  
> 
> > > + * We can calculate Umax_reclaim as:
> > > + *   Umax_reclaim:   this_bw + Uinact + Ureclaim  
> >
> > I think this looks like a typo (summing this_bw to Uinact
> > looks wrong). Should "this_bw" be Uextra?
> >  
> Thanks a lot for pointing it out. Yes you are right, I messed up in
> the comments. It should be Uextra and I shall fix it in v2.
> 
> > > + *   dq = -(Ureclaim / Umax_reclaim) * dt
> > > + *      = -(Ureclaim / (Ureclaim + Uextra + Uinact)) * dt  
> >
> > I think this should be the correct equation. BTW, since you are
> > summing Uextra and Uinact, mabe you could just use "Umax -
> > running_bw"? 
> Makes sense, it will avoid an extra variable Uinact. I shall modify
> this in v2.
> 
> Thanks,
> Vineeth