Hi Joel,

On Thu, May 11, 2023 at 2:19 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Vineeth,
> Nice work. ;-) Few quick comments below:
>
Thanks :-)

> On Wed, May 10, 2023 at 6:46 PM Vineeth Pillai <vineeth@xxxxxxxxxxxxxxx> wrote:
> > This patch fixes the issue by appropriately capping the max allowed
> > utilization and also slightly adjusting the GRUB algorithm to account
> > for a mix of normal deadline and SCHED_FLAG_RECLAIM tasks.
>
> Looked at the patch quickly as I am due for bedtime ;-). I feel like
> this does more than should be done in one patch. So you should
> consider splitting it IMHO.

I think the explanatory comments are what makes this patch look huge.
The code changes are pretty small and simple:
- track the bandwidth of SCHED_FLAG_RECLAIM tasks
- modify the reclamation equation

> > According to the GRUB rule, the runtime is depreciated as a factor
> > of active bandwidth of the runqueue: "dq = -dt", where U is the
>
> Saying "dq = -dt" where U does not make sense because U is not in the
> equation ;-). Suggest rephrase.
>
Sorry about this, I will rephrase it.

> > active bandwidth. Also, we do not allocate the full bandwidth of a
> > cpu to a deadline task, but only a portion (Umax) of it, so as to
> > avoid deadline tasks starving lower class tasks. The equation could
> > be re-written as "dq = -(U / Umax) * dt"
>
> Isn't the equation in the code right now as:
> dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
>
> ?
>
> That's what the kernel docs say [1].
>
> So what do you mean by "could be re-written" ?
>
> [1] https://docs.kernel.org/scheduler/sched-deadline.html
>
This patch uses a different equation than [1] and also updates the
kernel documentation in patch 2 of this series. I understand that
"re-written" is confusing and will rephrase it. (A small standalone
sketch of the basic rule is at the end of this mail.)

> > Following are the results with this patch:
> >
> > RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
> > TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 94.98
> > TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.04
> > TID[616]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.01
>
> All these look 100% correct to me. Do these tasks start at the same
> time or are they shifted in their respective activations? Just wanted
> to be sure it behaves the same either way...
>
I have tested both. Actually, I tested 3 scenarios:
- Started the threads at the same time.
- Started the threads separately (5-10 seconds apart).
- Started the threads at the same time, with some of them sleeping for
  a while before spinning. This was to see if the transition from
  NonContending to Inactive was showing any issues with this patch.

> > +static inline bool dl_entity_is_reclaim(const struct sched_dl_entity *dl_se)
> > +{
> > +	return dl_se->flags & SCHED_FLAG_RECLAIM;
> > +}
> > +
>
> Can this helper addition be split out to a different patch?
>
I feel it could go with this patch, as the code changes here are
fairly small and the function is trivial. But I can split it out if
you feel that this patch is really huge and needs splitting.

> > +	 * Maximum available bandwidth for this runqueue. This is used to
> > +	 * calculate reclaimable bandwidth for SCHED_FLAG_RECLAIM tasks.
> > +	 * By restricting maximum usable bandwidth, we aim to give other
> > +	 * tasks on lower classes a chance to run, when competing with
> > +	 * SCHED_FLAG_RECLAIM tasks.
> >  	 */
> > -	u64			bw_ratio;
> > +	u64			max_bw;
> > +
> > +	/*
> > +	 * Active bandwidth of SCHED_FLAG_RECLAIM tasks on this rq.
> > +	 * This will be a subset of running_bw.
> > +	 */
> > +	u64			reclaim_bw;
> > +
>
> And perhaps addition and use of these new fields if it makes sense.
>
> I will take a closer look at your patches later or after v2..

Thanks :-)
Vineeth
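
Not part of the patch, but since the equations above are terse, here
is a tiny user-space sketch of the single-task case, "dq = -(U / Umax)
* dt", using the numbers from RUN 1. The parameter names reclaim_bw
and max_bw only echo the new rq fields quoted above, and the
fixed-point shift is an assumption made for the sketch; the patch
itself also handles a mix of reclaim and non-reclaim tasks, which this
toy does not attempt.

/* grub_sketch.c - toy illustration only, not the patch's code */
#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT	20	/* assumed fixed-point shift for bandwidths */

/* runtime/period as a fixed-point bandwidth ratio */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	return (runtime << BW_SHIFT) / period;
}

/*
 * "dq = -(U / Umax) * dt": scale the consumed runtime 'delta' (ns) by
 * the ratio of reclaimable bandwidth to maximum usable bandwidth,
 * never scaling above 1.
 */
static uint64_t grub_depreciate(uint64_t delta, uint64_t reclaim_bw,
				uint64_t max_bw)
{
	if (reclaim_bw >= max_bw)
		return delta;
	return delta * reclaim_bw / max_bw;
}

int main(void)
{
	uint64_t u      = to_ratio(10000000ULL, 7000000ULL); /* 7ms/10ms = 0.70 */
	uint64_t umax   = to_ratio(100ULL, 95ULL);            /* 95% capacity    */
	uint64_t budget = 7000000ULL;                         /* 7ms in ns       */
	uint64_t tick   = 100000ULL;                          /* 100us steps     */
	uint64_t wall   = 0;
	uint64_t dq     = grub_depreciate(tick, u, umax);

	while (budget >= dq) {
		budget -= dq;
		wall += tick;
	}
	/* 7ms of budget stretches to ~9.5ms of CPU per 10ms period */
	printf("budget lasts ~%.2f ms per 10ms period\n", wall / 1e6);
	return 0;
}

With U = 0.70 and Umax = 0.95 the budget is depreciated at a rate of
0.70/0.95 ~= 0.74, so the 7ms budget lasts about 9.5ms of CPU time per
10ms period, which matches the ~95% utilization reported above.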