Re: [PATCH] sched: Don't try to catch up excess steal time.

On Tue, Aug 06, 2024 at 06:51:36PM -0400, Joel Fernandes wrote:
> On Tue, Aug 6, 2024 at 7:13 AM Suleiman Souhlal <suleiman@xxxxxxxxxx> wrote:
> >
> > When steal time exceeds the measured delta when updating clock_task, we
> > currently try to catch up the excess in future updates.
> > However, this results in inaccurate run times for the future clock_task
> > measurements, as they end up getting additional steal time that did not
> > actually happen, from the previous excess steal time being paid back.
> >
> > For example, suppose a task in a VM runs for 10ms and had 15ms of steal
> > time reported while it ran. clock_task rightly doesn't advance. Then, a
> > different task runs on the same rq for 10ms without any time stolen.
> > Because of the current catch up mechanism, clock_sched inaccurately ends
> > up advancing by only 5ms instead of 10ms even though there wasn't any
> > actual time stolen. The second task is getting charged for less time
> > than it ran, even though it didn't deserve it.
> > In other words, tasks can end up getting more run time than they should
> > actually get.
> >
> > So, we instead don't make future updates pay back past excess stolen time.
> >
> > Signed-off-by: Suleiman Souhlal <suleiman@xxxxxxxxxx>
> > ---
> >  kernel/sched/core.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index bcf2c4cc0522..42b37da2bda6 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -728,13 +728,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
> >  #endif
> >  #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
> >         if (static_key_false((&paravirt_steal_rq_enabled))) {
> > -               steal = paravirt_steal_clock(cpu_of(rq));
> > +               u64 prev_steal;
> > +
> > +               steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
> >                 steal -= rq->prev_steal_time_rq;
> >
> >                 if (unlikely(steal > delta))
> >                         steal = delta;
> >
> > -               rq->prev_steal_time_rq += steal;
> > +               rq->prev_steal_time_rq = prev_steal;
> >                 delta -= steal;
> 
> Makes sense, but wouldn't this patch also do the following: If vCPU
> task is the only one running and has a large steal time, then
> sched_tick() will only freeze the clock for a shorter period, and not
> give future credits to the vCPU task itself?  Maybe it does not matter
> (and I probably don't understand the code enough) but thought I would
> mention.

The patch should still be doing the right thing in that situation:
The clock will be frozen for the whole duration that it ran, and delta
will be 0.
The current excess amount is not relevant to the future, as far as I can
tell.
The pre-patch code is giving the rq extra time that it hadn't measured.
I don't really see why it should be getting that extra time.
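Below is a minimal user-space sketch of the steal-time branch in the diff above. It compares the pre-patch and post-patch bookkeeping on the commit message's example. It assumes a simplified model: `struct model_rq`, `model_update()` and `charged_to_second_task()` are hypothetical names, the `steal_clock` argument stands in for the cumulative value returned by `paravirt_steal_clock()`, and times are in ms rather than ns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two rq fields the patch touches.
 * Units are ms for readability; the kernel works in ns. */
struct model_rq {
	uint64_t clock_task;
	uint64_t prev_steal_time_rq;
};

/*
 * Simplified model of the CONFIG_PARAVIRT_TIME_ACCOUNTING branch of
 * update_rq_clock_task(). @steal_clock stands in for the cumulative
 * steal clock. Returns how far clock_task advanced.
 */
static uint64_t model_update(struct model_rq *rq, uint64_t delta,
			     uint64_t steal_clock, bool patched)
{
	uint64_t steal;

	/* The steal clock is cumulative and never goes backwards. */
	assert(steal_clock >= rq->prev_steal_time_rq);
	steal = steal_clock - rq->prev_steal_time_rq;

	/* Never subtract more steal than the measured delta. */
	if (steal > delta)
		steal = delta;

	if (patched)
		rq->prev_steal_time_rq = steal_clock; /* drop the excess */
	else
		rq->prev_steal_time_rq += steal;      /* carry excess forward */

	delta -= steal;
	rq->clock_task += delta;
	return delta;
}

/* Commit-message scenario: task A runs 10ms with 15ms of steal reported,
 * then task B runs 10ms with no new steal. Returns B's clock_task advance. */
static uint64_t charged_to_second_task(bool patched)
{
	struct model_rq rq = { 0, 0 };

	model_update(&rq, 10, 15, patched);        /* task A: clock_task frozen */
	return model_update(&rq, 10, 15, patched); /* task B: no new steal */
}
```

With these inputs, `charged_to_second_task(false)` evaluates to 5 (the pre-patch behavior the commit message describes) and `charged_to_second_task(true)` to 10. For a single task whose steal exceeds its delta, both versions return 0 for that update, so clock_task stays frozen for the whole run; the two versions differ only in whether the leftover 5ms is charged to a later update.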

> 
> I am also not sure if the purpose of stealtime is to credit individual
> tasks, or rather all tasks on the runqueue because the "whole
> runqueue" had time stolen. Nowhere in this function is it dealing
> with individual tasks but rather the rq itself.

This function is used to update clock_task, which *is* relevant to
individual tasks. It is used to calculate how long tasks ran for (and
for load averages).
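A minimal sketch of why clock_task matters for individual tasks. It assumes a simplified model loosely based on how the fair scheduler's update_curr() charges runtime: a running task is charged the difference between successive clock_task readings. `struct model_task` and `model_charge()` are hypothetical names, not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical, simplified per-task accounting state. */
struct model_task {
	uint64_t exec_start;       /* clock_task value when last charged */
	uint64_t sum_exec_runtime; /* total runtime charged to the task */
};

/* Charge @t for the clock_task time elapsed since it was last charged,
 * and return the amount charged. */
static uint64_t model_charge(struct model_task *t, uint64_t clock_task_now)
{
	uint64_t delta_exec = clock_task_now - t->exec_start;

	t->exec_start = clock_task_now;
	t->sum_exec_runtime += delta_exec;
	return delta_exec;
}
```

In the commit message's example, if clock_task advances by only 5ms while the second task actually runs for 10ms, a charge computed this way credits that task with 5ms of runtime instead of 10ms.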

-- Suleiman
