On Tue, 2024-08-06 at 20:11 +0900, Suleiman Souhlal wrote:
> When steal time exceeds the measured delta when updating clock_task, we
> currently try to catch up the excess in future updates.
> However, this results in inaccurate run times for the future clock_task
> measurements, as they end up getting additional steal time that did not
> actually happen, from the previous excess steal time being paid back.
> 
> For example, suppose a task in a VM runs for 10ms and had 15ms of steal
> time reported while it ran. clock_task rightly doesn't advance. Then, a
> different task runs on the same rq for 10ms without any time stolen.
> Because of the current catch up mechanism, clock_sched inaccurately ends
> up advancing by only 5ms instead of 10ms even though there wasn't any
> actual time stolen. The second task is getting charged for less time
> than it ran, even though it didn't deserve it.
> In other words, tasks can end up getting more run time than they should
> actually get.
> 
> So, we instead don't make future updates pay back past excess stolen time.

My understanding was that it was done this way for a reason: there is a
lot of jitter between the "run time" (your 10ms example), and the steal
time (15ms). What if 5ms really *did* elapse between the time that
'delta' is calculated, and the call to paravirt_steal_clock()?

By accounting that steal time "in advance" we ensure it isn't lost in
the case where the same process remains running for the next timeslice.

However, that does cause problems when the steal time goes negative
(due to hypervisor bugs). So in
https://lore.kernel.org/all/20240522001817.619072-22-dwmw2@xxxxxxxxxxxxx/
I limited the amount of time which would be accounted to a future tick.

> Signed-off-by: Suleiman Souhlal <suleiman@xxxxxxxxxx>
> ---
>  kernel/sched/core.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bcf2c4cc0522..42b37da2bda6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -728,13 +728,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
>  #endif
>  #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
>  	if (static_key_false((&paravirt_steal_rq_enabled))) {
> -		steal = paravirt_steal_clock(cpu_of(rq));
> +		u64 prev_steal;
> +
> +		steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
>  		steal -= rq->prev_steal_time_rq;
>  
>  		if (unlikely(steal > delta))
>  			steal = delta;
>  
> -		rq->prev_steal_time_rq += steal;
> +		rq->prev_steal_time_rq = prev_steal;
>  		delta -= steal;
>  	}
>  #endif
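
To make the difference between the two behaviours concrete, here is a
rough userspace model of the steal-time step in update_rq_clock_task().
It is only a sketch and not kernel code: struct rq_model and the two
update_*() helpers are names made up for illustration, the cumulative
value of paravirt_steal_clock() is passed in as a plain argument, and
the irq time handling is left out entirely.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two struct rq fields involved here. */
struct rq_model {
	uint64_t prev_steal_time_rq;	/* steal time accounted so far, in ns */
	uint64_t clock_task;		/* task clock, in ns */
};

/*
 * Models the current behaviour: steal time beyond 'delta' is not recorded
 * in prev_steal_time_rq, so it shows up again in the next update.
 */
void update_catch_up(struct rq_model *rq, int64_t delta, uint64_t steal_clock)
{
	int64_t steal = (int64_t)(steal_clock - rq->prev_steal_time_rq);

	if (steal > delta)
		steal = delta;

	rq->prev_steal_time_rq += steal;	/* excess stays pending */
	rq->clock_task += delta - steal;
}

/*
 * Models the behaviour in the quoted patch: prev_steal_time_rq is set to
 * the raw steal clock reading, so any excess over 'delta' is dropped.
 */
void update_no_catch_up(struct rq_model *rq, int64_t delta, uint64_t steal_clock)
{
	uint64_t prev_steal = steal_clock;
	int64_t steal = (int64_t)(prev_steal - rq->prev_steal_time_rq);

	if (steal > delta)
		steal = delta;

	rq->prev_steal_time_rq = prev_steal;	/* excess is discarded */
	rq->clock_task += delta - steal;
}
```

Feeding both helpers the scenario from the commit message (two updates
with delta = 10ms each, and the steal clock reading 15ms both times),
the catch-up variant ends with clock_task at 5ms and the other variant
ends at 10ms. Note that the second variant drops the extra 5ms whether
or not it was real, which is the jitter case described above.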