Patch "sched: Don't try to catch up excess steal time." has been added to the 6.13-stable tree

This is a note to let you know that I've just added the patch titled

    sched: Don't try to catch up excess steal time.

to the 6.13-stable tree, which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     sched-don-t-try-to-catch-up-excess-steal-time.patch
and it can be found in the queue-6.13 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 652f56138b311888f13dbd13435a40ae75b55fd4
Author: Suleiman Souhlal <suleiman@xxxxxxxxxx>
Date:   Mon Nov 18 13:37:45 2024 +0900

    sched: Don't try to catch up excess steal time.
    
    [ Upstream commit 108ad0999085df2366dd9ef437573955cb3f5586 ]
    
    When steal time exceeds the measured delta while updating clock_task, we
    currently try to catch up the excess in future updates.
    However, in some situations this results in inaccurate run times for
    whatever subsequently uses clock_task, as it ends up being charged
    additional steal time that did not actually happen.
    This is because there is a window between reading the elapsed time in
    update_rq_clock() and sampling the steal time in update_rq_clock_task().
    If the VCPU gets preempted between those two points, any additional
    steal time is accounted to the outgoing task even though the calculated
    delta did not actually contain any of that "stolen" time.
    When this race happens, the sampled steal time can exceed the calculated
    delta, and the previous code would try to catch up that excess in future
    clock updates. That excess is then charged to the next, incoming task,
    even though none of its time was actually stolen.
    
    This behavior is particularly bad when steal time can be very long, as
    we've seen when trying to extend steal time to cover the duration that
    the host was suspended [0]. In that case, clock_task stays frozen for
    the whole duration, and the running task keeps running throughout,
    since its run time doesn't increase.
    However, the race can happen even under normal operation.
    
    Ideally we would read the elapsed cpu time and the steal time atomically,
    to prevent this race from happening in the first place, but doing so
    is non-trivial.
    
    Since the time between those two points isn't otherwise accounted anywhere,
    neither to the outgoing task nor the incoming task (because the "end of
    outgoing task" and "start of incoming task" timestamps are the same),
    I would argue that the right thing to do is to simply drop any excess steal
    time, in order to prevent these issues.
    
    [0] https://lore.kernel.org/kvm/20240820043543.837914-1-suleiman@xxxxxxxxxx/
    
    Signed-off-by: Suleiman Souhlal <suleiman@xxxxxxxxxx>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/20241118043745.1857272-1-suleiman@xxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
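
For readers following along, here is a minimal userspace sketch of the race
described above. It is illustrative only, not kernel code: the two reads stand
in for update_rq_clock() computing the delta and update_rq_clock_task() later
sampling paravirt_steal_clock(), and every variable name is hypothetical.

/* Illustrative model of the race: the delta is computed from an early
 * clock read, then the vCPU is preempted before the steal clock is
 * sampled, so the sampled steal exceeds the measured delta. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t clock = 0;		/* last rq clock, in pretend ns */
	uint64_t now = 100;		/* stand-in for the elapsed-time read */
	uint64_t steal_clock = 0;	/* stand-in for paravirt_steal_clock() */
	uint64_t prev_steal = 0;	/* stand-in for rq->prev_steal_time_rq */

	uint64_t delta = now - clock;	/* update_rq_clock(): delta = 100 */

	/* vCPU preempted here for 150 units: the host advances the steal
	 * clock, but delta was already computed from the earlier read. */
	steal_clock += 150;

	uint64_t steal = steal_clock - prev_steal;	/* sampled later */

	printf("delta=%llu steal=%llu -> steal exceeds delta by %llu\n",
	       (unsigned long long)delta, (unsigned long long)steal,
	       (unsigned long long)(steal - delta));
	return 0;
}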

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e0fd8069c60e6..ffceb5ff4c5c3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -766,13 +766,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 #endif
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
 	if (static_key_false((&paravirt_steal_rq_enabled))) {
-		steal = paravirt_steal_clock(cpu_of(rq));
+		u64 prev_steal;
+
+		steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
 		steal -= rq->prev_steal_time_rq;
 
 		if (unlikely(steal > delta))
 			steal = delta;
 
-		rq->prev_steal_time_rq += steal;
+		rq->prev_steal_time_rq = prev_steal;
 		delta -= steal;
 	}
 #endif
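
To see the effect of the change concretely, here is a standalone sketch
contrasting the two behaviors. It is illustrative, not kernel code:
old_update() models the previous accounting, which advanced
prev_steal_time_rq only by the clamped steal and so carried the excess
forward, while new_update() models the patched accounting, which snaps
prev_steal_time_rq to the sampled steal clock and drops the excess. All
helper names are hypothetical.

/* Contrast of old vs. new steal accounting (illustrative stand-ins). */
#include <stdio.h>
#include <stdint.h>

/* Old: prev advances only by the clamped steal, so excess steal is
 * "caught up" against later deltas. */
static uint64_t old_update(uint64_t steal_now, uint64_t *prev, uint64_t delta)
{
	uint64_t steal = steal_now - *prev;

	if (steal > delta)
		steal = delta;
	*prev += steal;			/* excess stays pending */
	return delta - steal;		/* what clock_task advances by */
}

/* New: prev snaps to the sampled steal clock, so excess steal beyond
 * this delta is simply dropped. */
static uint64_t new_update(uint64_t steal_now, uint64_t *prev, uint64_t delta)
{
	uint64_t steal = steal_now - *prev;

	if (steal > delta)
		steal = delta;
	*prev = steal_now;		/* excess is dropped */
	return delta - steal;
}

int main(void)
{
	uint64_t prev_old = 0, prev_new = 0;

	/* Race scenario: delta = 100 but 150 of steal accrued. */
	printf("update 1: old=%llu new=%llu\n",
	       (unsigned long long)old_update(150, &prev_old, 100),
	       (unsigned long long)new_update(150, &prev_new, 100));

	/* 50 more units of real run time, no new steal: the old code
	 * still charges the leftover 50 to the incoming task. */
	printf("update 2: old=%llu new=%llu\n",
	       (unsigned long long)old_update(150, &prev_old, 50),
	       (unsigned long long)new_update(150, &prev_new, 50));
	return 0;
}

With the race scenario above (a delta of 100 but 150 of accrued steal), the
old variant reports zero run time on both updates, wrongly charging the
leftover 50 of "steal" to the incoming task, while the new variant correctly
reports 50 on the second update.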



