Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes: > On Tue, Aug 20, 2019 at 11:54:20AM +0100, Valentin Schneider wrote: >> Turns out a cfs_rq->runtime_remaining can become positive in >> assign_cfs_rq_runtime(), but this codepath has no call to >> unthrottle_cfs_rq(). >> >> This can leave us in a situation where we have a throttled cfs_rq with >> positive ->runtime_remaining, which breaks the math in >> distribute_cfs_runtime(): this function expects a negative value so that >> it may safely negate it into a positive value. >> >> Add the missing unthrottle_cfs_rq(). While at it, add a WARN_ON where >> we expect negative values, and pull in a comment from the mailing list >> that didn't make it in [1]. This didn't exist because it's not supposed to be possible to call account_cfs_rq_runtime on a throttled cfs_rq at all, so that's the invariant being violated. Do you know what the code path causing this looks like? This would allow both list del and add while distribute is doing a foreach, but I think that the racing behavior would just be to restart the distribute loop, which is fine. >> >> [1]: https://lkml.kernel.org/r/BANLkTi=NmCxKX6EbDQcJYDJ5kKyG2N1ssw@xxxxxxxxxxxxxx >> >> Cc: <stable@xxxxxxxxxxxxxxx> >> Fixes: ec12cb7f31e2 ("sched: Accumulate per-cfs_rq cpu usage and charge against bandwidth") >> Reported-by: Liangyan <liangyan.peng@xxxxxxxxxxxxxxxxx> >> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx> > > Thanks! > >> --- >> kernel/sched/fair.c | 17 ++++++++++++----- >> 1 file changed, 12 insertions(+), 5 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 1054d2cf6aaa..219ff3f328e5 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -4385,6 +4385,11 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq) >> return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time; >> } >> >> +static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq) >> +{ >> + return cfs_bandwidth_used() && cfs_rq->throttled; >> +} >> + >> /* returns 0 on failure to allocate runtime */ >> static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq) >> { >> @@ -4411,6 +4416,9 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq) >> >> cfs_rq->runtime_remaining += amount; >> >> + if (cfs_rq->runtime_remaining > 0 && cfs_rq_throttled(cfs_rq)) >> + unthrottle_cfs_rq(cfs_rq); >> + >> return cfs_rq->runtime_remaining > 0; >> } >> >> @@ -4439,11 +4447,6 @@ void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) >> __account_cfs_rq_runtime(cfs_rq, delta_exec); >> } >> >> -static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq) >> -{ >> - return cfs_bandwidth_used() && cfs_rq->throttled; >> -} >> - >> /* check whether cfs_rq, or any parent, is throttled */ >> static inline int throttled_hierarchy(struct cfs_rq *cfs_rq) >> { >> @@ -4628,6 +4631,10 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b, u64 remaining) >> if (!cfs_rq_throttled(cfs_rq)) >> goto next; >> >> + /* By the above check, this should never be true */ >> + WARN_ON(cfs_rq->runtime_remaining > 0); >> + >> + /* Pick the minimum amount to return to a positive quota state */ >> runtime = -cfs_rq->runtime_remaining + 1; >> if (runtime > remaining) >> runtime = remaining; >> -- >> 2.22.0 >>