Re: [PATCH] sched/fair: Add missing unthrottle_cfs_rq()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Valentin Schneider <valentin.schneider@xxxxxxx> writes:

> Turns out a cfs_rq->runtime_remaining can become positive in
> assign_cfs_rq_runtime(), but this codepath has no call to
> unthrottle_cfs_rq().
>
> This can leave us in a situation where we have a throttled cfs_rq with
> positive ->runtime_remaining, which breaks the math in
> distribute_cfs_runtime(): this function expects a negative value so that
> it may safely negate it into a positive value.
>
> Add the missing unthrottle_cfs_rq(). While at it, add a WARN_ON where
> we expect negative values, and pull in a comment from the mailing list
> that didn't make it in [1].
>
> [1]: https://lkml.kernel.org/r/BANLkTi=NmCxKX6EbDQcJYDJ5kKyG2N1ssw@xxxxxxxxxxxxxx
>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Fixes: ec12cb7f31e2 ("sched: Accumulate per-cfs_rq cpu usage and charge against bandwidth")
> Reported-by: Liangyan <liangyan.peng@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Valentin Schneider <valentin.schneider@xxxxxxx>

Having now seen the rest of the thread:

Could you send the repro, as it doesn't seem to have reached lkml, so
that I can confirm my guess as to what's going on?

It seems most likely we throttle during one of the remove-change-adds in
set_cpus_allowed and friends or during the put half of pick_next_task
followed by idle balance to drop the lock. Then distribute races with a
later assign_cfs_rq_runtime so that the account finds runtime in the
cfs_b.

Re clock_task, it's only frozen for the purposes of pelt, not delta_exec

The other possible way to fix this would be to skip assign if throttled,
since the only time it could succeed is if we're racing with a
distribute that will unthrottle use anyways.

The main advantage of that is the risk of screwy behavior due to unthrottling
in the middle of pick_next/put_prev. The disadvantage is that we already
have the lock, if it works we don't need an ipi to trigger a preempt,
etc. (But I think one of the issues is that we may trigger the preempt
on the previous task, not the next, and I'm not 100% sure that will
carry over correctly)



> ---
>  kernel/sched/fair.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1054d2cf6aaa..219ff3f328e5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4385,6 +4385,11 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
>  	return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
>  }
>  
> +static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
> +{
> +	return cfs_bandwidth_used() && cfs_rq->throttled;
> +}
> +
>  /* returns 0 on failure to allocate runtime */
>  static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>  {
> @@ -4411,6 +4416,9 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>  
>  	cfs_rq->runtime_remaining += amount;
>  
> +	if (cfs_rq->runtime_remaining > 0 && cfs_rq_throttled(cfs_rq))
> +		unthrottle_cfs_rq(cfs_rq);
> +
>  	return cfs_rq->runtime_remaining > 0;
>  }
>  
> @@ -4439,11 +4447,6 @@ void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
>  	__account_cfs_rq_runtime(cfs_rq, delta_exec);
>  }
>  
> -static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
> -{
> -	return cfs_bandwidth_used() && cfs_rq->throttled;
> -}
> -
>  /* check whether cfs_rq, or any parent, is throttled */
>  static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
>  {
> @@ -4628,6 +4631,10 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth *cfs_b, u64 remaining)
>  		if (!cfs_rq_throttled(cfs_rq))
>  			goto next;
>  
> +		/* By the above check, this should never be true */
> +		WARN_ON(cfs_rq->runtime_remaining > 0);
> +
> +		/* Pick the minimum amount to return to a positive quota state */
>  		runtime = -cfs_rq->runtime_remaining + 1;
>  		if (runtime > remaining)
>  			runtime = remaining;



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux