Re: [PATCH v6 05/16] sched/core: uclamp: Update CPU's refcount on clamp changes

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 22 Jan 2019 10:37:04 +0100

On Mon, Jan 21, 2019 at 03:44:12PM +0000, Patrick Bellasi wrote:
> On 21-Jan 16:33, Peter Zijlstra wrote:
> > On Tue, Jan 15, 2019 at 10:15:02AM +0000, Patrick Bellasi wrote:
> > 
> > > +static inline void
> > > +uclamp_task_update_active(struct task_struct *p, unsigned int clamp_id)
> > > +{
> > > +	struct rq_flags rf;
> > > +	struct rq *rq;
> > > +
> > > +	/*
> > > +	 * Lock the task and the CPU where the task is (or was) queued.
> > > +	 *
> > > +	 * We might lock the (previous) rq of a !RUNNABLE task, but that's the
> > > +	 * price to pay to safely serialize util_{min,max} updates with
> > > +	 * enqueues, dequeues and migration operations.
> > > +	 * This is the same locking schema used by __set_cpus_allowed_ptr().
> > > +	 */
> > > +	rq = task_rq_lock(p, &rf);
> > > +
> > > +	/*
> > > +	 * Setting the clamp bucket is serialized by task_rq_lock().
> > > +	 * If the task is not yet RUNNABLE and its task_struct is not
> > > +	 * affecting a valid clamp bucket, the next time it's enqueued,
> > > +	 * it will already see the updated clamp bucket value.
> > > +	 */
> > > +	if (!p->uclamp[clamp_id].active)
> > > +		goto done;
> > > +
> > > +	uclamp_cpu_dec_id(p, rq, clamp_id);
> > > +	uclamp_cpu_inc_id(p, rq, clamp_id);
> > > +
> > > +done:
> > > +	task_rq_unlock(rq, p, &rf);
> > > +}
> > 
> > > @@ -1008,11 +1043,11 @@ static int __setscheduler_uclamp(struct task_struct *p,
> > >  
> > >  	mutex_lock(&uclamp_mutex);
> > >  	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> > > -		uclamp_bucket_inc(&p->uclamp[UCLAMP_MIN],
> > > +		uclamp_bucket_inc(p, &p->uclamp[UCLAMP_MIN],
> > >  				  UCLAMP_MIN, lower_bound);
> > >  	}
> > >  	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX) {
> > > -		uclamp_bucket_inc(&p->uclamp[UCLAMP_MAX],
> > > +		uclamp_bucket_inc(p, &p->uclamp[UCLAMP_MAX],
> > >  				  UCLAMP_MAX, upper_bound);
> > >  	}
> > >  	mutex_unlock(&uclamp_mutex);
> > 
> > 
> > But.... __sched_setscheduler() actually does the whole dequeue + enqueue
> > thing already ?!? See where it does __setscheduler().
> 
> This is slow-path accounting, not fast path.

Sure; but that's still no reason for duplicate or unneeded code.

> There are two refcounting going on here:
> 
>   1) mapped buckets:
> 
>      clamp_value <--(M1)--> bucket_id
> 
>   2) RUNNABLE tasks:
> 
>      bucket_id <--(M2)--> RUNNABLE tasks in a bucket
> 
> What we fix here is the refcounting for the buckets mapping. If a task
> does not have a task specific clamp value it does not refcount any
> bucket. The moment we assign a task specific clamp value, we need to
> refcount the task in the bucket corresponding to that clamp value.
> 
> This will keep the bucket in use at least as long as the task will
> need that clamp value.

Sure, I get that. What I don't get is why you're adding that (2) here.
Like said, __sched_setscheduler() already does a dequeue/enqueue under
rq->lock, which should already take care of that.