Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

Pavan Kondeti <pkondeti@xxxxxxxxxxxxxx> · Wed, 29 Apr 2020 20:51:06 +0530

Hi Qais,

On Wed, Apr 29, 2020 at 01:30:57PM +0100, Qais Yousef wrote:
> Hi Pavan
> 
> On 04/29/20 17:02, Pavan Kondeti wrote:
> > Hi Qais,
> > 
> > On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
> > 
> > [...]
> > 
> > >  
> > > +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> > > +{
> > > +	struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> > > +
> > > +	if (unlikely(rt_task(p)) && !uc_se->user_defined)
> > > +		uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> > > +}
> > 
> > Unlike system default clamp values, RT default value is written to
> > p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
> > lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
> > big deal. Just sharing my observation. Is this how you expected it to work?
> 
> This is how I expect it to work yes.
> 
> Note that the sysctl_sched_uclamp_util_{min,max}, or 'system default clamp
> values' as you called them, are constraints. They define the range any task is
> _allowed to_ *request*.
> 
> The new sysctl_sched_uclamp_util_min_rt_default is setting the default
> *request* for RT tasks. And this was done since historically RT tasks has
> always run at the highest performance point in Linux.
> 
> Can you think of a scenario where setting the default *request* of uclamp.max
> for all RT tasks helps?
> 

It was a hypothetical question :-) Thanks for clearly explaining the
difference between system default clamp values (constraints) vs the newly
introduced sysctl_sched_uclamp_util_min_rt_default (request for RT tasks).

> I'm thinking, the global constrain of sysctl_sched_uclamp_util_max should
> suffice. Or one can use cgroup to selectively cap some tasks.
> 
> For example if you don't want any task in the system to be boosted to anything
> higher than 800, just do
> 
> 	sysctl_sched_uclamp_util_max = 800
> 
> Or if you want to be selective, the same thing can be achieved by creating
> a cgroup cpu controller that has uclamp.max = 0.8 and attaching tasks to it.
> 
> > 
> > > +
> > >  static inline struct uclamp_se
> > >  uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > >  {
> > > @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > >  static inline struct uclamp_se
> > >  uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> > >  {
> > > -	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> > > -	struct uclamp_se uc_max = uclamp_default[clamp_id];
> > > +	struct uclamp_se uc_req, uc_max;
> > > +
> > > +	/*
> > > +	 * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> > > +	 */
> > > +	uclamp_sync_util_min_rt_default(p);
> > > +
> > > +	uc_req = uclamp_tg_restrict(p, clamp_id);
> > > +	uc_max = uclamp_default[clamp_id];
> > 
> > We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> > clamp_id == UCLAMP_MAX case. Would it be better to have a separate
> 
> It was actually intentional to make sure we update the value ASAP. I didn't
> think it's a lot of overhead. I can further protect with a check to verify
> whether the value has changed if it seems heavy handed.
> 
> > uclamp_default for RT like uclamp_default_rt and select uc_max based
> > on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
> > we can cover the case of uclamp_util_min < uclamp_util_min_rt.
> 
> Hmm I'm not sure I got you.
> 
> uclamp_default[] is setting the constraints. I.e. what's the range it's allowed
> to take.
> 
> What's added here is simply setting the requested uclamp.min value, which if
> the constraints allow it will be applied.
> 
> For example. If
> 
> 	sysctl_sched_uclamp_util_min = 0
> 	sysctl_sched_uclamp_util_min_rt_default = 400
> 
> uclamp_eff_get() will return 0 as effective uclamp.min value for the task. So
> while the task still requests to be boosted to 1024, but the system constraint
> prevents it from getting it. But as soon as the system constraint has changed,
> the task might be able to get what it wants.
> 
> For example, if at runtime sysctl_sched_uclamp_util_min was modified to 1024
> 
> 	sysctl_sched_uclamp_util_min = 1024
> 	sysctl_sched_uclamp_util_min_rt_default = 400
> 
> uclamp_eff_get() return 400 for the task now, as requested.
> 

Yes, I did understand the relation between sysctl_sched_uclamp_util_min and
sysctl_sched_uclamp_util_min_rt_default . It is not clear why we are copying
sysctl_sched_uclamp_util_min_rt_default into p->uclamp_req[UCLAMP_MIN] though
user did not explicitly asked for it. In the above reply, you clearly
mentioned that the new knob works like a request from the user space for all
RT tasks.

As we are copying the sysctl_sched_uclamp_util_min_rt_default value into
p->uclamp_req[UCLAMP_MIN], user gets it when sched_getattr() is called though
sched_setattr() was not called before. I guess that is expected behavior with
your definition of this new tunable. Thanks for answering the question in
detail.

Thanks,
Pavan

> Thanks
> 
> --
> Qais Yousef
> 
> > 
> > >  
> > >  	/* System default restrictions always apply */
> > >  	if (unlikely(uc_req.value > uc_max.value))
> > > @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > >  				loff_t *ppos)
> > >  {
> > >  	bool update_root_tg = false;
> > > -	int old_min, old_max;
> > > +	int old_min, old_max, old_min_rt;
> > >  	int result;
> > >  
> > >  	mutex_lock(&uclamp_mutex);
> > >  	old_min = sysctl_sched_uclamp_util_min;
> > >  	old_max = sysctl_sched_uclamp_util_max;
> > > +	old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
> > >  
> > >  	result = proc_dointvec(table, write, buffer, lenp, ppos);
> > >  	if (result)
> > > @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > >  		goto undo;
> > >  	}
> > >  
> > > +	/*
> > > +	 * The new value will be applied to RT tasks the next time the
> > > +	 * scheduler needs to calculate the effective uclamp.min for that task,
> > > +	 * assuming the task is using the system default and not a user
> > > +	 * specified value. In the latter we shall leave the value as the user
> > > +	 * requested.
> > > +	 */
> > > +	if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> > > +		result = -EINVAL;
> > > +		goto undo;
> > > +	}
> > > +
> > >  	if (old_min != sysctl_sched_uclamp_util_min) {
> > >  		uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> > >  			      sysctl_sched_uclamp_util_min, false);
> > > @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > >  undo:
> > >  	sysctl_sched_uclamp_util_min = old_min;
> > >  	sysctl_sched_uclamp_util_max = old_max;
> > > +	sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
> > >  done:
> > >  	mutex_unlock(&uclamp_mutex);
> > >  
> > > @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
> > >  		if (uc_se->user_defined)
> > >  			continue;
> > >  
> > > -		/* By default, RT tasks always get 100% boost */
> > > +		/*
> > > +		 * By default, RT tasks always get 100% boost, which the admins
> > > +		 * are allowed to change via
> > > +		 * sysctl_sched_uclamp_util_min_rt_default knob.
> > > +		 */
> > >  		if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> > > -			clamp_value = uclamp_none(UCLAMP_MAX);
> > > +			clamp_value = sysctl_sched_uclamp_util_min_rt_default;
> > >  
> > >  		uclamp_se_set(uc_se, clamp_value, false);
> > >  	}
> > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > > index 8a176d8727a3..64117363c502 100644
> > > --- a/kernel/sysctl.c
> > > +++ b/kernel/sysctl.c
> > > @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
> > >  		.mode		= 0644,
> > >  		.proc_handler	= sysctl_sched_uclamp_handler,
> > >  	},
> > > +	{
> > > +		.procname	= "sched_util_clamp_min_rt_default",
> > > +		.data		= &sysctl_sched_uclamp_util_min_rt_default,
> > > +		.maxlen		= sizeof(unsigned int),
> > > +		.mode		= 0644,
> > > +		.proc_handler	= sysctl_sched_uclamp_handler,
> > > +	},
> > >  #endif
> > >  #ifdef CONFIG_SCHED_AUTOGROUP
> > >  	{
> > > -- 
> > > 2.17.1
> > > 
> > 
> > Thanks,
> > Pavan
> > 
> > -- 
> > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.