On 04/30/20 20:21, Dietmar Eggemann wrote: > On 29/04/2020 14:30, Qais Yousef wrote: > > Hi Pavan > > > > On 04/29/20 17:02, Pavan Kondeti wrote: > >> Hi Qais, > >> > >> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote: > > [...] > > >>> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id) > >>> static inline struct uclamp_se > >>> uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id) > >>> { > >>> - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id); > >>> - struct uclamp_se uc_max = uclamp_default[clamp_id]; > >>> + struct uclamp_se uc_req, uc_max; > >>> + > >>> + /* > >>> + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value. > >>> + */ > >>> + uclamp_sync_util_min_rt_default(p); > >>> + > >>> + uc_req = uclamp_tg_restrict(p, clamp_id); > >>> + uc_max = uclamp_default[clamp_id]; > >> > >> We are calling uclamp_sync_util_min_rt_default() unnecessarily for > >> clamp_id == UCLAMP_MAX case. Would it be better to have a separate > > > > It was actually intentional to make sure we update the value ASAP. I didn't > > think it's a lot of overhead. I can further protect with a check to verify > > whether the value has changed if it seems heavy handed. > > Users of uclamp_eff_value()->uclamp_eff_get() ((like > rt_task_fits_capacity())) always call both ids. > > So calling uclamp_sync_util_min_rt_default() only for UCLAMP_MIN would > make sense. It's overhead in the fast path for rt tasks. > > Since changes to sched_util_clamp_min_rt_default will be fairly rare, > you might even want to consider only doing the uclamp_se_set(..., > min_rt_default, ...) in case > > uc_se->value != sysctl_sched_uclamp_util_min_rt_default Okay will do though I'm not convinced this micro optimization makes any difference. p->uclamp_req[] is already read, so it should be cache hot. uclamp_set_se() performs 3 writes on it, so I expect no stalls. Even if it was a write-through cache, there's usually a write buffer that I think should be able to hide the 3 writes. Write-through + linux is a bad idea in general. In contrary, the complex if condition might make branch prediction less effective, hence this micro optimization might create a negative effect. So I don't see this a clear win and it would be hard to know without proper measurement really. It is cheaper sometimes to always execute something than sprinkle more branches to avoid executing this simple code. I just realized though that I didn't define the uclamp_sync_util_min_rt_default() as inline, which I should to avoid a function call. Compiler *should* be smart though :p Thanks -- Qais Yousef