On 14-Jan 21:34, Qais Yousef wrote: > On 01/09/20 10:21, Patrick Bellasi wrote: > > That's not entirely true. In that patch we introduce cgroup support > > but, if you look at the code before that patch, for CFS tasks there is > > only: > > - CFS task-specific values (min,max)=(0,1024) by default > > - CFS system-wide tunables (min,max)=(1024,1024) by default > > and a change on the system-wide tunable allows for example to enforce > > a uclamp_max=200 on all tasks. > > > > A similar solution can be implemented for RT tasks, where we have: > > - RT task-specific values (min,max)=(1024,1024) by default > > - RT system-wide tunables (min,max)=(1024,1024) by default > > and a change on the system-wide tunable allows for example to enforce > > a uclamp_min=200 on all tasks. > > I feel I'm already getting lost in the complexity of the interaction here. Do > we really need to go that path? > > So we will end up with a default system wide for all tasks + a CFS system wide > default + an RT system wide default? > > As I understand it, we have a single system wide default now. Right now we have one system wide default and that's both for all CFS/RT tasks, when cgroups are not in use, or for root group and autogroup CFS/RT tasks, when cgroups are in use. What I'm proposing is to limit the usage of the current system wide default to CFS tasks only, while we add a similar new one specifically for RT tasks. At the end we will have two system wide defaults, not three. > > > (Would we need CONFIG_RT_GROUP_SCHED for this? IIRC there's a few pain points > > > when turning it on, but I think we don't have to if we just want things like > > > uclamp value propagation?) > > > > No, the current design for CFS tasks works also on !CONFIG_CFS_GROUP_SCHED. > > That's because in this case: > > - uclamp_tg_restrict() returns just the task requested value > > - uclamp_eff_get() _always_ restricts the requested value considering > > the system defaults > > > > > It's quite more work than the simple thing Qais is introducing (and on both > > > user and kernel side). > > > > But if in the future we will want to extend CGroups support to RT then > > we will feel the pains because we do the effective computation in two > > different places. > > Hmm what you're suggesting here is that we want to have > cpu.rt.uclamp.{min,max}? I'm not sure I can agree this is a good idea. That's exactly what we already do for other controllers. For example, if you look at the bandwidth controller, we have separate knobs for CFS and RT tasks. > It makes more sense to create a special group for all rt tasks rather than > treat rt tasks in a cgroup differently. Don't see why that should make more sense. This can work of course but it would enforce a more strict configuration and usage of cgroups to userspace. I also have some doubths about this approach matching the delegation model principles. > > Do note that a proper CGroup support requires that the system default > > values defines the values for the root group and are consistently > > propagated down the hierarchy. Thus we need to add a dedicated pair of > > cgroup attributes, e.g. cpu.util.rt.{min.max}. > > > > To recap, we don't need CGROUP support right now but just to add a new > > default tracking similar to what we do for CFS. > > > > We already proposed such a support in one of the initial versions of > > the uclamp series: > > Message-ID: <20190115101513.2822-10-patrick.bellasi@xxxxxxx> > > https://lore.kernel.org/lkml/20190115101513.2822-10-patrick.bellasi@xxxxxxx/ > > IIUC what you're suggesting is: > > 1. Use the sysctl to specify the default_rt_uclamp_min > 2. Enforce this value in uclamp_eff_get() rather than my sync logic > 3. Remove the current hack to always set > rt_task->uclamp_min = uclamp_none(UCLAMP_MAX) Right, that's the essence... > If I got it correctly I'd be happy to experiment with it if this is what > you're suggesting. Otherwise I'm afraid I'm failing to see the crust of the > problem you're trying to highlight. ... from what your write above I think you got it right. In my previous posting: Message-ID: <20200109092137.GA2811@darkstar> https://lore.kernel.org/lkml/20200109092137.GA2811@darkstar/ there is also the code snippet which should be good enough to extend uclamp_eff_get(). Apart from that, what remains is: - to add the two new sysfs knobs for sysctl_sched_uclamp_util_{min,max}_rt - make a call about how rt tasks in cgroups are clamped, a simple proposal is in the second snippet of my message above. Best, Patrick -- #include <best/regards.h> Patrick Bellasi