On 07/10/2018 11:23 AM, Waiman Long wrote: > On 07/06/2018 04:32 PM, Waiman Long wrote: >> On 07/03/2018 11:58 AM, Tejun Heo wrote: >>> Hello, Waiman. >>> >>> On Tue, Jul 03, 2018 at 08:41:31AM +0800, Waiman Long wrote: >>>>> So, effective changing when enabling partition on a child feels wrong >>>>> to me. It's supposed to contain what's actually allowed to the cgroup >>>>> from its parent and that shouldn't change regardless of how those >>>>> resources are used. It's still given to the cgroup from its parent. >>>> Another way to work around this issue is to expose the reserved_cpus in >>>> the parent for holding CPUs that can taken by a chid partition. That >>>> will require adding one more cpuset file for those cgroups that are >>>> partition roots. >>> Yeah, that should work. >>> >> Thinking about it a bit more, that approach will make creating a >> partition a multi-step process: >> >> 1) Reserve the CPUs in reserved_cpus. >> 2) enable sched.partition >> 3) Write the CPUs list into cpus. >> >> There are also more exception cases that need to be handled. The current >> approach, on the other hands, is much simpler and easier to understand >> and use. >> >>>> I don't mind restricting that to the first level children for now. That >>>> does restrict where we can put the container root if we want a separate >>>> partition for a container. Let's hear if others have any objection about >>>> that. >>> As currently implemented, partioning locks away the cpus which should >>> be a system level decision, not container level, so it makes sense to >>> me that it is only available to system root. >> So my preference is to allow partition only on the first level children >> of the root for the time being. I think it should cover most of the use >> cases. I will update the patchset to reflect that. >> >> Cheers, >> Longman >> > Below is the incremental patch that allow partitioning only on the first > level children of the root. Please let me know your thourght on that. > > Thanks, > Longman > > -------------[ Cut here ]------------------------- > > From 5a41209da94385efff87d79f6523265c710cbea5 Mon Sep 17 00:00:00 2001 > From: Waiman Long <longman@xxxxxxxxxx> > Date: Tue, 10 Jul 2018 10:23:16 -0400 > Subject: [PATCH v11 10/10] cpuset: Restrict sched.partition to first level > children of root only > > Enabling partition on a v2 cpuset has the side effect of affecting the > effective CPUs of its parent which is currently unique to partitioning. > As we are not sure about the repercussion of enabling that globally, > we are now restricting the enabling of sched.partition to the first > level children of the root cgroup in the default hierarchy. > > This is done by removing the "cpuset.sched.partition" control file > on cgroups that are not the first level children of the root. A new > show_cfile function pointer is added to the cftype structure. If it > is defined, it will be called to return a boolean value to determine > if the corresponding control file should show up in the cgroup. It > provides a more flexible mechanism to determine the visibility of the > control file than a simple CFTYPE_* flag can do. > > Signed-off-by: Waiman Long <longman@xxxxxxxxxx> > --- > Documentation/admin-guide/cgroup-v2.rst | 10 +++++----- > include/linux/cgroup-defs.h | 9 +++++++++ > kernel/cgroup/cgroup.c | 4 ++++ > kernel/cgroup/cpuset.c | 10 ++++++++++ > 4 files changed, 28 insertions(+), 5 deletions(-) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst > b/Documentation/admin-guide/cgroup-v2.rst > index f7cde15..cf7cd88 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -1585,10 +1585,10 @@ Cpuset Interface Files > Its value will be affected by memory nodes hotplug events. > > cpuset.sched.partition > - A read-write single value file which exists on non-root > - cpuset-enabled cgroups. It is a binary value flag that accepts > - either "0" (off) or "1" (on). This flag is set and owned by the > - parent cgroup. > + A read-write single value file which exists on the first level > + children of the root cgroup. It is a binary value flag that > + accepts either "0" (off) or "1" (on). This flag is set and > + owned by the parent cgroup. > > If set, it indicates that the current cgroup is the root of a > new partition or scheduling domain that comprises itself and > @@ -1603,7 +1603,7 @@ Cpuset Interface Files > exclusive, i.e. they are not shared by any of its siblings. > 2) The "cpuset.cpus" is also a proper subset of the parent's > "cpuset.cpus.effective". > - 3) The parent cgroup is a partition root. > + 3) The parent cgroup is the root cgroup. > 4) There is no child cgroups with cpuset enabled. This is for > eliminating corner cases that have to be handled if such a > condition is allowed. > diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h > index c0e68f9..be79487 100644 > --- a/include/linux/cgroup-defs.h > +++ b/include/linux/cgroup-defs.h > @@ -565,6 +565,15 @@ struct cftype { > ssize_t (*write)(struct kernfs_open_file *of, > char *buf, size_t nbytes, loff_t off); > > + /* > + * show_cfile(), if defined, will return a boolean value to > + * determine if the control file should show up in the cgroup. > + * It provides more flexibility in deciding where the control > + * file should appear than simple criteria like on-root or > + * not-on-root. > + */ > + bool (*show_cfile)(struct cgroup_subsys_state *css); > + > #ifdef CONFIG_DEBUG_LOCK_ALLOC > struct lock_class_key lockdep_key; > #endif > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > index 077370b..0afdab8 100644 > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -3612,6 +3612,10 @@ static int cgroup_addrm_files(struct > cgroup_subsys_state *css, > if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgroup_parent(cgrp)) > continue; > > + /* Should the control file show up in the cgroup */ > + if (cft->show_cfile && !cft->show_cfile(css)) > + continue; > + > if (is_add) { > ret = cgroup_add_file(css, cgrp, cft); > if (ret) { > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index 62b7e61..2f85c1e 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -2106,6 +2106,15 @@ static s64 cpuset_read_s64(struct > cgroup_subsys_state *css, struct cftype *cft) > } > > /* > + * The sched.partition control file should only show up in the first > + * level children of the root cgroup. > + */ > +static bool cpuset_show_partition(struct cgroup_subsys_state *css) > +{ > + return parent_cs(css_cs(css)) == &top_cpuset; > +} > + > +/* > * for the common functions, 'private' gives the type of file > */ > > @@ -2250,6 +2259,7 @@ static s64 cpuset_read_s64(struct > cgroup_subsys_state *css, struct cftype *cft) > .name = "sched.partition", > .read_u64 = cpuset_read_u64, > .write_u64 = cpuset_write_u64, > + .show_cfile = cpuset_show_partition, > .private = FILE_PARTITION_ROOT, > .flags = CFTYPE_NOT_ON_ROOT, > }, Tejun, What do you think about this patch? Are there any other issues you have with this patchset? Thanks, Longman -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html