On Fri, Oct 22, 2010 at 04:09:56PM +0800, Li Zefan wrote: > Stephane posted a patchset to add perf_cgroup subsystem, so perf can > be used to monitor all threads belonging to a cgroup. > > But if you already mounted a cgroup hierarchy but without perf_cgroup > and the hierarchy has sub-cgroups, you can't bind perf_cgroup to it, > and thus you're not able to use per-cgroup perf feature. > > This patchset alleviates the pain, and then a subsytem can be bind/unbind > to/from a hierarchy which has sub-cgroups in it. > > For a cgroup subsystem to become bindable, the can_bind flag of > struct cgroup_subsys should be set, and provide ->bind() callback > if necessary. > > But for some constraints, not all subsystems can take advantage of > this patch. For example, we can't decide a cgroup's cpuset.mems and > cpuset.cpus automatically, so cpuset is not bindable. <snip> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 6c36750..46df5f8 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c <snip> > +/* > + * cgroup_walk_herarchy - iterate through a cgroup hierarchy > + * @process_cgroup: callback called on each cgroup in the hierarchy > + * @data: will be passed to @process_cgroup > + * @top_cgrp: the root cgroup of the hierarchy > + * > + * For such a hierarchy: > + * a1 c1 > + * / / > + * Root - a2 - b1 - c2 > + * \ > + * a3 > + * > + * The iterating order is: a1, a2, b1, c1, c2, a3. So a parent will be > + * processed before its children. > + */ You could just say it's a depth-first walk except we process the parent before its children. > +static int cgroup_walk_hierarchy(int (*process_cgroup)(struct cgroup *, void *), > + void *data, struct cgroup *top_cgrp) <snip> > +static int hierarchy_populate_dir(struct cgroup *cgrp, void *data) > +{ > + mutex_lock_nested(&cgrp->dentry->d_inode->i_mutex, I_MUTEX_CHILD); > + cgroup_populate_dir(cgrp); > + mutex_unlock(&cgrp->dentry->d_inode->i_mutex); > + return 0; > +} > + > /* > * Call with cgroup_mutex held. Drops reference counts on modules, including > * any duplicate ones that parse_cgroupfs_options took. If this function > @@ -945,36 +1079,53 @@ static int rebind_subsystems(struct cgroupfs_root *root, > unsigned long added_bits, removed_bits; > struct cgroup *cgrp = &root->top_cgroup; > int i; > + int err; > > BUG_ON(!mutex_is_locked(&cgroup_mutex)); > > removed_bits = root->actual_subsys_bits & ~final_bits; > added_bits = final_bits & ~root->actual_subsys_bits; > + > /* Check that any added subsystems are currently free */ > - for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { > - unsigned long bit = 1UL << i; > - struct cgroup_subsys *ss = subsys[i]; > - if (!(bit & added_bits)) > - continue; > + for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) { > /* > * Nobody should tell us to do a subsys that doesn't exist: > * parse_cgroupfs_options should catch that case and refcounts > * ensure that subsystems won't disappear once selected. > */ > - BUG_ON(ss == NULL); > - if (ss->root != &rootnode) { > + BUG_ON(subsys[i] == NULL); > + if (subsys[i]->root != &rootnode) { > /* Subsystem isn't free */ > return -EBUSY; > } > } > > - /* Currently we don't handle adding/removing subsystems when > - * any child cgroups exist. This is theoretically supportable > - * but involves complex error handling, so it's being left until > - * later */ > - if (root->number_of_cgroups > 1) > + /* removing will be supported later */ > + if (root->number_of_cgroups > 1 && removed_bits) > return -EBUSY; > > + if (root->number_of_cgroups > 1) { Is there something wrong with the indentation here? I can't see the closing brace for the "if (root->number_of_cgroups > 1)" that should precede the for_each_set_bit() loop below. > + for_each_set_bit(i, &added_bits, CGROUP_SUBSYS_COUNT) > + if (!subsys[i]->can_bind) > + return -EBUSY; I think you could avoid the can_bind flag field entirely and do: if (!subsys[i]->bind) Also, if we're going with my "split out unbind" suggestion I think the part here would be: for_each_set_bit(i, &removed_bits, CGROUP_SUBSYS_COUNT) if (!subsys[i]->unbind) return -EBUSY; <snip> Cheers, -Matt Helsley _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers