(cc'ing Li and quoting the whole message)

On Mon, May 22, 2017 at 06:14:31PM -0400, Daniel Jordan wrote:
> Hi,
>
> I found what looks like a race between cgroup core and the cpuset
> controller.
>
> ---
>
> /* Mount the cpuset and create parent and child cpusets. */
> # mount -t cpuset nodev /dev/cpuset
> # cd /dev/cpuset
> # mkdir parent
> # mkdir parent/child
>
> /* Enable cpu_exclusive in both parent and child cpusets. */
> # cd parent
> # /bin/echo 1 > cpu_exclusive
> # /bin/echo 1 > child/cpu_exclusive
>
> /* Remove the child cpuset and then immediately try making the parent
>    non-exclusive. */
> # rmdir child; /bin/echo 0 > cpu_exclusive
> /bin/echo: write error: Device or resource busy
>
> ---
>
> I'd expect the last command above to succeed.
>
> If I do the same steps as above, but make the last command
>
> # rmdir child; sleep 1; /bin/echo 0 > cpu_exclusive
>
> then it works.
>
> None of the three EBUSY errors from 'man cpuset' apply to this case,
> so I added some debug output that shows what's going on:
>
> [2710738.469049] cgroup: [cpu 64] entering kill_css
> [2710738.478379] cgroup: [cpu 64] leaving kill_css
> [2710738.487659] [cpu 96] entering is_cpuset_subset
> [2710738.496830] [cpu 96] triggered in is_cpuset_subset
>                  /* is_cpu_exclusive(p) > is_cpu_exclusive(q) */
> [2710738.513153] cgroup: [cpu 64] entering css_killed_ref_fn
> [2710738.523873] cgroup: [cpu 64] leaving css_killed_ref_fn
> [2710738.534716] cgroup: [cpu 64] entering css_killed_work_fn
> [2710738.545737] cgroup: [cpu 64] entering offline_css
> [2710738.555644] [cpu 64] entering cpuset_css_offline
> [2710738.565387] [cpu 64] entering is_cpuset_subset
> [2710738.574744] [cpu 64] leaving cpuset_css_offline
> [2710738.584297] cgroup: [cpu 64] leaving offline_css
> [2710738.594010] cgroup: [cpu 64] leaving css_killed_work_fn
>
> It looks like the task on cpu 64 kicks off the kworker that eventually
> calls cpuset_css_offline, but this worker doesn't get to run until
> after the "/bin/echo 0 > cpu_exclusive" command on cpu 96 has finished.
> So the check in is_cpuset_subset, which requires that every child be
> non-exclusive before 'parent' can be made non-exclusive, fails because
> 'child' hasn't been offlined yet.
>
> Is this expected behavior?
>
> Should the user have to sleep between commands, or thoroughly clean up
> after a cpuset with something like
>
> # /bin/echo 0 > child/cpu_exclusive; rmdir child; /bin/echo 0 > cpu_exclusive
>
> to avoid these kinds of failures?

Can you please see whether the following patch fixes the issue?

Thanks.

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f6501f4f6040..9e29dba49d6c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -176,9 +176,10 @@ typedef enum {
 } cpuset_flagbits_t;
 
 /* convenient tests for these bits */
-static inline bool is_cpuset_online(const struct cpuset *cs)
+static inline bool is_cpuset_online(struct cpuset *cs)
 {
-	return test_bit(CS_ONLINE, &cs->flags);
+	return test_bit(CS_ONLINE, &cs->flags) &&
+		!percpu_ref_is_dying(&cs->css.refcnt);
 }
 
 static inline int is_cpu_exclusive(const struct cpuset *cs)
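
For reference, why this one-line change should be enough: validate_change()
rejects the write to 'parent' by walking its children with
cpuset_for_each_child(), and that iterator already filters on
is_cpuset_online(). Roughly, abridged from mainline kernel/cgroup/cpuset.c
(a sketch for illustration only, not part of the patch above):

#define cpuset_for_each_child(child_cs, pos_css, parent_cs)		\
	css_for_each_child((pos_css), &(parent_cs)->css)		\
		if (is_cpuset_online(((child_cs) = css_cs((pos_css)))))

static int validate_change(struct cpuset *cur, struct cpuset *trial)
{
	struct cgroup_subsys_state *css;
	struct cpuset *c;
	int ret;

	rcu_read_lock();

	/* each child cpuset must remain a subset of the trial cpuset */
	ret = -EBUSY;
	cpuset_for_each_child(c, css, cur)
		if (!is_cpuset_subset(c, trial))
			goto out;

	/* ... remaining exclusivity and overlap checks omitted ... */
	ret = 0;
out:
	rcu_read_unlock();
	return ret;
}

With the patch, a css that has been killed but not yet offlined fails
is_cpuset_online(), so the dying 'child' is skipped by the iterator and
the write to 'parent' no longer returns EBUSY while the offline worker
is still pending.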