On Thu, 2012-07-19 at 09:50 -0700, Tejun Heo wrote: > On Thu, Jul 19, 2012 at 07:39:32PM +0530, Aneesh Kumar K.V wrote: > > From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> > > > > We dropped cgroup mutex, because of a deadlock between memcg and cpuset. > > cpuset took hotplug lock followed by cgroup_mutex, where as memcg pre_destroy > > did lru_add_drain_all() which took hotplug lock while already holding > > cgroup_mutex. The deadlock is explained in 3fa59dfbc3b223f02c26593be69ce6fc9a940405 > > But dropping cgroup_mutex in cgroup_rmdir also means tasks could get > > added to cgroup while we are in pre_destroy. This makes error handling in > > pre_destroy complex. So move the unlock/lock to memcg pre_destroy callback. > > Core cgroup will now call pre_destroy with cgroup_mutex held. > > > > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx> > > I generally think cgroup_mutex shouldn't be dropped across any cgroup > hierarchy changing operation and thus agree with the cgroup core > change. > > > static int mem_cgroup_pre_destroy(struct cgroup *cont) > > { > > + int ret; > > struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > > > > - return mem_cgroup_force_empty(memcg, false); > > + cgroup_unlock(); > > + /* > > + * we call lru_add_drain_all, which end up taking > > + * mutex_lock(&cpu_hotplug.lock), But cpuset have > > + * the reverse order. So drop the cgroup lock > > + */ > > + ret = mem_cgroup_force_empty(memcg, false); > > + cgroup_lock(); > > + return ret; > > } > > This reverse dependency from cpuset is the same problem Glauber > reported a while ago. I don't know why / how cgroup_mutex got > exported to outside world but this is asking for trouble. cgroup > mutex protects cgroup hierarchies. There are many core subsystems > which implement cgroup controllers. Controller callbacks for > hierarchy changing oeprations need to synchronize with the rest of the > core subsystems. So, by design, in locking hierarchy, cgroup_mutex > has to be one of the outermost locks. If somebody tries to grab it > from inside other core subsystem locks, there of course will be > circular locking dependencies. > > So, Peter, why does cpuset mangle with cgroup_mutex? What guarantees > does it need? Why can't it work on "changed" notification while > caching the current css like blkcg does? I've no clue sorry.. /me goes stare at this stuff.. Looks like something Paul Menage did when he created cgroups. I'll have to have a hard look at all that to untangle this. Not something obvious to me. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href