On Thu, Dec 23, 2010 at 10:33:52PM -0500, Ben Blum wrote: > On Thu, Dec 16, 2010 at 12:26:03AM -0800, Andrew Morton wrote: > > Patches have gone a bit stale, sorry. Refactoring in > > kernel/cgroup_freezer.c necessitates a refresh and retest please. > > commit 53feb29767c29c877f9d47dcfe14211b5b0f7ebd changed a bunch of stuff > in kernel/cpuset.c to allocate nodemasks with NODEMASK_ALLOC (which > wraps kmalloc) instead of on the stack. > > 1. All these functions have 'void' return values, indicating that > calling them them must not fail. Sure there are bailout cases, but no > semblance of cross-function error propagation. Most importantly, > cpuset_attach is a subsystem callback, which MUST not fail given the > way it's used in cgroups, so relying on kmalloc is not safe. > > 2. I'm working on a patch series which needs to hold tasklist_lock > across ss->attach callbacks (in cpuset_attach's "if (threadgroup)" > case, this is how safe traversal of tsk->thread_group will be > ensured), and kmalloc isn't allowed while holding a spin-lock. > > Why do we need heap-allocation here at all? In each case their scope is > exactly the function's scope, and neither the commit nor the surrounding > patch series give any explanation. I'd like to revert the patch if > possible. > > cc'ing Miao Xie (author) and David Rientjes (acker). > > -- Ben Well even with the proposed solution to this there is still another problem that I see - that of mmap_sem. cpuset_attach() calls into mpol_rebind_mm() and do_migrate_pages(), which take mm->mmap_sem for writing and reading respectively. This is going to conflict with tasklist_lock... but moreover, the memcontrol subsys also touches the task's mm->mmap_sem, holding onto it between mem_cgroup_can_attach() and mem_cgroup_move_task() - as of b1dd693e5b9348bd68a80e679e03cf9c0973b01b. So we have (currently, even without my patches): cgroup_attach_task (1) cpuset_can_attach (2) mem_cgroup_can_attach - down_read(&mm->mmap_sem); (3) cpuset_attach - mpol_rebind_mm - down_write(&mm->mmap_sem); - up_write(&mm->mmap_sem); - cpuset_migrate_mm - do_migrate_pages - down_read(&mm->mmap_sem); - up_read(&mm->mmap_sem); (4) mem_cgroup_move_task - mem_cgroup_clear_mc - up_read(...); Is there some interdependency I'm missing here that guarantees recursive locking/deadlock will be avoided? It all looks like typical-case code. I think we should move taking the mmap_sem all the way up into cgroup_attach_task and cgroup_attach_proc; it will be held for writing the whole time. I don't quite understand the mempolicy stuff but maybe there can be ways to use mpol_rebind_mm and do_migrate_pages when the lock is already held. Adding Daisuke Nishimura and Kamezawa Hiroyuki from the commit mentioned above. -- Ben _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers