On Thu, Dec 23, 2010 at 10:33:52PM -0500, Ben Blum wrote:
> On Thu, Dec 16, 2010 at 12:26:03AM -0800, Andrew Morton wrote:
> > Patches have gone a bit stale, sorry.  Refactoring in
> > kernel/cgroup_freezer.c necessitates a refresh and retest please.
>
> commit 53feb29767c29c877f9d47dcfe14211b5b0f7ebd changed a bunch of stuff
> in kernel/cpuset.c to allocate nodemasks with NODEMASK_ALLOC (which
> wraps kmalloc) instead of on the stack.
>
> 1. All these functions have 'void' return values, indicating that
>    calling them must not fail. Sure, there are bailout cases, but no
>    semblance of cross-function error propagation. Most importantly,
>    cpuset_attach is a subsystem callback, which MUST not fail given the
>    way it's used in cgroups, so relying on kmalloc is not safe.
>
> 2. I'm working on a patch series which needs to hold tasklist_lock
>    across ss->attach callbacks (in cpuset_attach's "if (threadgroup)"
>    case, this is how safe traversal of tsk->thread_group will be
>    ensured), and kmalloc isn't allowed while holding a spinlock.
>
> Why do we need heap allocation here at all? In each case the nodemask's
> scope is exactly the function's scope, and neither the commit nor the
> surrounding patch series gives any explanation. I'd like to revert the
> patch if possible.
>
> cc'ing Miao Xie (author) and David Rientjes (acker).
>
> -- Ben

By the way, there is still another issue on top of this: cpuset_attach
-> do_migrate_pages -> migrate_prep -> lru_add_drain_all ->
__alloc_percpu, which allocates with GFP_KERNEL and can therefore
sleep. That will be no good if we call ss->attach with tasklist_lock
held. A friend points out that even without the sleeping problem, you
still don't want to be doing a memory migration while holding a
spinlock.

I think the best solution for this would be to add another subsystem
callback, "attach_thread", which would do the cheap once-per-thread
operations and be called under tasklist_lock. It'd look like this
(rough pseudocode; a more concrete C sketch follows at the end):

    read_lock(tasklist_lock)
    for each thread (c) {
        cgroup_task_migrate(c)
        for each subsys (ss) {
            ss->attach_thread(c)
        }
    }
    read_unlock(tasklist_lock)
    for each subsys (ss) {
        ss->attach(leader)
    }

For cpuset, this will need to keep the nodemask data between the
attach_thread and attach functions, so the nodemasks will need to be
global instead of local.

-- Ben

P.S. When I'm iterating over tsk->thread_group while using tasklist_lock
to protect it instead of rcu_read_lock, should I use list_for_each_entry
or list_for_each_entry_rcu? It has been a while since I wrote that bit.
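
To make the attach_thread idea concrete, here is an untested C sketch
of the proposed flow. The attach_thread callback and its signature are
hypothetical, attach_proc_sketch() is an illustrative name, and
cgroup_task_migrate() is the per-task move helper from the threadgroup
patch series with a simplified signature; only the ss->attach call
follows the existing convention:

    /*
     * Untested sketch, not a patch.  Assumes a new per-thread callback
     * in struct cgroup_subsys, something like:
     *
     *     void (*attach_thread)(struct cgroup_subsys *ss,
     *                           struct cgroup *cgrp,
     *                           struct task_struct *c);
     */
    static void attach_proc_sketch(struct cgroup *cgrp,
                                   struct cgroup *oldcgrp,
                                   struct task_struct *leader)
    {
        struct cgroupfs_root *root = cgrp->root;
        struct cgroup_subsys *ss;
        struct task_struct *c;

        read_lock(&tasklist_lock);
        /*
         * Plain list_for_each_entry: tasklist_lock, not RCU, is what
         * protects ->thread_group here.  The leader itself would be
         * migrated the same way; omitted for brevity.
         */
        list_for_each_entry(c, &leader->thread_group, thread_group) {
            cgroup_task_migrate(cgrp, oldcgrp, c);
            for_each_subsys(root, ss) {
                if (ss->attach_thread)
                    ss->attach_thread(ss, cgrp, c); /* must not sleep */
            }
        }
        read_unlock(&tasklist_lock);

        /* sleepable once-per-group work, e.g. cpuset's page migration */
        for_each_subsys(root, ss) {
            if (ss->attach)
                ss->attach(ss, cgrp, oldcgrp, leader, true);
        }
    }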
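
And the cpuset side of "global instead of local" could just be
file-scope scratch masks in kernel/cpuset.c, viable only because
cgroup_mutex already serializes attach operations (names hypothetical):

    /*
     * Hypothetical replacement for the NODEMASK_ALLOC'd locals:
     * file-scope scratch nodemasks, filled in during attach_thread
     * and consumed by attach.  Protected by cgroup_mutex, which
     * serializes all attaches, so no per-call allocation is needed.
     */
    static nodemask_t cpuset_attach_nodemask_from;
    static nodemask_t cpuset_attach_nodemask_to;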