On Thu, Dec 23, 2010 at 10:33:52PM -0500, Ben Blum wrote:
> On Thu, Dec 16, 2010 at 12:26:03AM -0800, Andrew Morton wrote:
> > Patches have gone a bit stale, sorry.  Refactoring in
> > kernel/cgroup_freezer.c necessitates a refresh and retest please.
>
> commit 53feb29767c29c877f9d47dcfe14211b5b0f7ebd changed a bunch of stuff
> in kernel/cpuset.c to allocate nodemasks with NODEMASK_ALLOC (which
> wraps kmalloc) instead of on the stack.
>
> 1. All these functions have 'void' return values, indicating that
>    calling them must not fail. Sure, there are bailout cases, but no
>    semblance of cross-function error propagation. Most importantly,
>    cpuset_attach is a subsystem callback, which MUST not fail given the
>    way it's used in cgroups, so relying on kmalloc is not safe.
>
> 2. I'm working on a patch series which needs to hold tasklist_lock
>    across ss->attach callbacks (in cpuset_attach's "if (threadgroup)"
>    case, this is how safe traversal of tsk->thread_group will be
>    ensured), and kmalloc isn't allowed while holding a spinlock.
>
> Why do we need heap allocation here at all? In each case the nodemask's
> scope is exactly the function's scope, and neither the commit nor the
> surrounding patch series gives any explanation. I'd like to revert the
> patch if possible.
>
> cc'ing Miao Xie (author) and David Rientjes (acker).
>
> -- Ben

By the way, there is still another issue on top of this: cpuset_attach
-> do_migrate_pages -> migrate_prep -> lru_add_drain_all ->
__alloc_percpu, which allocates with GFP_KERNEL and can therefore
sleep. That will be no good if we call ss->attach with tasklist_lock
held. A friend points out that even without the sleeping problem, you
still don't want to be doing a memory migration while holding a
spinlock.

I think the best solution for this would be to add another subsystem
callback, "attach_thread", which would do the cheap once-per-thread
operations and be called under tasklist_lock. It'd look like this
(rough pseudocode; a more concrete C sketch follows at the end):

    read_lock(tasklist_lock)
    for each thread (c) {
        cgroup_task_migrate(c)
        for each subsys (ss) {
            ss->attach_thread(c)
        }
    }
    read_unlock(tasklist_lock)
    for each subsys (ss) {
        ss->attach(leader)
    }

For cpuset, this will need to keep the nodemask data between the
attach_thread and attach functions, so the nodemasks will need to be
global instead of local.

-- Ben

P.S. When I'm iterating over tsk->thread_group while using tasklist_lock
to protect it instead of rcu_read_lock, should I use list_for_each_entry
or list_for_each_entry_rcu? It has been a while since I wrote that bit.
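
To make the attach_thread idea concrete, here is an untested C sketch
of the proposed flow. The attach_thread callback and its signature are
hypothetical, attach_proc_sketch() is an illustrative name, and
cgroup_task_migrate() is the per-task move helper from the threadgroup
patch series with a simplified signature; only the ss->attach call
follows the existing convention:

    /*
     * Untested sketch, not a patch.  Assumes a new per-thread callback
     * in struct cgroup_subsys, something like:
     *
     *     void (*attach_thread)(struct cgroup_subsys *ss,
     *                           struct cgroup *cgrp,
     *                           struct task_struct *c);
     */
    static void attach_proc_sketch(struct cgroup *cgrp,
                                   struct cgroup *oldcgrp,
                                   struct task_struct *leader)
    {
        struct cgroupfs_root *root = cgrp->root;
        struct cgroup_subsys *ss;
        struct task_struct *c;

        read_lock(&tasklist_lock);
        /*
         * Plain list_for_each_entry: tasklist_lock, not RCU, is what
         * protects ->thread_group here.  The leader itself would be
         * migrated the same way; omitted for brevity.
         */
        list_for_each_entry(c, &leader->thread_group, thread_group) {
            cgroup_task_migrate(cgrp, oldcgrp, c);
            for_each_subsys(root, ss) {
                if (ss->attach_thread)
                    ss->attach_thread(ss, cgrp, c); /* must not sleep */
            }
        }
        read_unlock(&tasklist_lock);

        /* sleepable once-per-group work, e.g. cpuset's page migration */
        for_each_subsys(root, ss) {
            if (ss->attach)
                ss->attach(ss, cgrp, oldcgrp, leader, true);
        }
    }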
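
And the cpuset side of "global instead of local" could just be
file-scope scratch masks in kernel/cpuset.c, viable only because
cgroup_mutex already serializes attach operations (names hypothetical):

    /*
     * Hypothetical replacement for the NODEMASK_ALLOC'd locals:
     * file-scope scratch nodemasks, filled in during attach_thread
     * and consumed by attach.  Protected by cgroup_mutex, which
     * serializes all attaches, so no per-call allocation is needed.
     */
    static nodemask_t cpuset_attach_nodemask_from;
    static nodemask_t cpuset_attach_nodemask_to;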