On Fri, Dec 24, 2010 at 02:49:43AM -0800, David Rientjes wrote:
> On Thu, 23 Dec 2010, Ben Blum wrote:
>
> > On Thu, Dec 16, 2010 at 12:26:03AM -0800, Andrew Morton wrote:
> > > Patches have gone a bit stale, sorry.  Refactoring in
> > > kernel/cgroup_freezer.c necessitates a refresh and retest please.
> >
> > commit 53feb29767c29c877f9d47dcfe14211b5b0f7ebd changed a bunch of stuff
> > in kernel/cpuset.c to allocate nodemasks with NODEMASK_ALLOC (which
> > wraps kmalloc) instead of on the stack.
>
> It only wraps kmalloc for CONFIG_NODES_SHIFT > 8.
>
> > 1. All these functions have 'void' return values, indicating that
> > calling them must not fail. Sure, there are bailout cases, but no
> > semblance of cross-function error propagation. Most importantly,
> > cpuset_attach is a subsystem callback, which MUST not fail given the
> > way it's used in cgroups, so relying on kmalloc is not safe.
>
> Yes, that's a valid concern that should be addressed.

Depending on the circumstances that would cause such a failure, and the
consequences of "ignoring" it, would a cop-out with WARN_ON() be
appropriate?

> > 2. I'm working on a patch series which needs to hold tasklist_lock
> > across ss->attach callbacks (in cpuset_attach's "if (threadgroup)"
> > case, this is how safe traversal of tsk->thread_group will be
> > ensured), and kmalloc isn't allowed while holding a spin-lock.
>
> kmalloc() is allowed while holding a spinlock, and NODEMASK_ALLOC() takes
> a gfp_flags argument for that reason.

Ah, it's only disallowed with GFP_KERNEL and friends. So changing the uses
in cpuset_can_attach to GFP_ATOMIC would solve this concern, then? Then
there are no extra issues with my patchset, but note: the use cases for
calling ->attach are going to be hairy enough that it won't be reasonable
to require the caller to back-track if an allocation fails...

> > Why do we need heap-allocation here at all? In each case their scope is
> > exactly the function's scope, and neither the commit nor the surrounding
> > patch series gives any explanation. I'd like to revert the patch if
> > possible.
>
> Because some kernels, such as those with CONFIG_NODES_SHIFT > 8, cause
> stack overflows with the large nodemasks.

Ah. I couldn't find any documentation for what the maximum number of
memory nodes could be. This makes sense. Thanks!

-- Ben
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
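For readers following the thread, the pattern under discussion can be sketched
in userspace C. This is only a rough sketch of how the kernel's
NODEMASK_ALLOC()/NODEMASK_FREE() macros behave (the real definitions live in
include/linux/nodemask.h): malloc/free stand in for kmalloc/kfree so it
compiles outside the kernel, the NODES_SHIFT value and the nodemask_demo()
helper are made up for illustration, and the gfp flag placeholders exist only
to mirror the kernel signature.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Userspace sketch of the kernel's NODEMASK_ALLOC()/NODEMASK_FREE()
 * pattern.  With CONFIG_NODES_SHIFT > 8 a nodemask_t is considered too
 * big for the kernel stack, so it is heap-allocated; otherwise it lives
 * on the stack and FREE is a no-op.
 */

#define NODES_SHIFT 10                  /* illustrative CONFIG_NODES_SHIFT */
#define MAX_NUMNODES (1 << NODES_SHIFT)

typedef struct {
    unsigned long bits[MAX_NUMNODES / (8 * sizeof(unsigned long))];
} nodemask_t;

#define GFP_KERNEL 0                    /* placeholders mirroring gfp_t */
#define GFP_ATOMIC 1

#if NODES_SHIFT > 8
/* large config: heap allocation, which CAN fail */
#define NODEMASK_ALLOC(type, name, flags) type *name = malloc(sizeof(*name))
#define NODEMASK_FREE(m) free(m)
#else
/* small config: stack variable, FREE is a no-op */
#define NODEMASK_ALLOC(type, name, flags) type _##name, *name = &_##name
#define NODEMASK_FREE(m) do { } while (0)
#endif

/* Exercise the pattern; returns the nodemask size in bytes, 0 on OOM. */
size_t nodemask_demo(void)
{
    size_t sz;
    NODEMASK_ALLOC(nodemask_t, to, GFP_ATOMIC);
    if (!to)                /* the allocation can fail and must be checked */
        return 0;
    memset(to, 0, sizeof(*to));
    sz = sizeof(*to);       /* 1024 possible nodes -> 128 bytes of mask */
    NODEMASK_FREE(to);
    return sz;
}
```

The friction in the thread falls out of this shape: in the large-NODES_SHIFT
configuration the ALLOC expands to an allocation that can return NULL, yet a
void callback like cpuset_attach has no way to propagate that failure, and
once a spinlock such as tasklist_lock is held, only an atomic-style
(GFP_ATOMIC) allocation would be permissible.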