On Mon, Dec 27, 2010 at 04:42:07PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 27 Dec 2010 02:21:23 -0500
> Ben Blum <bblum@xxxxxxxxxxxxxx> wrote:
> 
> > > IIUC, what you request is 'don't call any kind of function that may sleep'
> > > in can_attach() and attach() callback. That's impossible.
> > > Memory cgroup will go to sleep because of calling memory reclaim.
> > > 
> > > Is tasklist_lock the only way ? Can you catch "new thread" event in cgroup_fork() ?
> > > 
> > > For example,
> > > 
> > > ==
> > > void cgroup_fork(struct task_struct *child)
> > > {
> > > 	if (current->in_moving) {
> > > 		add_to_waitqueue somewhere.
> > > 	}
> > > 	task_lock(current);
> > > 	child->cgroups = current->cgroups;
> > > 	get_css_set(child->cgroups);
> > > 	task_unlock(current);
> > > 	INIT_LIST_HEAD(&child->cg_list);
> > > }
> > > ==
> > > 
> > > ==
> > > read_lock(&tasklist_lock);
> > > list_for_each_entry_rcu(tsk, &leader->thread_group, thread_group) {
> > > 	tsk->in_moving = true;
> > > }
> > > read_unlock(&tasklist_lock);
> > > 
> > > ->pre_attach()
> > > ->attach()
> > > 
> > > read_unlock(&tasklist_lock);
> > > list_for_each_.....
> > > 	tsk->in_moving_cgroup = false;
> > > 
> > > wakeup threads in waitq.
> > > ==
> > > 
> > > Ah yes, this will have some bug. But please avoid to take tasklist_lock() around
> > > all pre_attach() and attach().
> > > 
> > > Thanks,
> > > -Kame
> > 
> > You misunderstand slightly: the callbacks are split into a
> > once-per-thread function and a thread-independent function, and only the
> > former is not allowed to sleep. All of memcg's attachment is
> > thread-independent, so sleeping there is fine.
> 
> Okay, then, problem is cpuset, which allocates memory.
> (But I feel that limitation -never_sleep- is not very good.)
> 
> BTW, mem cgroup checks mm_owner(mm) == p to decide to take mmap_sem().
> Your code does moving "thread_group_leader()". IIUC, this isn't guaranteed to
> be the same.
> I wonder we can see problem with some stupid userland.

I'm not sure I understand the point of looking at mm->owner? My code does
thread_group_leader() because if a task that's not the leader calls exec(),
the thread_group list will be in an indeterminate state from the perspective
of the task we hold on to (who was originally the leader).

> > Also, the tasklist_lock isn't used to synchronize fork events; that's
> > what threadgroup_fork_lock (an rwsem) is for. It's for protecting the
> > thread-group list while iterating over it - so that a race with exec()
> > (de_thread(), specifically) doesn't change the threadgroup state. It's
> > basically never safe to iterate over leader->thread_group unless you're
> > in a no-sleep section.
> 
> Thank you for explanation.
> 
> At first, cpuset should modify settings around 'mm' only when the thread == mm->owner.
> ....is there any reason to allow all threads can affect the 'mm' ?

I don't follow... is this any different from the cgroup_attach_task() case?

> About nodemask, I feel what's required is pre_pre_attach() to cache required memory
> before attaching as radix_tree_preload(). Then, subsys can prepare for creating
> working area.

To save on global memory footprint, we could pre-allocate two nodemasks,
but I feel like it's not worth the increase in code complexity. This would
need to be done in the other cases that unsafely do NODEMASK_ALLOC too...
too much to keep track of for little gain.

> Hmm...but I wonder de_thread() should take threadgroup_fork_write_unlock().
> 
> I may not understand anything important but I feel taking tasklist_lock() is overkill.

Would rcu_read_lock() be any better? Everywhere else in the kernel that
iterates over all threads in a group uses either rcu_read_lock or
tasklist_lock.

-- Ben

> Thanks,
> -Kame

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers