On Mon, 27 Dec 2010 02:21:23 -0500 Ben Blum <bblum@xxxxxxxxxxxxxx> wrote: > > IIUC, what you request is 'don't call any kind of function may sleep'. > > in can_attach() and attach() callback. That's impossible. > > Memory cgroup will go to sleep because of calling memory reclaim. > > > > Is tasklist_lock the only way ? Can you catch "new thread" event in cgroup_fork() ? > > > > For example, > > > > == > > void cgroup_fork(struct task_struct *child) > > { > > if (current->in_moving) { > > add_to_waitqueue somewhere. > > } > > task_lock(current); > > child->cgroups = current->cgroups; > > get_css_set(child->cgroups); > > task_unlock(current); > > INIT_LIST_HEAD(&child->cg_list); > > } > > == > > > > == > > read_lock(&tasklist_lock); > > list_for_each_entry_rcu(tsk, &leader->thread_group, thread_group) { > > tsk->in_moving = true; > > } > > read_unlock(&tasklist_lock); > > > > ->pre_attach() > > ->attach() > > > > read_unlock(&tasklist_lock); > > list_for_each_..... > > tsk->in_moving_cgroup = false; > > > > wakeup threads in waitq. > > == > > > > Ah yes, this will have some bug. But please avoid to take tasklist_lock() around > > all pre_attach() and attach(). > > > > Thanks, > > -Kame > > > > > > > > > > > > Thanks, > > -Kame > > You misunderstand slightly: the callbacks are split into a > once-per-thread function and a thread-independent function, and only the > former is not allowed to sleep. All of memcg's attachment is thread- > independent, so sleeping there is fine. > Okay, then, problem is cpuset, which allocates memory. (But I feel that limitation -never_sleep- is not very good.) BTW, mem cgroup chesk mm_owner(mm) == p to decide to take mmap_sem(). Your code does moving "thread_group_leader()". IIUC, this isn't guaranteed to be the same. I wonder we can see problem with some stupid userland. > Also, the tasklist_lock isn't used to synchronize fork events; that's > what threadgroup_fork_lock (an rwsem) is for. It's for protecting the > thread-group list while iterating over it - so that a race with exec() > (de_thread(), specifically) doesn't change the threadgroup state. It's > basically never safe to iterate over leader->thread_group unless you're > in a no-sleep section. > Thank you for explanation. At first, cpuset should modify settings around 'mm' only when the thread == mm->owner. ....is there any reason to allow all threads can affect the 'mm' ? About nodemask, I feel what's required is pre_pre_attach() to cache required memory before attaching as radix_tree_preload(). Then, subsys can prepare for creating working area. Hmm...but I wonder de_thread() should take threadgroup_fork_write_unlock(). I may not understand anything important but I feel taking tasklist_lock() is overkill. Thanks, -Kame _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers