On Wed, Jun 2, 2010 at 1:58 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>> The "it" that you're proposing to remove is in fact the code that
>> handles those races.
>
> In that case I am confused, and I thought we already agreed that
> the PF_EXITING check in attach_task_by_pid() is not strictly needed
> for correctness.

Not quite. Something is required for correctness, and the PF_EXITING
check provides it. The cost is a very small window, between setting
PF_EXITING and calling cgroup_exit(), in which we could arguably still
have moved the thread but decline to, because declining is simpler and
no-one cares. That's the optimization I meant: the data structures are
slightly simpler because they record nothing about whether a task has
passed cgroup_exit(); instead we just check whether it has set
PF_EXITING.

> Once again, the task can call do_exit() and set PF_EXITING right
> after the check.

Yes, the important part is that it hasn't set it *before* the check in
attach_task_by_pid(). If it has set it before that, then it could be
anywhere in the exit path after PF_EXITING, and we decline to move it
since it may already have passed cgroup_exit(). If the exiting task has
not yet set PF_EXITING, then it can't possibly get into the critical
section in cgroup_exit(), since attach_task_by_pid() holds
task->alloc_lock.

Paul
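
The ordering argument above can be sketched as a small userspace model in C. This is only an illustration, not the kernel code: model_set_exiting(), model_cgroup_exit() and model_attach() are hypothetical stand-ins for the steps discussed, struct task and struct cgroup are stripped down to the two fields that matter here, and the locking on task->alloc_lock is indicated in comments rather than implemented.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bit; only "set" versus "not set" matters in this model. */
#define PF_EXITING 0x4u

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct cgroup { int id; };

struct task {
    unsigned int flags;
    struct cgroup *cgrp;
};

/* Where an exited task ends up in this model. */
struct cgroup init_cgroup = { 0 };

/* Early step of the exit path: mark the task as exiting. */
void model_set_exiting(struct task *t)
{
    t->flags |= PF_EXITING;
}

/*
 * Later step of the exit path: drop the task's cgroup membership.
 * Per the argument above, this critical section would be serialized
 * against the attach path by task->alloc_lock (not modeled here).
 */
void model_cgroup_exit(struct task *t)
{
    t->cgrp = &init_cgroup;
}

/*
 * Attach path: refuse any task that has already set PF_EXITING,
 * because it may be anywhere in the exit path, possibly past
 * model_cgroup_exit(). No separate "passed cgroup_exit" marker
 * is kept, which is the simplification being discussed.
 */
int model_attach(struct task *t, struct cgroup *dst)
{
    /* the email's claim: task->alloc_lock is held across this check and the move */
    if (t->flags & PF_EXITING)
        return -ESRCH;
    t->cgrp = dst;
    return 0;
}
```

In this model, an attach that runs before model_set_exiting() succeeds, and any attach after it is refused, including in the small window before model_cgroup_exit() where the move would arguably still have been safe.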