On 03/08, Siddhesh Poyarekar wrote:
>
> On Wed, Mar 7, 2012 at 9:08 PM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> > rcu_read_lock() can not help without the additional checks. By the
> > time you take it, task->thread_group->next can point to nowhere.
>
> I thought I understood this the second time, but I think I haven't.
>
> > Once again. You have the task_struct *task. It exits,
> > but task->thread_group->next still points to another thread T. Now suppose
> > that T exits too. But task->thread_group->next was not changed, it still
> > points to T. RCU grace period passes, T is freed.
>
> This is the point I haven't understood. From what I understand about
> rcu, the rcu update will first update task->thread_group->next

Not in this case. See __unhash_process(p)->list_del_rcu(p->thread_group).

You missed the fact that ->thread_group differs from the "usual" rcu
protected list. The _head_ of the list can be list_del_rcu'd. Not just
the first/last/any entry, even the head.

Or IOW, we do not really have the head. Every task is a list entry,
but it can also be used as a head by while_each_thread().

> and
> then reclaim the struct it pointed to and not the other way around. So
> with:
>
> >> rcu_read_lock();
> >> - while_each_thread(task, t) {
> >> + t = list_first_entry_rcu(&task->thread_group,
> >> + struct task_struct, thread_group);
>
> since I have the rcu_read_lock when I'm touching the rcu protected
> list,

It is not rcu-protected if this task has already exited, that is why
you need (say) a pid_alive() check.

> I guess there is a corner case where the current task is released and
> thread_group is rcu_list_del()'d.

Yes, assuming that "current" means this "task",

> In that case too, before this
> happens, the proc entry is removed

I guess you meant proc_flush_task()... Not sure I really understand,
it can't "remove" the opened entry. This is just an optimization which
tries to shrink the cache.

But this doesn't matter, it can exit right after get_pid_task() succeeds.
(OK, and after mm_for_maps() in this particular case, otherwise m_start()
fails).

> and the task namespace is unmounted
> from /proc.

Again, this doesn't matter, but note the nr == 1 check. This is only
called when init exits and this simply does kern_unmount().

> Also, the thread_group being deleted from list is merely
> an update of references and we should get the next element

Yes, yes, yes, but this "next element" can exit too before you take
rcu_read_lock, and in this case the deleted entry won't be updated.

That is the problem.

Oleg.
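
To make the scenario concrete, here is a minimal user-space sketch of it.
This is not kernel code: struct task, the "hashed" and "freed" flags, and
all of the helper names are hypothetical stand-ins chosen for illustration.
The only kernel behaviour it relies on is that list_del_rcu() relinks the
neighbours of the deleted entry but leaves the entry's own ->next intact.
The "checked" helper only models the idea of a pid_alive()-style test; in
the kernel that test would have to be made under rcu_read_lock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for task_struct; this is not kernel code. */
struct task {
	struct task *next, *prev;	/* models the ->thread_group list_head */
	bool hashed;			/* models what a pid_alive()-style check reports */
	bool freed;			/* models "grace period passed, memory reclaimed" */
};

/* A single-threaded group: the task is its own list head. */
static void group_init(struct task *t)
{
	t->next = t->prev = t;
	t->hashed = true;
	t->freed = false;
}

/* Link a new thread just before 'head', i.e. at the tail of the ring. */
static void group_add_tail(struct task *new_task, struct task *head)
{
	group_init(new_task);
	new_task->prev = head->prev;
	new_task->next = head;
	head->prev->next = new_task;
	head->prev = new_task;
}

/*
 * Mirrors what list_del_rcu() does to the links: the neighbours are
 * relinked, but t->next is left intact so in-flight readers can go on.
 */
static void group_del_rcu(struct task *t)
{
	t->next->prev = t->prev;
	t->prev->next = t->next;
	t->hashed = false;
}

/* Unsafe: trusts task->next even if 'task' already left the ring. */
static struct task *next_thread_unchecked(struct task *task)
{
	return task->next;
}

/* Safer sketch: only follow ->next while 'task' is still on the ring. */
static struct task *next_thread_checked(struct task *task)
{
	return task->hashed ? task->next : NULL;
}

/* Returns true if the unchecked walk lands on a freed entry. */
bool stale_pointer_scenario(void)
{
	struct task task, T, U;

	group_init(&task);
	group_add_tail(&T, &task);
	group_add_tail(&U, &task);	/* ring: task -> T -> U -> task */

	group_del_rcu(&task);		/* task exits; task.next is still &T */
	group_del_rcu(&T);		/* T exits; only U's links are updated */
	T.freed = true;			/* grace period ends, T is reclaimed */

	assert(task.next == &T);	/* nobody rewrote the deleted entry */
	assert(U.next == &U);		/* the live ring itself is consistent */

	return next_thread_unchecked(&task)->freed &&
	       next_thread_checked(&task) == NULL;
}
```

Calling stale_pointer_scenario() from any main() returns true: the
unchecked helper hands back the already-freed T, while the checked one
refuses to walk from a task that is no longer on the list.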