On Tue, 5 Nov 2013, Sameer Nanda wrote:

> The selection of the process to be killed happens in two spots -- first
> in select_bad_process and then a further refinement by looking for
> child processes in oom_kill_process. Since this is a two step process,
> it is possible that the process selected by select_bad_process may get a
> SIGKILL just before oom_kill_process executes. If this were to happen,
> __unhash_process deletes this process from the thread_group list. This
> then results in oom_kill_process getting stuck in an infinite loop when
> traversing the thread_group list of the selected process.
>
> Fix this race by holding the tasklist_lock across the calls to both
> select_bad_process and oom_kill_process.
>
> Change-Id: I8f96b106b3257b5c103d6497bac7f04f4dff4e60
> Signed-off-by: Sameer Nanda <snanda@xxxxxxxxxxxx>

Nack. We had to stop holding tasklist_lock for this long to avoid watchdog
problems: it stalls forks and exits on other CPUs, which try to take the
write side with IRQs disabled.

What kernel version are you patching? If you check the latest Linus tree,
we hold a reference to the task_struct of the chosen process before calling
oom_kill_process(), so the hypothesis appears to be incorrect.