On Sun, 15 Aug 2010, Oleg Nesterov wrote: > Again, I do not know how the code looks without the patch, but > Why not? This series is based on Linus' tree. > > static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem) > > { > > + struct task_struct *g, *q; > > + struct mm_struct *mm; > > + > > p = find_lock_task_mm(p); > > if (!p) { > > task_unlock(p); > > return 1; > > } > > + > > + /* mm cannot be safely dereferenced after task_unlock(p) */ > > Yes. But also we can't trust this pointer, see below. > > > + mm = p->mm; > > + > > pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n", > > task_pid_nr(p), p->comm, K(p->mm->total_vm), > > K(get_mm_counter(p->mm, MM_ANONPAGES)), > > K(get_mm_counter(p->mm, MM_FILEPAGES))); > > task_unlock(p); > > > > - > > set_tsk_thread_flag(p, TIF_MEMDIE); > > force_sig(SIGKILL, p); > > So, we killed this process. It is very possible it was the only user > of this ->mm. exit_mm() can free this mmemory. After that another task > execs, exec_mmap() can allocate the same memory again. > Right, this was a race in the original code as well before it was removed in 8c5cd6f3 and existed for years. > > @@ -438,6 +444,20 @@ static int oom_kill_task(struct task_struct *p, struct mem_cgroup *mem) > > */ > > boost_dying_task_prio(p, mem); > > > > + /* > > + * Kill all threads sharing p->mm in other thread groups, if any. They > > + * don't get access to memory reserves or a higher scheduler priority, > > + * though, to avoid depletion of all memory or task starvation. This > > + * prevents mm->mmap_sem livelock when an oom killed task cannot exit > > + * because it requires the semaphore and its contended by another > > + * thread trying to allocate memory itself. That thread will now get > > + * access to memory reserves since it has a pending fatal signal. > > + */ > > + do_each_thread(g, q) { > > + if (q->mm == mm && !same_thread_group(q, p)) > > + force_sig(SIGKILL, q); > > + } while_each_thread(g, q); > > We can kill the wrong task. "q->mm == mm" doesn't necessarily mean > we found the task which shares ->mm with p (see above). > > This needs atomic_inc(mm_users). And please do not use do_each_thread. > Instead of using mm_users to pin the mm, we could simply do this iteration with for_each_process() before sending the SIGKILL to p. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>