On 11/03, David Rientjes wrote:
>
> On Wed, 3 Nov 2010, Oleg Nesterov wrote:
>
> > IOW. I believe that 3d5992d2ac7dc09aed8ab537cba074589f0f0a52
> > "oom: add per-mm oom disable count" should be reverted or fixed.
> >
> > Trivial example. A process with 2 threads, T1 and T2.
> > ->mm->oom_disable_count = 0.
> >
> > oom_score_adj_write() sets OOM_SCORE_ADJ_MIN and increments
> > oom_disable_count.
> >
> > T2 exits, notices OOM_SCORE_ADJ_MIN and decrements ->oom_disable_count
> > back to zero.
> >
> > Now, T1 runs with OOM_SCORE_ADJ_MIN, but its ->oom_disable_count == 0.
> >
> > No?
> >
>
> The intent of Ying's patch was for mm->oom_disable_count to map the number
> of threads sharing the ->mm that have p->signal->oom_score_adj ==
> OOM_SCORE_ADJ_MIN.

Yes, I see the intent. But the patch is obviously wrong.

> > Another reason to move ->oom_score_adj into ->mm ;)
> >
>
> I would _love_ to move oom_score_adj into struct mm_struct, and I fought
> very strongly to do so,

Yes, I know ;)

> > Not sure this needs additional locking. exec_mmap() is called when
> > there are no other threads, we can rely on task_lock() we hold.
> >
>
> There are no other threads that can share tsk->signal at this point? I
> was misled by the de_thread() comment about CLONE_SIGHAND.

Agreed, the comment is misleading. "Other processes might share the
signal table" actually means: other processes (not only sub-threads)
can share ->sighand. That is why de_thread() checks oldsighand->count
at the end of this function, after we have already killed all
sub-threads.

But at this point nobody except current uses this ->signal.
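For what it's worth, the T1/T2 scenario above can be replayed in a few lines of userspace C. This is just a sketch with hypothetical names mirroring the kernel's (struct mm, exit_mm, oom_score_adj_write), not kernel code: oom_score_adj_write() increments the counter once for the shared signal struct, but every exiting thread that sees OOM_SCORE_ADJ_MIN decrements it, so the count hits zero while a MIN-score thread is still running.

```c
#include <stdatomic.h>

#define OOM_SCORE_ADJ_MIN (-1000)

struct mm { atomic_int oom_disable_count; };
/* oom_score_adj lives in the shared ->signal, so both threads see one int */
struct thread { struct mm *mm; int *oom_score_adj; };

/* oom_score_adj_write(): sets the shared value, increments ONCE */
static void oom_score_adj_write_min(struct thread *t)
{
	*t->oom_score_adj = OOM_SCORE_ADJ_MIN;
	atomic_fetch_add(&t->mm->oom_disable_count, 1);
}

/* exit path: EVERY exiting thread that sees MIN decrements */
static void exit_mm(struct thread *t)
{
	if (*t->oom_score_adj == OOM_SCORE_ADJ_MIN)
		atomic_fetch_sub(&t->mm->oom_disable_count, 1);
	t->mm = 0;
}

/* replay the scenario: returns oom_disable_count while T1 still runs */
int count_after_t2_exits(void)
{
	struct mm mm = { 0 };
	int score_adj = 0;		/* shared ->signal->oom_score_adj */
	struct thread t1 = { &mm, &score_adj };
	struct thread t2 = { &mm, &score_adj };

	oom_score_adj_write_min(&t1);	/* count: 0 -> 1 */
	exit_mm(&t2);			/* T2 sees MIN: count 1 -> 0 */

	(void)t1;			/* T1 still runs with MIN... */
	return atomic_load(&mm.oom_disable_count);	/* ...count is 0 */
}
```

The asymmetry (one increment per write, one decrement per exiting thread) is the whole bug.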
> > > static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
> > > {
> > > 	struct mm_struct * mm, *oldmm;
> > > +	unsigned long flags;
> > > 	int retval;
> > >
> > > 	tsk->min_flt = tsk->maj_flt = 0;
> > > @@ -743,8 +744,11 @@ good_mm:
> > > 	/* Initializing for Swap token stuff */
> > > 	mm->token_priority = 0;
> > > 	mm->last_interval = 0;
> > > -	if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> > > -		atomic_inc(&mm->oom_disable_count);
> > > +	if (lock_task_sighand(tsk, &flags)) {
> > > +		if (tsk->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
> > > +			atomic_inc(&mm->oom_disable_count);
> > > +		unlock_task_sighand(tsk, &flags);
> > > +	}
> >
> > This doesn't need ->siglock too. Nobody can see this new child,
> > nobody can access its tsk->signal.
>
> Ok!

OOPS! Sorry, I didn't notice that this code works in the
CLONE_VM|CLONE_THREAD case too. In this case we do need the locking.

Wait. And what about the case I meant, the !CLONE_THREAD case? In this
case we don't need ->siglock, but atomic_inc() is very wrong. Note that
this (new) mm_struct has a "random" value in ->oom_disable_count,
copied from the parent's ->mm.

> > > @@ -1700,13 +1707,19 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
> > > 	}
> > >
> > > 	if (new_mm) {
> > > +		unsigned long flags;
> > > +
> > > 		mm = current->mm;
> > > 		active_mm = current->active_mm;
> > > 		current->mm = new_mm;
> > > 		current->active_mm = new_mm;
> > > -		if (current->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) {
> > > -			atomic_dec(&mm->oom_disable_count);
> > > -			atomic_inc(&new_mm->oom_disable_count);
> > > +		if (lock_task_sighand(current, &flags)) {
> > > +			if (current->signal->oom_score_adj ==
> > > +							OOM_SCORE_ADJ_MIN) {
> > > +				atomic_dec(&mm->oom_disable_count);
> > > +				atomic_inc(&new_mm->oom_disable_count);
> > > +			}
> >
> > This is racy anyway, even if we take ->siglock.
> >
> > If we need the protection from oom_score_adj_write(), then we have
> > to change ->mm under ->siglock as well.
> > Otherwise, suppose that
> > oom_score_adj_write() sets OOM_SCORE_ADJ_MIN right after unshare()
> > does current->mm = new_mm.
> >
>
> We're protected by task_lock(current) in unshare, it can't do
> current->mm = new_mm while task_lock() is held in oom_score_adj_write().

Indeed, I was wrong, thanks. I forgot that this code actually never
works (if it worked, it would have to change ->mm for all sub-threads,
and each has its own task->alloc_lock).

> > However. Please do not touch this code. It doesn't work anyway,
> > I'll resend the patch which removes this crap.
> >
>
> Ok, I'll look forward to that :)

Sorry, I don't have the time today. Will do tomorrow.

> Do you see issues with the mapping of threads attached to an mm being
> counted appropriately in mm->oom_disable_count?

Not sure I understand you. The main problem is, they are not counted
correctly. If exit_mm() decrements this counter, then
oom_score_adj_write() should account for every live (with ->mm != NULL)
thread, and this is nasty. Or we should find a way to drop the counter
only when the whole process exits (and in this case CLONE_THREAD
shouldn't touch the counter).

Oleg.
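[Editor's note: the per-process accounting suggested in the last paragraph could look roughly like the userspace sketch below. All names are hypothetical stand-ins for the kernel structures; the point is only that the counter moves once per process, and only when the last live thread goes away, so CLONE_THREAD never touches it.]

```c
#include <stdatomic.h>

#define OOM_SCORE_ADJ_MIN (-1000)

struct mm { atomic_int oom_disable_count; };
/* one record per process: the score and thread count are process-wide */
struct process { struct mm *mm; int oom_score_adj; int nr_live_threads; };

/* oom_score_adj_write(): one increment per PROCESS, not per thread */
static void set_oom_score_adj_min(struct process *p)
{
	if (p->oom_score_adj != OOM_SCORE_ADJ_MIN) {
		p->oom_score_adj = OOM_SCORE_ADJ_MIN;
		atomic_fetch_add(&p->mm->oom_disable_count, 1);
	}
}

/* a thread exits; drop the counter only when the whole process is gone */
static void thread_exit(struct process *p)
{
	if (--p->nr_live_threads == 0 &&
	    p->oom_score_adj == OOM_SCORE_ADJ_MIN)
		atomic_fetch_sub(&p->mm->oom_disable_count, 1);
}

/* replay the T1/T2 example: returns the count while T1 still runs */
int count_with_process_accounting(void)
{
	struct mm mm = { 0 };
	struct process p = { &mm, 0, 2 };	/* T1 and T2 alive */

	set_oom_score_adj_min(&p);	/* count: 0 -> 1 */
	thread_exit(&p);		/* T2 exits; T1 keeps it alive */
	return atomic_load(&mm.oom_disable_count);	/* still 1 */
}
```

With this shape, T2's exit leaves the counter at 1 as long as T1 (and hence the process) is alive, which is the invariant the original patch wanted.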