On Thu 14-04-16 20:34:18, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > The patch seems correct I just do not see any point in it because I do
> > not think it handles any real life situation. I basically consider any
> > workload where only _certain_ thread(s) or process(es) sharing the mm have
> > OOM_SCORE_ADJ_MIN set as invalid. Why should we care about those? This
> > requires root to cripple the system. Or am I missing a valid
> > configuration where this would make any sense?
>
> Because __oom_reap_task() as of current linux.git marks only one of the
> thread groups as OOM_SCORE_ADJ_MIN and happily disables further reaping
> (I am utilizing that behavior for catching bugs which occur in
> almost-OOM situations).

I am not really sure I understand what you mean here. Let me try. You
have N tasks sharing the same mm. The OOM killer selects one of them,
kills it, grants it TIF_MEMDIE and schedules it for the oom_reaper. The
oom reaper then handles that task and marks it OOM_SCORE_ADJ_MIN. The
others will have fatal_signal_pending without OOM_SCORE_ADJ_MIN. The
shared mm has already been reaped so there is not much left we can do
about it. What now?

A different question is whether it makes any sense to pick a task with
an already oom-reaped mm as a new victim. This would happen if either
the memory is largely not reapable or the mm was quite small. I agree
that we do not handle this case now, the same as we did not before. An
mm-specific flag would handle that, I believe. Something like the
following. Is this what you are worried about, or am I still missing
your point?
---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index acfc32b30704..7bd0fa9db199 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -512,6 +512,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 
 #define MMF_HAS_UPROBES		19	/* has uprobes */
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
+#define MMF_OOM_REAPED		21	/* mm has been already reaped */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK)
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 716759e3eaab..d5a4d08f2031 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -286,6 +286,13 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 		return OOM_SCAN_CONTINUE;
 
 	/*
+	 * mm of this task has already been reaped so it doesn't make any
+	 * sense to select it as a new oom victim.
+	 */
+	if (test_bit(MMF_OOM_REAPED, &task->mm->flags))
+		return OOM_SCAN_CONTINUE;
+
+	/*
 	 * If task is allocating a lot of memory and has been marked to be
 	 * killed first if it triggers an oom, then select it.
 	 */
@@ -513,7 +520,7 @@ static bool __oom_reap_task(struct task_struct *tsk)
 	 * This task can be safely ignored because we cannot do much more
 	 * to release its memory.
 	 */
-	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
+	set_bit(MMF_OOM_REAPED, &mm->flags);
 out:
 	mmput(mm);
 	return ret;
-- 
Michal Hocko
SUSE Labs
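
For illustration only, a minimal userspace sketch of the configuration
being debated above: two tasks sharing one mm where only one of them is
marked OOM_SCORE_ADJ_MIN. It assumes root (writing -1000 to
oom_score_adj needs CAP_SYS_RESOURCE); the file name and child_fn are
just names for this sketch, not anything from the patch. The
clone(CLONE_VM) child is a separate thread group that shares the
parent's mm but carries its own oom_score_adj.

/* oom-adj-mix.c: hypothetical demo, not part of the patch. One of two
 * tasks sharing an mm lowers only its own oom_score_adj. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg)
{
	/* Separate thread group sharing the mm: it has its own
	 * /proc/self/oom_score_adj, so only this task becomes OOM-immune. */
	int fd = open("/proc/self/oom_score_adj", O_WRONLY);

	if (fd >= 0) {
		if (write(fd, "-1000", 5) != 5)
			perror("write oom_score_adj");	/* needs root */
		close(fd);
	}
	pause();	/* idle while still mapping the shared mm */
	return 0;
}

int main(void)
{
	char *stack = malloc(STACK_SIZE);
	pid_t pid;

	if (!stack)
		return 1;
	/* CLONE_VM without CLONE_THREAD: a distinct process, same mm. */
	pid = clone(child_fn, stack + STACK_SIZE, CLONE_VM | SIGCHLD, NULL);
	if (pid < 0) {
		perror("clone");
		return 1;
	}
	/* The parent keeps the default oom_score_adj, so the OOM killer can
	 * pick it while the child, mapping the same memory, stays immune. */
	waitpid(pid, NULL, 0);
	return 0;
}

The program simply sits in this state; the point is only that root had
to set it up this way, which is why the configuration is argued to be
invalid rather than something the OOM killer must handle gracefully.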