Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 08/20, Oleg Nesterov wrote:
>>
>> On 08/20, Eric W. Biederman wrote:
>> >
>> > --- a/fs/exec.c
>> > +++ b/fs/exec.c
>> > @@ -1139,6 +1139,10 @@ static int exec_mmap(struct mm_struct *mm)
>> >  	vmacache_flush(tsk);
>> >  	task_unlock(tsk);
>> >  	if (old_mm) {
>> > +		mm->oom_score_adj = old_mm->oom_score_adj;
>> > +		mm->oom_score_adj_min = old_mm->oom_score_adj_min;
>> > +		if (tsk->vfork_done)
>> > +			mm->oom_score_adj = tsk->vfork_oom_score_adj;
>>
>> too late, ->vfork_done is NULL after mm_release().
>>
>> And this can race with __set_oom_adj(). Yes, the current code is racy too,
>> but this change adds another race, __set_oom_adj() could already observe
>> ->mm != NULL and update mm->oom_score_adj.
>  ^^^^^^^^^^^^
>
> I meant ->mm == new_mm.
>
> And another problem. Suppose we have
>
> 	if (!vfork()) {
> 		change_oom_score();
> 		exec();
> 	}
>
> the parent can be killed before the child execs, in this case vfork_oom_score_adj
> will be lost.

Yes.

Looking at include/uapi/linux/oom.h, it appears that a lot of
oom_score_adj values are reserved.  So it should be entirely possible
to initialize vfork_oom_score_adj to -32768, aka SHRT_MIN, and use that
value as a flag to tell whether it is active or not.

Likewise for vfork_oom_score_adj_min, if we need to duplicate that one
as well.

That deals with that entire class of race.  We still have races during
exec where vfork_done is cleared before ->mm == new_mm.  While that is
worth fixing, it is an independent issue.

Eric
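For illustration only, the SHRT_MIN flag idea above might be sketched in plain userspace C as follows. This is not kernel code and not a proposed patch: struct task_sketch, struct mm_sketch, VFORK_OOM_ADJ_UNSET and the three helpers are hypothetical names, clearing the override once it has been applied is an assumption, and the locking needed against __set_oom_adj() is left out entirely.

```c
#include <limits.h>
#include <stdbool.h>

/* Valid oom_score_adj range, as defined in include/uapi/linux/oom.h. */
#define OOM_SCORE_ADJ_MIN	(-1000)
#define OOM_SCORE_ADJ_MAX	1000

/* Hypothetical sentinel: outside the valid range, so it can mean "not set". */
#define VFORK_OOM_ADJ_UNSET	SHRT_MIN

/* Hypothetical stand-ins for the task and mm fields discussed above. */
struct task_sketch {
	short vfork_oom_score_adj;
};

struct mm_sketch {
	short oom_score_adj;
};

/* Start every task with no pending vfork-time override. */
static void task_sketch_init(struct task_sketch *tsk)
{
	tsk->vfork_oom_score_adj = VFORK_OOM_ADJ_UNSET;
}

/* The value itself is the flag, so no reliance on ->vfork_done. */
static bool vfork_oom_adj_active(const struct task_sketch *tsk)
{
	return tsk->vfork_oom_score_adj != VFORK_OOM_ADJ_UNSET;
}

/*
 * Model of the exec-time step: inherit the old mm's value, then apply
 * a pending override if one was recorded.  Clearing the override after
 * use is an assumption of this sketch.
 */
static void exec_inherit_oom_adj(struct task_sketch *tsk,
				 struct mm_sketch *new_mm,
				 const struct mm_sketch *old_mm)
{
	new_mm->oom_score_adj = old_mm->oom_score_adj;
	if (vfork_oom_adj_active(tsk)) {
		new_mm->oom_score_adj = tsk->vfork_oom_score_adj;
		tsk->vfork_oom_score_adj = VFORK_OOM_ADJ_UNSET;
	}
}
```

Because the sentinel lies outside [OOM_SCORE_ADJ_MIN, OOM_SCORE_ADJ_MAX], it can never collide with a value a user actually wrote, which is what lets the check work after ->vfork_done has been cleared or the parent has gone away.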