On Tue 09-10-18 22:51:00, Tetsuo Handa wrote:
> On 2018/10/09 22:26, Michal Hocko wrote:
> > On Tue 09-10-18 22:14:24, Tetsuo Handa wrote:
> >> On 2018/10/09 21:58, Michal Hocko wrote:
> >>> On Tue 09-10-18 21:52:12, Tetsuo Handa wrote:
> >>>> On 2018/10/09 20:10, Michal Hocko wrote:
> >>>>> On Tue 09-10-18 19:00:44, Tetsuo Handa wrote:
> >>>>>>> 2) add OOM_SCORE_ADJ_MIN and do not kill tasks sharing mm and do not
> >>>>>>> reap the mm in the rare case of the race.
> >>>>>>
> >>>>>> That is no problem. The mistake we made in 4.6 was that we updated oom_score_adj
> >>>>>> to -1000 (and allowed unprivileged users to OOM-lockup the system).
> >>>>>
> >>>>> I do not follow.
> >>>>>
> >>>>
> >>>> http://tomoyo.osdn.jp/cgi-bin/lxr/source/mm/oom_kill.c?v=linux-4.6.7#L493
> >>>
> >>> Ahh, so you are not referring to the current upstream code. Do you see
> >>> any specific problem with the current one (well, except for the possible
> >>> race which I have tried to evaluate).
> >>>
> >>
> >> Yes. "task_will_free_mem(current) in out_of_memory() returns false due to MMF_OOM_SKIP
> >> being already set" is a problem for clone(CLONE_VM without CLONE_THREAD/CLONE_SIGHAND)
> >> with the current code.
> >
> > a) I fail to see how that is related to your previous post and b) could
> > you be more specific. Is there any other scenario from the two described
> > in my earlier email?
> >
>
> I do not follow. Just reverting commit 44a70adec910d692 and commit 97fd49c2355ffded
> is sufficient for closing the copy_process() versus __set_oom_adj() race.

Please go back and see why this has been done in the first place.

> We went too far towards complete "struct mm_struct" based OOM handling. But stepping
> back to "struct signal_struct" based OOM handling solves Yong-Taek's for_each_process()
> latency problem and your copy_process() versus __set_oom_adj() race problem and my
> task_will_free_mem(current) race problem.
And again, I have put together an evaluation of the race and tried to see
what the effect is. Then you started firing off hard-to-follow notes, and it
is not clear whether the analysis/conclusions are wrong/incomplete. So can we
get back to that analysis and stick to the topic please?
-- 
Michal Hocko
SUSE Labs