On Mon 30-05-16 20:10:46, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > You are trying to make the OOM killer as per mm_struct operation. But
> > > I think we need to tolerate the OOM killer as per signal_struct operation.
> >
> > Signal struct based approach is full of weird behavior which just leads
> > to corner cases. I think going mm struct way is the only sensible
> > approach.
>
> I don't think so. What are corner cases the OOM reaper cannot handle with
> signal_struct based approach?

E.g. all the mm shared outside of the thread group with weird
inconsistencies crap.

> The OOM-killer decides based on "struct mm_struct" but it is a weakness of
> the OOM-killer that it cares only "struct mm_struct". It is possible that
> waiting for termination of only one thread releases a lot of memory (e.g.
> by closing pipe's file descriptor) and the OOM-killer needs to send SIGKILL
> to nobody.

How can a thread release pipe's memory when other threads are sharing the
same fd?

[...]
> Given that said, if everybody can agree with making the OOM-killer per
> "struct mm_struct" operation, I think reimplementing oom_disable_count which
> was removed by commit c9f01245b6a7d77d ("oom: remove oom_disable_count") (i.e.
> do not select an OOM victim unless all thread groups using that mm_struct is
> killable) seems to be better than ignoring what userspace told to do (i.e.
> select an OOM victim even if some thread groups using that mm_struct is not
> killable). Userspace knows the risk of setting OOM_SCORE_ADJ_MIN; it is a
> strong request like __GFP_NOFAIL allocation. We have global oom_lock which
> avoids race condition. Since writing to /proc/pid/oom_score_adj is not frequent,
> we can afford mutex_lock_killable(&oom_lock). We can interpret use_mm() request
> as setting OOM_SCORE_ADJ_MIN.

I am not really sure oom_lock is even needed. It is highly unlikely we
would race with an ongoing OOM killer. And even then the lock doesn't
bring much better semantic.
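For reference, the userspace-visible side of all this is the
/proc/<pid>/oom_score_adj file. Below is a minimal sketch of how a process
could read a value from it and compare it against OOM_SCORE_ADJ_MIN. The
helper name read_oom_score_adj() is made up for illustration and is not an
existing API; the -1000 constant is the one from include/uapi/linux/oom.h.

```c
#include <stdio.h>

/* Value that marks a task as exempt from OOM killing
 * (OOM_SCORE_ADJ_MIN in include/uapi/linux/oom.h). */
#define OOM_SCORE_ADJ_MIN (-1000)

/*
 * Hypothetical helper: parse a single integer from an oom_score_adj
 * style file. Returns 0 on success, -1 if the file cannot be opened
 * or does not start with an integer.
 */
static int read_oom_score_adj(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass "/proc/self/oom_score_adj" (or the path for another
pid) and treat a result equal to OOM_SCORE_ADJ_MIN as "this task is
currently unkillable from the OOM killer's point of view".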

Regarding oom_disable_count, I think the current approach of
http://lkml.kernel.org/r/1464266415-15558-4-git-send-email-mhocko@xxxxxxxxxx
has one large advantage. Userspace can simply check the current
situation, whereas any internal flag/counter/whatever hides that
implementation fact, so userspace has no means to deal with it. Sure,
it can be argued that changing oom_score_adj behind a process's back is
nasty, but we already do that for threads and nobody seems to complain.
Shared mm between processes is just a different model of threading from
the MM point of view. Or is this thinking wrong in principle?

-- 
Michal Hocko
SUSE Labs