On Wed 04-07-18 09:16:32, Michal Hocko wrote:
> On Wed 04-07-18 11:22:55, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > > On Tue 03-07-18 23:25:01, Tetsuo Handa wrote:
> > > > > This series provides
> > > > >
> > > > > (1) Mitigation and a fix for CVE-2016-10723.
> > > > >
> > > > > (2) A mitigation for needlessly selecting next OOM victim reported
> > > > >     by David Rientjes and rejected by Michal Hocko.
> > > > >
> > > > > (3) A preparation for handling many concurrent OOM victims which
> > > > >     could become real by introducing memcg-aware OOM killer.
> > > >
> > > > It would have been great to describe the overall design in the cover
> > > > letter. So let me summarize just to be sure I understand the proposal.
> >
> > You understood the proposal correctly.
> > >
> > > You are removing the oom_reaper and moving the oom victim tear down to
> > > the oom path.
> >
> > Yes. This is for getting rid of the lie
> >
> > 	/*
> > 	 * Acquire the oom lock. If that fails, somebody else is
> > 	 * making progress for us.
> > 	 */
> > 	if (!mutex_trylock(&oom_lock)) {
> > 		*did_some_progress = 1;
> > 		schedule_timeout_uninterruptible(1);
> > 		return NULL;
> > 	}
> >
> > which is leading to CVE-2016-10723. By reclaiming from the OOM killer path,
> > we can eliminate this heuristic.
> >
> > Of course, we don't have to remove the OOM reaper kernel thread.
>
> The thing is that the current design uses the oom_reaper only as a
> backup to get situation unstuck. Once you move all that heavy lifting
> into the oom path directly then you will have to handle all sorts of
> issues. E.g. how do you handle that a random process hitting OOM path
> has to pay the full price to tear down multi TB process? This is a lot
> of time.

And one more thing. Your current design doesn't solve any of the current
shortcomings. mlocked pages are still not reclaimable from the direct
oom tear down. Blockable mmu notifiers still prevent the direct tear
down.
So the only thing that you achieve with a large and disruptive patch is
that the exit vs. oom locking protocol got simplified and that you can
handle oom domains from tasks belonging to them. This is not bad but it
has its own downsides which you either fail to see or are reluctant to
describe and explain.
-- 
Michal Hocko
SUSE Labs