On Thu 09-08-18 13:16:25, David Rientjes wrote: > On Mon, 6 Aug 2018, Michal Hocko wrote: > > > > At the risk of continually repeating the same statement, the oom reaper > > > cannot provide the direct feedback for all possible memory freeing. > > > Waking up periodically and finding mm->mmap_sem contended is one problem, > > > but the other problem that I've already shown is the unnecessary oom > > > killing of additional processes while a thread has already reached > > > exit_mmap(). The oom reaper cannot free page tables which is problematic > > > for malloc implementations such as tcmalloc that do not release virtual > > > memory. > > > > But once we know that the exit path is past the point of blocking we can > > have MMF_OOM_SKIP handover from the oom_reaper to the exit path. So the > > oom_reaper doesn't hide the current victim too early and we can safely > > wait for the exit path to reclaim the rest. So there is a feedback > > channel. I would even do not mind to poll for that state few times - > > similar to polling for the mmap_sem. But it would still be some feedback > > rather than a certain amount of time has passed since the last check. > > > > Yes, of course, it would be easy to rely on exit_mmap() to set > MMF_OOM_SKIP itself and have the oom reaper drop the task from its list > when we are assured of forward progress. What polling are you proposing > other than a timeout based mechanism to do this? I was thinking about doing something like the following - oom_reaper checks the amount of victim's memory after it is done with reaping (e.g. by calling oom_badness before and after). If it wasn't able to reclaim much then return false and keep retrying with the existing mechanism - once a flag (e.g. MMF_OOM_MMAP) is set it bails out and won't set the MMF_OOM_SKIP flag. > We could set a MMF_EXIT_MMAP in exit_mmap() to specify that it will > complete free_pgtables() for that mm. The problem is the same: when does > the oom reaper decide to set MMF_OOM_SKIP because MMF_EXIT_MMAP has not > been set in a timely manner? reuse the current retry policy which is the number of attempts rather than any timeout. > If this is an argument that the oom reaper should loop checking for > MMF_EXIT_MMAP and doing schedule_timeout(1) a set number of times rather > than just setting the jiffies in the mm itself, that's just implementing > the same thing and doing so in a way where the oom reaper stalls operating > on a single mm rather than round-robin iterating over mm's in my patch. I've said earlier that I do not mind doing round robin in the oom repaer but this is certainly more complex than what we do now and I haven't seen any actual example where it would matter. OOM reaper is a safely measure. Nothing should fall apart if it is slow. The primary work should be happening from the exit path anyway. -- Michal Hocko SUSE Labs