Re: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.

Michal Hocko <mhocko@xxxxxxxxxx> · Wed, 22 Aug 2018 10:03:42 +0200

On Tue 21-08-18 10:20:00, David Rientjes wrote:
> On Tue, 21 Aug 2018, Michal Hocko wrote:
> 
> > > Ok, so it appears you're suggesting a per-mm counter of oom reaper retries 
> > > and once it reaches a certain threshold, either give up and set 
> > > MMF_OOM_SKIP or declare that exit_mmap() is responsible for it.  That's 
> > > fine, but obviously I'll be suggesting that the threshold is rather large.  
> > > So if I adjust my patch to be a retry counter rather than timestamp, do 
> > > you have any other reservations?
> > 
> > It absolutely has to be an internal thing, with no user API to set
> > it. Also, I still haven't heard any specific argument for why the oom
> > reaper would need to make per-task attempts and loop over all victims
> > on the list. Maybe you have some examples, though.
> > 
> 
> It would be per-mm in this case, the task itself is no longer important 
> other than printing to the kernel log.  I think we could simply print that 
> the oom reaper has reaped mm->owner.
> 
> The oom reaper would need to loop over the per-mm list because the retry 
> counter is going to have a high threshold so that processes have the 
> ability to free their memory before the oom reaper declares it can no 
> longer make forward progress.

What do you actually mean by a high threshold?

> We cannot stall trying to reap a single mm 
> with a high retry threshold from a memcg hierarchy when another memcg 
> hierarchy is also oom.  The ability for one victim to make forward 
> progress can depend on a lock held by another oom memcg hierarchy where 
> reaping would allow it to be dropped.

Could you be more specific please?
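
For reference, a minimal userspace sketch of the scheme described in the
quoted text: a per-mm retry counter with an internal threshold, and a
reaper pass that moves on to the next victim instead of stalling on one
mm. This is only a model under stated assumptions, not kernel code. The
names mm_model, try_reap, reap_pass and REAP_RETRY_MAX are hypothetical,
the value 10 is arbitrary, and oom_skip merely stands in for
MMF_OOM_SKIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical internal limit; deliberately not a user-visible tunable. */
#define REAP_RETRY_MAX 10

/* Toy stand-in for an mm queued for reaping (not struct mm_struct). */
struct mm_model {
	int  locked_for;   /* model: attempts during which the lock is contended */
	int  reap_retries; /* per-mm count of failed reap attempts */
	bool reaped;       /* memory was actually reaped */
	bool oom_skip;     /* stands in for MMF_OOM_SKIP */
};

/* One reap attempt; fails while the modelled lock is still contended. */
static bool try_reap(struct mm_model *mm)
{
	assert(mm != NULL);
	if (mm->locked_for > 0) {
		mm->locked_for--;
		return false;
	}
	mm->reaped = true;
	return true;
}

/*
 * One pass over all queued victims. A failed attempt bumps the per-mm
 * counter and the loop continues with the next mm rather than retrying
 * the same one. An mm is marked oom_skip once it is reaped or once its
 * counter reaches the limit. Returns the number of mms still pending.
 */
static size_t reap_pass(struct mm_model *mms, size_t n)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++) {
		struct mm_model *mm = &mms[i];

		if (mm->oom_skip)
			continue;
		if (try_reap(mm) || ++mm->reap_retries >= REAP_RETRY_MAX)
			mm->oom_skip = true;
		else
			pending++;
	}
	return pending;
}
```

In this model a caller would invoke reap_pass() repeatedly until it
returns 0. An mm whose lock never becomes available is given up on after
REAP_RETRY_MAX failed attempts, while the other victims on the list
still get their attempts in every pass.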

-- 
Michal Hocko
SUSE Labs