Re: [PATCH 4/4] mm, oom: Fix unnecessary killing of additional processes.

David Rientjes <rientjes@xxxxxxxxxx> · Mon, 20 Aug 2018 14:31:04 -0700 (PDT)

On Mon, 20 Aug 2018, Michal Hocko wrote:

> > The oom reaper will always be unable to free some memory, such as page 
> > tables.  If it can't grab mm->mmap_sem in a reasonable amount of time, it 
> > also can give up early.  The munlock() case is another example.  We 
> > experience unnecessary oom killing during free_pgtables() where the 
> > single-threaded exit_mmap() is freeing an enormous amount of page tables 
> > (usually a malloc implementation such as tcmalloc that does not free 
> > virtual memory) and other processes are faulting faster than we can free.  
> > It's a combination of a multiprocessor system and a lot of virtual memory 
> > from the original victim.  This is the same case as being unable to 
> > munlock quickly enough in exit_mmap() to free the memory.
> > 
> > We must wait until free_pgtables() completes in exit_mmap() before killing 
> > additional processes in the large majority (99.96% of cases from my data) 
> > of instances where oom livelock does not occur.  In the remainder of 
> > situations, livelock has been prevented by what the oom reaper has been 
> > able to free.  We can, of course, not do free_pgtables() from the oom 
> > reaper.  So my approach was to allow for a reasonable amount of time for 
> > the victim to free a lot of memory before declaring that additional 
> > processes must be oom killed.  It would be functionally similar to having 
> > the oom reaper retry many, many more times than 10 and having a linked 
> > list of mm_structs to reap.  I don't care one way or another if it's a 
> > timeout based solution or many, many retries that have schedule_timeout() 
> > that yields the same time period in the end.
> 
> I would really keep the current retry logic with an extension to allow
> to keep retrying or hand over to exit_mmap when we know it is past the
> last moment of blocking.
> 

Ok, so it appears you're suggesting a per-mm counter of oom reaper retries 
and once it reaches a certain threshold, either give up and set 
MMF_OOM_SKIP or declare that exit_mmap() is responsible for it.  That's 
fine, but obviously I'll be suggesting that the threshold is rather large.  
So if I adjust my patch to be a retry counter rather than timestamp, do 
you have any other reservations?