Re: [PATCH 1/3] mm,oom: Move last second allocation to inside the OOM killer.

On Fri 01-12-17 15:57:11, Johannes Weiner wrote:
> On Fri, Dec 01, 2017 at 04:17:15PM +0100, Michal Hocko wrote:
> > On Fri 01-12-17 14:56:38, Johannes Weiner wrote:
> > > On Fri, Dec 01, 2017 at 03:46:34PM +0100, Michal Hocko wrote:
> > > > On Fri 01-12-17 14:33:17, Johannes Weiner wrote:
> > > > > On Sat, Nov 25, 2017 at 07:52:47PM +0900, Tetsuo Handa wrote:
> > > > > > @@ -1068,6 +1071,17 @@ bool out_of_memory(struct oom_control *oc)
> > > > > >  	}
> > > > > >  
> > > > > >  	select_bad_process(oc);
> > > > > > +	/*
> > > > > > +	 * Try really last second allocation attempt after we selected an OOM
> > > > > > +	 * victim, for somebody might have managed to free memory while we were
> > > > > > +	 * selecting an OOM victim which can take quite some time.
> > > > > 
> > > > > Somebody might free some memory right after this attempt fails. OOM
> > > > > can always be a temporary state that resolves on its own.
> > > > > 
> > > > > What keeps us from declaring OOM prematurely is the fact that we
> > > > > already scanned the entire LRU list without success, not last second
> > > > > or last-last second, or REALLY last-last-last-second allocations.
> > > > 
> > > > You are right that this is inherently racy. The point here is, however,
> > > > that the race window between the last check and the kill can be _huge_!
> > > 
> > > My point is that it's irrelevant. We already sampled the entire LRU
> > > list; compared to that, the delay before the kill is immaterial.
> > 
> > Well, I would disagree. I have seen OOM reports with free memory.
> > Closer debugging showed that an existing process was on its way out,
> > the OOM victim selection took way too long, and the kill fired after a
> > large process had already managed to exit. There were different
> > hacks^Wheuristics to cover those cases but they turned out to just
> > cause different corner cases. Moving the existing last moment
> > allocation after a potentially very time consuming action is a
> > relatively cheap and safe measure to cover those cases without any
> > negative side effects I can think of.
> 
> An existing process can exit right after you pull the trigger. How big
> is *that* race window? By this logic you could add a sleep(5) before
> the last-second allocation because it would increase the likelihood of
> somebody else exiting voluntarily.

Please read what I wrote above again. I am not saying this is _closing_
the race. It does, however, reduce the race window, which I find
generally a good thing. Especially when there are no other negative
side effects.
 
> This patch is making the time it takes to select a victim an integral
> part of OOM semantics. Think about it: if somebody later speeds up the
> OOM selection process, they shrink the window in which somebody could
> volunteer memory for the last-second allocation. By optimizing that
> code, you're probabilistically increasing the rate of OOM kills.
>
> A guaranteed 5 second window would in fact be better behavior.
> 
> This is bananas. I'm sticking with my nak.

So are you saying that the existing last allocation attempt is more
reasonable? I've tried to remove it [1] and you were against that.

All I am trying to tell you is that _if_ we want to have something like
the last moment allocation after reclaim has given up, then it should
happen closer to the kill, the actual disruptive operation. The current
attempt in __alloc_pages_may_oom makes very little sense to me.

[1] http://lkml.kernel.org/r/1454013603-3682-1-git-send-email-mhocko@xxxxxxxxxx
-- 
Michal Hocko
SUSE Labs
