Re: [patch 0/7] improve memcg oom killer robustness v2

"azurIt" <azurit@xxxxxxxx> · Wed, 11 Sep 2013 21:41:18 +0200

>On Wed, Sep 11, 2013 at 08:54:48PM +0200, azurIt wrote:
>> >On Wed, Sep 11, 2013 at 02:33:05PM +0200, azurIt wrote:
>> >> >On Tue, Sep 10, 2013 at 11:32:47PM +0200, azurIt wrote:
>> >> >> >On Tue, Sep 10, 2013 at 11:08:53PM +0200, azurIt wrote:
>> >> >> >> >On Tue, Sep 10, 2013 at 09:32:53PM +0200, azurIt wrote:
>> >> >> >> >> Here is full kernel log between 6:00 and 7:59:
>> >> >> >> >> http://watchdog.sk/lkml/kern6.log
>> >> >> >> >
>> >> >> >> >Wow, your apaches are like the hydra.  Whenever one is OOM killed,
>> >> >> >> >more show up!
>> >> >> >>
>> >> >> >> Yeah, it's supposed to do this ;)
>> >> >
>> >> >How are you expecting the machine to recover from an OOM situation,
>> >> >though?  I guess I don't really understand what these machines are
>> >> >doing.  But if you are overloading them like crazy, isn't that the
>> >> >expected outcome?
>> >>
>> >> There's no global OOM; the server has enough memory. OOM is occurring only in cgroups (customers who simply don't want to pay for more memory).
>> >
>> >Yes, sure, but when the cgroups are thrashing, they use the disk and
>> >CPU to the point where the overall system is affected.
>>
>> I didn't know that this causes disk usage; I haven't noticed anything so far.
>
>You said there was heavy IO going on...?

Yes, there usually was heavy IO, but it was related to that deadlocking bug in the kernel (or I assume it was). I never saw heavy IO under normal conditions, even when there were lots of OOMs in cgroups. I'm not even using swap because of this, so I assumed that a lack of memory doesn't cause any additional IO (or am I wrong?). And if you mean that last IO problem from Monday, I don't know exactly what happened, but it has been a really long time since we had an IO problem so big that it also disabled root login on the console.

>> >Okay, my suspicion is that the previous patches invoked the OOM killer
>> >right away, whereas in this latest version it's invoked only when the
>> >fault is finished.  Maybe the task that locked the group gets held up
>> >somewhere else and then it takes too long until something is actually
>> >killed.  Meanwhile, every other allocator drops into 5 reclaim cycles
>> >before giving up, which could explain the thrashing.  And on the memcg
>> >level we don't have BDI congestion sleeps like on the global level, so
>> >everybody is backing off from the disk.
>> >
>> >Here is an incremental fix to the latest version, i.e. the one that
>> >livelocked under heavy IO, not the one you are using right now.
>> >
>> >First, it reduces the reclaim retries from 5 to 2, which resembles the
>> >global kswapd + ttfp somewhat.  Next, NOFS/NORETRY allocators are not
>> >allowed to kick off the OOM killer, like in the global case, so that
>> >we don't kill things and give up just because light reclaim can't free
>> >anything.  Last, the memcg is marked under OOM when one task enters
>> >OOM so that not everybody is livelocking in reclaim in a hopeless
>> >situation.
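
To make the three changes in the quoted description concrete, here is a rough toy model in C. It is only a sketch under stated assumptions, not the actual patch and not kernel code: every name in it (model_memcg, model_alloc, model_charge, MODEL_RECLAIM_RETRIES) is hypothetical, reclaim is reduced to a simple counter, and the choice to fail the charge when the group is already marked under OOM is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, none are kernel APIs. */
#define MODEL_RECLAIM_RETRIES 2 /* reduced from 5, per the quoted description */

enum charge_result { CHARGE_OK, CHARGE_FAIL, CHARGE_OOM };

struct model_memcg {
    size_t usage;       /* pages currently charged to the group */
    size_t limit;       /* hard limit of the group, in pages */
    size_t reclaimable; /* pages that reclaim could still free */
    bool under_oom;     /* set once one task has entered OOM */
};

struct model_alloc {
    bool may_fs;    /* false models a NOFS allocation */
    bool may_retry; /* false models a NORETRY allocation */
};

/* Free up to `want` pages from the group; returns the number freed. */
static size_t model_reclaim(struct model_memcg *cg, size_t want)
{
    size_t freed = want < cg->reclaimable ? want : cg->reclaimable;

    if (freed > cg->usage)
        freed = cg->usage;
    cg->reclaimable -= freed;
    cg->usage -= freed;
    return freed;
}

/* Try to charge `pages` to the group on behalf of allocation `a`. */
static enum charge_result model_charge(struct model_memcg *cg,
                                       const struct model_alloc *a,
                                       size_t pages)
{
    for (int attempt = 0; ; attempt++) {
        if (cg->usage + pages <= cg->limit) {
            cg->usage += pages;
            return CHARGE_OK;
        }
        /*
         * Group already marked under OOM: do not keep cycling through
         * reclaim. Failing the charge here is an assumption of the model.
         */
        if (cg->under_oom)
            return CHARGE_FAIL;
        if (attempt >= MODEL_RECLAIM_RETRIES)
            break;
        model_reclaim(cg, pages);
    }

    /* NOFS/NORETRY allocations give up without starting the OOM killer. */
    if (!a->may_fs || !a->may_retry)
        return CHARGE_FAIL;

    /* First task to get here marks the whole group as under OOM. */
    cg->under_oom = true;
    return CHARGE_OOM;
}
```

In this model, a full group with nothing reclaimable returns CHARGE_OOM for the first ordinary allocation and CHARGE_FAIL for every later one, while a NOFS or NORETRY allocation fails without ever marking the group.
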
>>
>> Thank you, I will boot it tonight. I also created a new server load checking and rescuing script, so I hope I won't be forced to hard reboot the server in case something similar to before happens. Btw, the patch didn't apply to 3.2.51; there were probably big changes in the memory system (almost all hunks failed). I used 3.2.50 as before.
>
>Yes, please don't change the test base in the middle of this!
>
