(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 05 Apr 2018 21:55:26 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=199297
>
>             Bug ID: 199297
>            Summary: OOMs writing to files from processes with cgroup
>                     memory limits
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.11+
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
>           Reporter: cbehrens@xxxxxxxxxxxx
>         Regression: No
>
> Created attachment 275113
>   --> https://bugzilla.kernel.org/attachment.cgi?id=275113&action=edit
> script to reproduce issue + kernel config + oom log from dmesg
>
> OVERVIEW:
>
> Processes that have a cgroup memory limit can easily OOM just by writing
> to files. There appears to be no throttling, and such a process can very
> quickly exceed its cgroup memory limit. vm.dirty_ratio appears to be
> applied against globally available memory rather than the memory
> available to the cgroup, at least in my case.
>
> This issue came to light through using kubernetes and putting memory
> limits on pods, but it is completely reproducible stand-alone.
>
> STEPS TO REPRODUCE:
>
> * create a memory cgroup
> * put a memory limit on the cgroup, say 256M
> * add a shell process to the cgroup
> * use dd from that shell to write a bunch of data to a file
>
> See the attached simple script, which reproduces the issue every time.
>
> dd ends up getting OOM-killed very shortly after starting. The OOM log
> shows dirty pages above the cgroup limit.
>
> Kernels before 4.11 do not show this behavior. I've tracked the issue to
> the following commit:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/?id=726d061fbd3658e4bfeffa1b8e82da97de2ca4dd
>
> When I revert this commit, dd completes successfully.
>
> I believe there is a larger issue here, though. I did a bit of debugging.
> As mentioned above, there doesn't appear to be any throttling. It also
> doesn't appear that writebacks are triggered when dirty pages build up
> for the cgroup. From what I can tell, the vm.dirty* settings would only
> get applied against the cgroup limit IFF inode_cgwb_enabled(inode)
> returns true in balance_dirty_pages_ratelimited(), and that appears to
> be returning false for me. I'm using ext4, and CONFIG_CGROUP_WRITEBACK
> is enabled. The code removed by the commit above seems to have been
> saving things by reclaiming during try_charge(), but from what I can
> tell we actually want to throttle in balance_dirty_pages() instead, and
> that's not happening. This code is all foreign to me; I just wanted to
> dump a bit of what I saw during my debugging.
>
> NOTE: If I set vm.dirty_bytes to a value lower than my cgroup memory
> limit, I no longer see OOMs, as it appears the process then gets
> throttled correctly.
>
> I'm attaching a script to reproduce the issue, the kernel config, and
> the OOM log messages from dmesg for kernel 4.15.0.
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.
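For anyone who wants to try this without pulling the attachment, the
reproduction steps above can be sketched as a small shell script. This is
a sketch only, not the reporter's actual script (that is in attachment
275113): it assumes root privileges and a cgroup-v1 memory controller
mounted at /sys/fs/cgroup/memory, and the cgroup name "ddtest" and the
sizes are arbitrary choices.

```shell
#!/bin/sh
# Sketch of the reproduction steps above. Assumes root and a cgroup-v1
# memory controller mounted at /sys/fs/cgroup/memory; "ddtest" and the
# sizes below are arbitrary.
set -e

CG=/sys/fs/cgroup/memory/ddtest
mkdir -p "$CG"

# Put a 256M memory limit on the cgroup, per the steps above.
echo $((256 * 1024 * 1024)) > "$CG/memory.limit_in_bytes"

# Add the current shell to the cgroup, then write well past the limit.
echo $$ > "$CG/cgroup.procs"
dd if=/dev/zero of=/tmp/ddtest.out bs=1M count=1024

# On affected kernels (4.11+), dd is reported to be OOM-killed shortly
# after starting. The workaround noted above is to cap global dirty
# memory below the cgroup limit, e.g.:
#   sysctl -w vm.dirty_bytes=$((128 * 1024 * 1024))
```

Note that this is cgroup-v1 specific; on a cgroup-v2 (unified hierarchy)
system the knobs differ (e.g. memory.max rather than
memory.limit_in_bytes).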