On 9/2/19 9:16 AM, Michal Hocko wrote:
> On Sun 01-09-19 22:43:05, Thomas Lindroth wrote:
>> After upgrading to the 4.19 series I've started getting problems with
>> early OOM.
>
> What is the kernel you have updated from? Would it be possible to try
> the current Linus' tree?

I did some more testing and it turns out this is not a regression after all.
I followed up on my hunch and monitored memory.kmem.max_usage_in_bytes while
running:
    cgexec -g memory:12G bash -c 'find / -xdev -type f -print0 | \
        xargs -0 -n 1 -P 8 stat > /dev/null'
As soon as memory.kmem.max_usage_in_bytes reached memory.kmem.limit_in_bytes,
the OOM killer kicked in and killed my X server.
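
For anyone who wants to watch the same counters, here is a minimal sketch of
the monitoring. It assumes the v1 memory controller is mounted at
/sys/fs/cgroup/memory and that the group directory is named 12G as in the
cgexec call above; the function name kmem_status is made up for illustration.

```shell
#!/bin/sh
# Print the kmem peak usage, the kmem limit and the remaining headroom
# for one v1 memory cgroup directory (passed as the first argument).
kmem_status() {
    cg_dir=$1
    max=$(cat "$cg_dir/memory.kmem.max_usage_in_bytes")
    limit=$(cat "$cg_dir/memory.kmem.limit_in_bytes")
    echo "max_usage=$max limit=$limit headroom=$((limit - max))"
}

# Example (path is an assumption): poll once per second while the
# find|stat workload runs in another terminal.
# while sleep 1; do kmem_status /sys/fs/cgroup/memory/12G; done
```

When the headroom value gets close to 0 the OOM kill should follow shortly.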
Using the find|stat approach it was easy to test the problem in a testing VM.
I was able to reproduce the problem in all these kernels:
4.9.0
4.14.0
4.14.115
4.19.0
5.2.11
5.3-rc6 didn't build in the VM, probably because the build environment is too old.
I was curious why I initially couldn't reproduce the problem in 4.14 by
building chromium. I was again able to successfully build chromium using
4.14.115. It turns out memory.kmem.max_usage_in_bytes was 1015689216 after
building and my limit is set to 1073741824, so the build stayed just below
the limit. I guess some unrelated change in memory management raised the
usage slightly in 4.19, triggering the problem.
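
To put numbers on how close the 4.14.115 build came to the limit, here is a
small arithmetic sketch using the two values above. The helper name
kmem_headroom is hypothetical, and the results are rounded down by integer
division.

```shell
#!/bin/sh
# Report how far a kmem peak is from its limit.
# $1 = limit in bytes, $2 = peak usage in bytes.
kmem_headroom() {
    limit=$1
    peak=$2
    free=$((limit - peak))
    echo "$free bytes ($((free / 1048576)) MiB) free, peak at $((peak * 100 / limit))% of limit"
}

kmem_headroom 1073741824 1015689216
# prints: 58052608 bytes (55 MiB) free, peak at 94% of limit
```

So an increase of roughly 55 MiB in kernel memory use would be enough to hit
the limit during the same build.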
If you want to reproduce for yourself here are the steps:
1. Build any kernel from 4.9 onwards using something like my .config
2. Set up a v1 memory cgroup with memory.kmem.limit_in_bytes lower than
memory.limit_in_bytes. I used 100M in my testing VM.
3. Run "find / -xdev -type f -print0 | xargs -0 -n 1 -P 8 stat > /dev/null"
in the cgroup.
4. Assuming there are enough inodes on the rootfs, the global OOM killer
should kick in when memory.kmem.max_usage_in_bytes reaches
memory.kmem.limit_in_bytes and kill something outside the cgroup.
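
Steps 2 and 3 can be sketched as a script without cgexec. This is only a
rough outline under assumptions that are not part of the report above: the v1
memory controller is mounted at /sys/fs/cgroup/memory, the group name
"kmemtest" and both function names are made up, and the optional second
argument of run_stat_storm (the tree to walk) exists only so the function can
be tried on a small directory first. Run it as root in a throwaway VM only,
since the OOM killer may kill something outside the cgroup.

```shell
#!/bin/sh
# Assumed mount point of the v1 memory controller; override via CG_ROOT.
CG_ROOT=${CG_ROOT:-/sys/fs/cgroup/memory}

# Step 2: create the group and set the kmem limit before any task joins it.
# memory.limit_in_bytes is left at its default, so the kmem limit is lower.
setup_kmem_cgroup() {
    cg_dir=$1
    kmem_limit=$2
    mkdir -p "$cg_dir" &&
    echo "$kmem_limit" > "$cg_dir/memory.kmem.limit_in_bytes"
}

# Step 3: a child shell moves itself into the group, then runs the
# find|stat pipeline so that all of its processes are charged to the group.
run_stat_storm() {
    cg_dir=$1
    root=${2:-/}
    sh -c 'echo $$ > "$1/cgroup.procs" &&
           find "$2" -xdev -type f -print0 |
           xargs -0 -n 1 -P 8 stat > /dev/null' sh "$cg_dir" "$root"
}

# Usage (commented out so that sourcing this file has no side effects):
# setup_kmem_cgroup "$CG_ROOT/kmemtest" 100M
# run_stat_storm "$CG_ROOT/kmemtest"
# cat "$CG_ROOT/kmemtest/memory.kmem.max_usage_in_bytes"
```

The limit is written before the shell joins the group because, as far as I
know, older kernels refuse to set a kmem limit on a group that already has
tasks in it.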