On Wed 04-01-17 17:30:37, Andrew Morton wrote:
> 
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Wed, 21 Dec 2016 19:56:16 +0000 bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=190841
> > 
> >             Bug ID: 190841
> >            Summary: [REGRESSION] Intensive Memory CGroup removal leads to
> >                     high load average 10+
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 4.7.0-rc1+
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@xxxxxxxxxxxxxxxxxxxx
> >           Reporter: frolvlad@xxxxxxxxx
> >         Regression: No
> > 
> > My simplified workflow looks like this:
> > 
> > 1. Create a Memory CGroup with a memory limit
> > 2. Exec a child process
> > 3. Add the child process PID into the Memory CGroup
> > 4. Wait for the child process to finish
> > 5. Remove the Memory CGroup
> > 
> > The child processes usually run less than 0.1 seconds, but I have lots
> > of them. Normally, I could run over 10000 child processes per minute,
> > but with newer kernels I can only do 400-500 executions per minute,
> > and my system becomes extremely sluggish (the only indicator of the
> > weirdness I found is an unusually high load average, which sometimes
> > goes over 250!).

Well, yes, rmdir is not the cheapest operation... Since b2052564e66d
("mm: memcontrol: continue cache reclaim from offlined groups") we
postpone the real memcg removal until later, when there is memory
pressure. 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
after many small jobs") fixed the unbounded id space consumption, and I
would be quite surprised if it caused a new regression. But the report
says this is a 4.7+ thing. I would expect older kernels to simply
refuse to create new cgroups at some point... Maybe that happens in
your script and just goes unnoticed?

We might come up with some more hardening in the offline path (e.g.
count the number of dead memcgs and force their reclaim after some
number accumulates), but all that just adds more code and risk of
regression for something that is not used very often. Cgroup
creation/destruction is too heavyweight an operation to be done for
every short-lived process, even without memcg involved. Are there any
strong reasons you cannot reuse an existing cgroup?

> > Here is a simple reproduction script:
> > 
> > #!/bin/sh
> > CGROUP_BASE=/sys/fs/cgroup/memory/qq
> > 
> > for i in $(seq 1000); do
> >     echo "Iteration #$i"
> >     sh -c "
> >         mkdir '$CGROUP_BASE'
> >         sh -c 'echo \$$ > $CGROUP_BASE/tasks ; sleep 0.0'

One possible workaround would be to do

	echo 1 > $CGROUP_BASE/memory.force_empty

before you remove the cgroup. That should drop the existing charges -
at least for the page cache, which might be what keeps those memcgs
alive (see the untested sketch at the end of this mail).

> >         rmdir '$CGROUP_BASE' || true
> >     "
> > done
> > # ===

-- 
Michal Hocko
SUSE Labs
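
P.S. For completeness, here is an untested sketch of the reporter's loop
with the force_empty workaround applied. It assumes a cgroup v1 memory
controller mounted at /sys/fs/cgroup/memory (as in the original script);
the group name and iteration count are just placeholders:

	#!/bin/sh
	# Untested sketch: same loop as the reproducer, but charges are
	# dropped via memory.force_empty before rmdir, so page cache
	# does not keep the offlined memcg alive.
	CGROUP_BASE=/sys/fs/cgroup/memory/qq

	for i in $(seq 1000); do
		mkdir "$CGROUP_BASE"
		# run the short-lived child inside the cgroup; \$\$ is
		# expanded by the inner sh to its own PID
		sh -c "echo \$\$ > '$CGROUP_BASE/tasks'; sleep 0.0"
		# drop the existing (page cache) charges before removal
		echo 1 > "$CGROUP_BASE/memory.force_empty"
		rmdir "$CGROUP_BASE"
	done

If reusing a single cgroup is an option, the mkdir/rmdir can be hoisted
out of the loop entirely - create the group once, write each child's PID
into its tasks file, and remove the group only when the whole batch is
done - which avoids the offline path altogether.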