Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+

Vladimir Davydov <vdavydov@xxxxxxxxxxxxx> · Fri, 6 Jan 2017 19:28:27 +0300

Hello,

The issue does look like kmemcg related - see below.

On Wed, Jan 04, 2017 at 05:30:37PM -0800, Andrew Morton wrote:

> > * Ubuntu 4.4.0-57 kernel works fine
> > * Mainline 4.4.39 and below seem to work just fine -
> > https://youtu.be/tGD6sfwa-3c

kmemcg is disabled

> > * Mainline 4.6.7 kernel behaves seminormal, load average is higher than on 4.4,
> > but not as bad as on 4.7+ - https://youtu.be/-CyhmkkPbKE

4.6+

b313aeee25098 mm: memcontrol: enable kmem accounting for all cgroups in the legacy hierarchy

kmemcg is enabled by default for all cgroups, which introduces extra
overhead to memcg destruction path

> > * Mainline 4.7.0-rc1 kernel is the first kernel after 4.6.7 that is available
> > in binaries, so I chose to test it and it doesn't play nicely -
> > https://youtu.be/C_J5es74Ars

4.7+

81ae6d03952c1 mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink()

kick_all_cpus_sync(), which was used for synchronizing slub cache
destruction before this commit, turns out to be too disruptive on big
SMP machines as it generates a lot of IPIs, so it is replaced with more
lightweight synchronize_sched(). The latter, however, blocks cgroup
rmdir under the slab_mutex for relatively long, resulting in higher load
average as well as stalling other processes trying to create or destroy
a kmem cache.

> > * Mainline 4.9.0 kernel still doesn't play nicely -
> > https://youtu.be/_o17U5x3bmY

The above-mentioned issue is still unfixed.

> > 
> > OTHER NOTES:
> > 1. Using VirtualBox I have noticed that this bug only reproducible when I have
> > 2+ CPU cores!

synchronize_sched() is a no-op on UP machines, which explains why on a
UP machine the problems goes away.

If I'm correct, the issue must have been fixed in 4.10, which is yet to
be released:

89e364db71fb5 slub: move synchronize_sched out of slab_mutex on shrink

You can workaround it on older kernels by turning kmem accounting off.
To do that, append 'cgroup.memory=nokmem' to the kernel command line.
Alternatively, you can try to recompile the kernel choosing SLAB as the
slab allocator, because only SLUB is affected IIRC.

FWIW I tried the script you provided in a 4 CPU VM running 4.10-rc2 and
didn't notice any significant stalls or latency spikes. Could you please
check if this kernel fixes your problem? If it does it might be worth
submitting the patch to stable..

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>