Re: [PATCH] mm, memcg: Optionally disable memcg by default using Kconfig

Mel Gorman <mgorman@xxxxxxx> · Tue, 19 May 2015 15:43:45 +0100

On Tue, May 19, 2015 at 10:18:07AM -0400, Johannes Weiner wrote:
> CC'ing Tejun and cgroups for the generic cgroup interface part
> 
> On Tue, May 19, 2015 at 11:40:57AM +0100, Mel Gorman wrote:
> > memcg was reported years ago to have significant overhead when unused. It
> > has improved but it's still the case that users that have no knowledge of
> > memcg pay a performance penalty.
> > 
> > This patch adds a Kconfig that controls whether memcg is enabled by default
> > and a kernel parameter cgroup_enable= to enable it if desired. Anyone using
> > oldconfig will get the historical behaviour. It is not an option for most
> > distributions to simply disable MEMCG as there are users that require it
> > but they should also be knowledgeable enough to use cgroup_enable=.
> > 
> > This was evaluated using aim9, a page fault microbenchmark and ebizzy
> > but I'll focus on the page fault microbenchmark. It can be reproduced
> > using pft from mmtests (https://github.com/gormanm/mmtests).  Edit
> > configs/config-global-dhp__pagealloc-performance and update MMTESTS to
> > only contain pft. This is the relevant part of the profile summary
> > 
> > /usr/src/linux-4.0-vanilla/mm/memcontrol.c                           6.6441   395842
> >   mem_cgroup_try_charge                                                        2.950%   175781
> 
> Ouch.  Do you have a way to get the per-instruction breakdown of this?

Not that I can upload in a reasonable amount of time. An annotated profile
and the vmlinux image needed to decode addresses are not small. My
expectation is that it'd be trivially reproducible.

> This function really isn't doing much.  I'll try to reproduce it here
> too, I haven't seen such high costs with pft in the past.
> 

I don't believe it's the machine that is being particularly stupid. It's
a fairly bog-standard desktop class box.

> >   __mem_cgroup_count_vm_event                                                  1.431%    85239
> >   mem_cgroup_page_lruvec                                                       0.456%    27156
> >   mem_cgroup_commit_charge                                                     0.392%    23342
> >   uncharge_list                                                                0.323%    19256
> >   mem_cgroup_update_lru_size                                                   0.278%    16538
> >   memcg_check_events                                                           0.216%    12858
> >   mem_cgroup_charge_statistics.isra.22                                         0.188%    11172
> >   try_charge                                                                   0.150%     8928
> >   commit_charge                                                                0.141%     8388
> >   get_mem_cgroup_from_mm                                                       0.121%     7184
> > 
> > It's showing 6.64% overhead in memcontrol.c when no memcgs are in
> > use. Applying the patch and disabling memcg reduces this to 0.48%
> 
> The frustrating part is that 4.5% of that is not even coming from the
> main accounting and tracking work.  I'm looking into getting this
> fixed regardless of what happens with this patch.
> 
> > /usr/src/linux-4.0-nomemcg-v1r1/mm/memcontrol.c                      0.4834    27511
> >   mem_cgroup_page_lruvec                                                       0.161%     9172
> >   mem_cgroup_update_lru_size                                                   0.154%     8794
> >   mem_cgroup_try_charge                                                        0.126%     7194
> >   mem_cgroup_commit_charge                                                     0.041%     2351
> > 
> > Note that it's not very visible from headline performance figures
> > 
> > pft faults
> >                                        4.0.0                  4.0.0
> >                                      vanilla             nomemcg-v1
> > Hmean    faults/cpu-1 1443258.1051 (  0.00%) 1530574.6033 (  6.05%)
> > Hmean    faults/cpu-3 1340385.9270 (  0.00%) 1375156.5834 (  2.59%)
> > Hmean    faults/cpu-5  875599.0222 (  0.00%)  876217.9211 (  0.07%)
> > Hmean    faults/cpu-7  601146.6726 (  0.00%)  599068.4360 ( -0.35%)
> > Hmean    faults/cpu-8  510728.2754 (  0.00%)  509887.9960 ( -0.16%)
> > Hmean    faults/sec-1 1432084.7845 (  0.00%) 1518566.3541 (  6.04%)
> > Hmean    faults/sec-3 3943818.1437 (  0.00%) 4036918.0217 (  2.36%)
> > Hmean    faults/sec-5 3877573.5867 (  0.00%) 3922745.9207 (  1.16%)
> > Hmean    faults/sec-7 3991832.0418 (  0.00%) 3990670.8481 ( -0.03%)
> > Hmean    faults/sec-8 3987189.8167 (  0.00%) 3978842.8107 ( -0.21%)
> > 
> > Low thread counts get a boost but it's within noise as memcg overhead does
> > not dominate.  It's not obvious at all at higher thread counts as other
> > factors cause more problems. The overall breakdown of CPU usage looks like
> > 
> >                4.0.0       4.0.0
> >              vanilla  nomemcg-v1
> > User           41.45       41.11
> > System        410.19      404.76
> > Elapsed       130.33      126.30
> > 
> > Despite the relative unimportance, there is at least some justification
> > for disabling memcg by default.
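The relative differences in the tables above can be re-derived from the raw
figures. This is only a minimal sketch: the numbers are copied from the pft
and CPU usage tables, while the helper name pct_change and the choice of rows
are illustrative and not part of mmtests.

```python
def pct_change(baseline, patched):
    """Relative change of patched against baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


# (vanilla, nomemcg-v1) pairs copied from the tables quoted above.
# Only a subset of rows is listed; the selection is arbitrary.
rows = {
    "faults/cpu-1": (1443258.1051, 1530574.6033),
    "faults/cpu-7": (601146.6726, 599068.4360),
    "faults/sec-1": (1432084.7845, 1518566.3541),
    "faults/sec-8": (3987189.8167, 3978842.8107),
    "System (s)": (410.19, 404.76),
    "Elapsed (s)": (130.33, 126.30),
}

for name, (vanilla, nomemcg) in rows.items():
    print(f"{name:<14} {pct_change(vanilla, nomemcg):7.2f}%")
```

For the faults rows a positive value is an improvement; for the System and
Elapsed rows a negative value is.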
> 
> I guess so.  The only thing I don't like about this is that it changes
> the default of a single controller.  While there is some justification
> from an overhead standpoint, it's a little weird in terms of interface
> when you boot, say, a distribution kernel and it has cgroups with all
> but one resource controller available.
> 
> Would it make more sense to provide a Kconfig option that disables all
> resource controllers per default?  There is still value in having only
> the generic cgroup part for grouped process monitoring and control.
> 

A config option per controller seems overkill because AFAIK the other
controllers are harmless in terms of overhead. Enabling or disabling all
of them at once has other consequences because AFAIK systemd requires
some controllers to function correctly -- e.g.
https://bugs.freedesktop.org/show_bug.cgi?id=74589

After I wrote the patch, I spotted that Debian apparently already
does something like this and, by coincidence, they matched the
parameter name and values. See the memory controller instructions on
https://wiki.debian.org/LXC#Prepare_the_host . So in this case
upstream would match something that at least one distro in the field
already uses.
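
For anyone wanting to confirm what a given boot ended up with, a minimal
sketch of a check follows. It assumes the cgroup v1 style /proc/cgroups
layout (subsys_name, hierarchy, num_cgroups, enabled); the function name
memcg_enabled is made up for illustration and is not part of the patch.

```python
def memcg_enabled(proc_cgroups_text):
    """Return True/False for the memory controller, or None if it is not listed."""
    for line in proc_cgroups_text.splitlines():
        # Skip blank lines and the '#subsys_name ...' header.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed columns: subsys_name, hierarchy, num_cgroups, enabled
        if len(fields) >= 4 and fields[0] == "memory":
            return fields[3] == "1"
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cgroups") as f:
            print("memory controller enabled:", memcg_enabled(f.read()))
    except OSError as err:
        print("could not read /proc/cgroups:", err)
```

A result of None means the kernel did not list a memory controller at all,
which is what a build without MEMCG would be expected to show.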

-- 
Mel Gorman
SUSE Labs
