Re: [PATCH] mm, memcg: Optionally disable memcg by default using Kconfig

On Tue, May 19, 2015 at 05:04:04PM +0100, Mel Gorman wrote:
> On Tue, May 19, 2015 at 04:41:19PM +0100, Mel Gorman wrote:
> > On Tue, May 19, 2015 at 05:27:10PM +0200, Michal Hocko wrote:
> > > On Tue 19-05-15 16:13:02, Mel Gorman wrote:
> > > [...]
> > > >                :ffffffff811c160f:       je     ffffffff811c1630 <mem_cgroup_try_charge+0x40>
> > > >                :ffffffff811c1611:       xor    %eax,%eax
> > > >                :ffffffff811c1613:       xor    %ebx,%ebx
> > > >      1 1.7e-05 :ffffffff811c1615:       mov    %rbx,(%r12)
> > > >      7 1.2e-04 :ffffffff811c1619:       add    $0x10,%rsp
> > > >   1211  0.0203 :ffffffff811c161d:       pop    %rbx
> > > >      5 8.4e-05 :ffffffff811c161e:       pop    %r12
> > > >      5 8.4e-05 :ffffffff811c1620:       pop    %r13
> > > >   1249  0.0210 :ffffffff811c1622:       pop    %r14
> > > >      7 1.2e-04 :ffffffff811c1624:       pop    %rbp
> > > >      5 8.4e-05 :ffffffff811c1625:       retq   
> > > >                :ffffffff811c1626:       nopw   %cs:0x0(%rax,%rax,1)
> > > >    295  0.0050 :ffffffff811c1630:       mov    (%rdi),%rax
> > > > 160703  2.6973 :ffffffff811c1633:       mov    %edx,%r13d
> > > 
> > > Huh, what? Even if this was off by one and the preceding instruction has
> > > consumed the time. This would be reading from page->flags but the page
> > > should be hot by the time we got here, no?
> > > 
> > 
> > I would have expected so but it's not the first time I've seen cases where
> > examining the flags was a costly instruction. I suspect it's due to an
> > ordering issue or more likely, a frequent branch mispredict that is being
> > accounted for against this instruction.
> > 
> 
> Which is plausible as forward branches are statically predicted false but
> in this particular load that could be close to a 100% mispredict.
> 

Plausible but wrong. The responsible instruction was too far away so it
looks more like an ordering issue where the PageSwapCache check must be
ordered against setting the page up to date. __SetPageUptodate is a
barrier that is necessary before the PTE is established and visible but it
does not have to be ordered against the memcg charging. In fact it makes
sense to do it afterwards in case the charge fails and the page is never
visible. Just adjusting that reduces the cost to:

/usr/src/linux-4.0-chargefirst-v1r1/mm/memcontrol.c                  3.8547   228233
  __mem_cgroup_count_vm_event                                                  1.172%    69393
  mem_cgroup_page_lruvec                                                       0.464%    27456
  mem_cgroup_commit_charge                                                     0.390%    23072
  uncharge_list                                                                0.327%    19370
  mem_cgroup_update_lru_size                                                   0.284%    16831
  get_mem_cgroup_from_mm                                                       0.262%    15523
  mem_cgroup_try_charge                                                        0.256%    15147
  memcg_check_events                                                           0.222%    13120
  mem_cgroup_charge_statistics.isra.22                                         0.194%    11470
  commit_charge                                                                0.145%     8615
  try_charge                                                                   0.139%     8236

The big sinner there is updating per-cpu stats -- root cgroup stats I assume?
To refresh, a complete disable looks like:

/usr/src/linux-4.0-nomemcg-v1r1/mm/memcontrol.c                      0.4834    27511
  mem_cgroup_page_lruvec                                                       0.161%     9172
  mem_cgroup_update_lru_size                                                   0.154%     8794
  mem_cgroup_try_charge                                                        0.126%     7194
  mem_cgroup_commit_charge                                                     0.041%     2351

Still, 6.64% down to 3.85% is better than a kick in the head. Unprofiled
performance looks like

pft faults
                                       4.0.0                  4.0.0                 4.0.0
                                     vanilla             nomemcg-v1        chargefirst-v1
Hmean    faults/cpu-1 1443258.1051 (  0.00%) 1530574.6033 (  6.05%) 1487623.0037 (  3.07%)
Hmean    faults/cpu-3 1340385.9270 (  0.00%) 1375156.5834 (  2.59%) 1351401.2578 (  0.82%)
Hmean    faults/cpu-5  875599.0222 (  0.00%)  876217.9211 (  0.07%)  876122.6489 (  0.06%)
Hmean    faults/cpu-7  601146.6726 (  0.00%)  599068.4360 ( -0.35%)  600944.9229 ( -0.03%)
Hmean    faults/cpu-8  510728.2754 (  0.00%)  509887.9960 ( -0.16%)  510906.3818 (  0.03%)
Hmean    faults/sec-1 1432084.7845 (  0.00%) 1518566.3541 (  6.04%) 1475994.2194 (  3.07%)
Hmean    faults/sec-3 3943818.1437 (  0.00%) 4036918.0217 (  2.36%) 3973070.2159 (  0.74%)
Hmean    faults/sec-5 3877573.5867 (  0.00%) 3922745.9207 (  1.16%) 3891705.1749 (  0.36%)
Hmean    faults/sec-7 3991832.0418 (  0.00%) 3990670.8481 ( -0.03%) 3989110.4674 ( -0.07%)
Hmean    faults/sec-8 3987189.8167 (  0.00%) 3978842.8107 ( -0.21%) 3981011.2936 ( -0.15%)

Very minor boost. The same reordering looks like it would also suit
do_wp_page. I'll do that, retest, put some lipstick on the patches and
post them tomorrow or the day after. The reordering one probably makes
sense anyway. The default disabling of memcg still has merit, but if that
charging of the root group can be eliminated then maybe it'd be pointless.

-- 
Mel Gorman
SUSE Labs
