On Thu, Jul 11, 2019 at 4:38 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Jul 10, 2019 at 05:27:03AM -0400, Yafang Shao wrote:
> > After commit 815744d75152 ("mm: memcontrol: don't batch updates of
> > local VM stats and events"), the local VM stats are not consistent
> > with the total VM stats.
> >
> > Below is one example on my server (with 8 CPUs):
> >
> >   inactive_file 3567570944
> >   total_inactive_file 3568029696
> >
> > The deviation looks large because the 'val' passed to
> > __mod_memcg_state() is in pages, while the value reported by
> > memcg_stat_show() is in bytes. So the maximum deviation between the
> > local VM stats and the total VM stats can be
> > (32 * number_of_cpu * PAGE_SIZE), which may be unacceptably large.
> >
> > We should make the local VM stats consistent with the total stats.
> > Although the deviation between local VM events and total events is
> > not as large, I think we'd better make them consistent with each
> > other as well.
>
> Ha - the local stats are not percpu-fuzzy enough... But I guess that
> is a valid complaint.
>
> > ---
> >  mm/memcontrol.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index ba9138a..a9448c3 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -691,12 +691,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
> >  	if (mem_cgroup_disabled())
> >  		return;
> >
> > -	__this_cpu_add(memcg->vmstats_local->stat[idx], val);
> >
> >  	x = val + __this_cpu_read(memcg->vmstats_percpu->stat[idx]);
> >  	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
> >  		struct mem_cgroup *mi;
> >
> > +		__this_cpu_add(memcg->vmstats_local->stat[idx], x);
> >  		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
> >  			atomic_long_add(x, &mi->vmstats[idx]);
> >  		x = 0;
> > @@ -773,12 +773,12 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
> >  	if (mem_cgroup_disabled())
> >  		return;
> >
> > -	__this_cpu_add(memcg->vmstats_local->events[idx], count);
> >
> >  	x = count + __this_cpu_read(memcg->vmstats_percpu->events[idx]);
> >  	if (unlikely(x > MEMCG_CHARGE_BATCH)) {
> >  		struct mem_cgroup *mi;
> >
> > +		__this_cpu_add(memcg->vmstats_local->events[idx], x);
> >  		for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
> >  			atomic_long_add(x, &mi->vmevents[idx]);
> >  		x = 0;
>
> Please also update __mod_lruvec_state() to keep this behavior the same
> across counters, to make sure we won't have any surprises when
> switching between them.
>
> And please add comments explaining that we batch local counters to
> keep them in sync with the hierarchical ones. Because it does look a
> little odd without explanation.

Sure, I will do it.

Thanks
Yafang
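
A quick sanity check of the bound quoted above (assuming
MEMCG_CHARGE_BATCH is 32 and PAGE_SIZE is 4096 on the reporting
machine):

  worst case: 32 pages/CPU * 8 CPUs * 4096 bytes/page = 1,048,576 bytes (1 MiB)
  observed:   3568029696 - 3567570944 = 458,752 bytes = 112 pages (~448 KiB)

So the reported gap is well within the worst-case per-cpu slack.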
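
To see why moving the local update into the flush branch keeps the
local and hierarchical counters in lockstep, here is a minimal
userspace model of the batching scheme (a sketch only; the variable
names are illustrative stand-ins, not kernel symbols):

/*
 * Single-CPU model of the post-patch batching: the local and
 * hierarchical counters are only updated together, at flush time,
 * with the same delta x, so they can never drift apart. Only the
 * per-cpu pending value remains "fuzzy".
 */
#include <stdio.h>
#include <stdlib.h>

#define BATCH 32		/* stands in for MEMCG_CHARGE_BATCH */

static long percpu_pending;	/* models vmstats_percpu->stat[idx] */
static long local_stat;		/* models vmstats_local->stat[idx] */
static long hierarchical_stat;	/* models vmstats[idx] */

static void mod_state(int val)
{
	long x = val + percpu_pending;

	if (labs(x) > BATCH) {
		local_stat += x;	/* batched, as in the patch */
		hierarchical_stat += x;
		x = 0;
	}
	percpu_pending = x;
}

int main(void)
{
	int i;

	for (i = 0; i < 1000; i++)
		mod_state(1 + rand() % 4);

	/* The two counters always agree; only the pending delta is fuzzy. */
	printf("local=%ld hierarchical=%ld pending=%ld\n",
	       local_stat, hierarchical_stat, percpu_pending);
	return 0;
}

Before the patch, local_stat would have been incremented by val on
every call, so it could lead hierarchical_stat by up to BATCH pages
per CPU.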
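
For reference, the analogous change to __mod_lruvec_state() that
Johannes asks for would plausibly look like the sketch below. It
follows the 5.2-era layout of mm/memcontrol.c; the
pn->lruvec_stat_local / pn->lruvec_stat_cpu field names and the
parent_nodeinfo() helper are taken from that code and should be
double-checked against the tree being patched:

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val)
 	/* Update memcg */
 	__mod_memcg_state(memcg, idx, val);
 
-	/* Update lruvec */
-	__this_cpu_add(pn->lruvec_stat_local->count[idx], val);
 
 	x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
 	if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
 		struct mem_cgroup_per_node *pi;
 
+		/*
+		 * Batch the local counter update so it stays in sync
+		 * with the hierarchical counters below.
+		 */
+		__this_cpu_add(pn->lruvec_stat_local->count[idx], x);
 		for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
 			atomic_long_add(x, &pi->lruvec_stat[idx]);
 		x = 0;
 	}
 	__this_cpu_write(pn->lruvec_stat_cpu->count[idx], x);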