Re: [PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Aug 17, 2019 at 8:47 AM Roman Gushchin <guro@xxxxxx> wrote:
>
> Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
> with the hierarchical ones") effectively decreased the precision of
> per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
>
> That's good for displaying in memory.stat, but brings a serious regression
> into the reclaim process.
>
> One issue I've discovered and debugged is the following:
> lruvec_lru_size() can return 0 instead of the actual number of pages
> in the lru list, preventing the kernel to reclaim last remaining
> pages. Result is yet another dying memory cgroups flooding.
> The opposite is also happening: scanning an empty lru list
> is the waste of cpu time.
>
> Also, inactive_list_is_low() can return incorrect values, preventing
> the active lru from being scanned and freed. It can fail both because
> the size of active and inactive lists are inaccurate, and because
> the number of workingset refaults isn't precise. In other words,
> the result is pretty random.
>
> I'm not sure, if using the approximate number of slab pages in
> count_shadow_number() is acceptable, but issues described above
> are enough to partially revert the patch.
>
> Let's keep per-memcg vmstat_local batched (they are only used for
> displaying stats to the userspace), but keep lruvec stats precise.
> This change fixes the dead memcg flooding on my setup.
>

That will make some misunderstanding if the local counters are not in
sync with the hierarchical ones
(someone may doubt whether there're something leaked.).
If we have to do it like this, I think we should better document this behavior.

> Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones")
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Cc: Yafang Shao <laoar.shao@xxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
>  mm/memcontrol.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 249187907339..3429340adb56 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -746,15 +746,13 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
>         /* Update memcg */
>         __mod_memcg_state(memcg, idx, val);
>
> +       /* Update lruvec */
> +       __this_cpu_add(pn->lruvec_stat_local->count[idx], val);
> +
>         x = val + __this_cpu_read(pn->lruvec_stat_cpu->count[idx]);
>         if (unlikely(abs(x) > MEMCG_CHARGE_BATCH)) {
>                 struct mem_cgroup_per_node *pi;
>
> -               /*
> -                * Batch local counters to keep them in sync with
> -                * the hierarchical ones.
> -                */
> -               __this_cpu_add(pn->lruvec_stat_local->count[idx], x);
>                 for (pi = pn; pi; pi = parent_nodeinfo(pi, pgdat->node_id))
>                         atomic_long_add(x, &pi->lruvec_stat[idx]);
>                 x = 0;
> --
> 2.21.0
>



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux