On Tue, Jul 31, 2018 at 7:19 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 31-07-18 19:09:28, Zhaoyang Huang wrote:
> > This patch tries to let direct reclaim finish earlier than it used
> > to. The problem comes from our observation that direct reclaim can
> > take a long time to finish when memcg is enabled. By debugging, we
> > found that the reason is that the soft limit is too low to meet the
> > loop-end criteria. So we add two bail-out checks that judge whether
> > enough memory has been reclaimed, using the same criteria as
> > shrink_lruvec:
> > 1. for each memcg soft limit reclaim.
> > 2. before starting the global reclaim in shrink_zone.
>
> Then I would really recommend not using the soft limit at all. It has
> always been aggressive. I proposed making it less so in the past, but
> we decided against that because we simply do not know whether
> somebody depends on the current behavior. Your changelog doesn't
> really tell the whole story. Why is this a problem all of a sudden?
> Nothing has really changed recently AFAICT. The cgroup v1 interface
> is mostly for backward compatibility; we have much better ways to
> accomplish workload isolation in cgroup v2.
>
> So why does it matter all of a sudden?
>
> Besides that, EXPORT_SYMBOL for such low-level functionality as
> memory reclaim is a big no-no.
>
> So without a much better explanation, and with a low-level symbol
> exported, NAK from me.
>
My test workload is from an Android system, where the multimedia apps
require many pages. We observed one thread of a process getting stuck
in mem_cgroup_soft_limit_reclaim within direct reclaim, which also
blocked other threads in mmap or do_page_fault (blocked on a
semaphore?). Furthermore, we observed other long direct reclaims
related to the soft limit, which presumably cause page thrashing, as
the allocating memcg itself is the rightmost node of the soft limit
rb_tree. Besides, even without the soft limit, shouldn't direct
reclaim check the watermark before calling shrink_node? A concurrent
kswapd may already have reclaimed enough pages for the allocation.
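To make that last point concrete, here is a rough, untested sketch of
the idea. It is illustrative only: the helper name and its call site
are made up and are not part of the patch below; it only reuses
helpers that already exist in mm (zone_watermark_ok(),
low_wmark_pages(), current_is_kswapd()).

/*
 * Illustrative sketch only -- not part of the patch.  The point is
 * just "check the watermark before doing more direct reclaim work".
 */
static bool direct_reclaim_can_stop_early(struct zone *zone,
					  struct scan_control *sc)
{
	/* kswapd keeps doing background work; only short-cut direct reclaim */
	if (current_is_kswapd())
		return false;

	/*
	 * If a concurrent kswapd (or another direct reclaimer) has
	 * already pushed this zone back above the low watermark for
	 * the requested order, the allocation that woke us up can now
	 * succeed, so there is no need to call shrink_node() again.
	 * sc->reclaim_idx is the highest zone index we reclaim for.
	 */
	return zone_watermark_ok(zone, sc->order, low_wmark_pages(zone),
				 sc->reclaim_idx, 0);
}

Something along those lines, called from shrink_zones() before
shrink_node(), is roughly what the second check in the patch tries to
approximate via should_continue_reclaim().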
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxxxxxx>
> > ---
> >  include/linux/memcontrol.h |  3 ++-
> >  mm/memcontrol.c            |  3 +++
> >  mm/vmscan.c                | 38 +++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 42 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 6c6fb11..a7e82c7 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -325,7 +325,8 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
> >  void mem_cgroup_uncharge_list(struct list_head *page_list);
> >
> >  void mem_cgroup_migrate(struct page *oldpage, struct page *newpage);
> > -
> > +bool direct_reclaim_reach_watermark(pg_data_t *pgdat, unsigned long nr_reclaimed,
> > +			unsigned long nr_scanned, gfp_t gfp_mask, int order);
> >  static struct mem_cgroup_per_node *
> >  mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
> >  {
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 8c0280b..e4efd46 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2577,6 +2577,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> >  		    (next_mz == NULL ||
> >  		    loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
> >  			break;
> > +		if (direct_reclaim_reach_watermark(pgdat, nr_reclaimed,
> > +				*total_scanned, gfp_mask, order))
> > +			break;
> >  	} while (!nr_reclaimed);
> >  	if (next_mz)
> >  		css_put(&next_mz->memcg->css);
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 03822f8..19503f3 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2518,6 +2518,34 @@ static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
> >  		(memcg && memcg_congested(pgdat, memcg));
> >  }
> >
> > +bool direct_reclaim_reach_watermark(pg_data_t *pgdat, unsigned long nr_reclaimed,
> > +			unsigned long nr_scanned, gfp_t gfp_mask,
> > +			int order)
> > +{
> > +	struct scan_control sc = {
> > +		.gfp_mask = gfp_mask,
> > +		.order = order,
> > +		.priority = DEF_PRIORITY,
> > +		.nr_reclaimed = nr_reclaimed,
> > +		.nr_scanned = nr_scanned,
> > +	};
> > +	if (!current_is_kswapd())
> > +		return false;
> > +	if (!IS_ENABLED(CONFIG_COMPACTION))
> > +		return false;
> > +	/*
> > +	 * In fact, we add 1 to nr_reclaimed and nr_scanned to let should_continue_reclaim
> > +	 * NOT return by finding they are zero, which means compaction_suitable()
> > +	 * takes effect here to judge if we have reclaimed enough pages for passing
> > +	 * the watermark and no necessary to check other memcg anymore.
> > +	 */
> > +	if (!should_continue_reclaim(pgdat,
> > +			sc.nr_reclaimed + 1, sc.nr_scanned + 1, &sc))
> > +		return true;
> > +	return false;
> > +}
> > +EXPORT_SYMBOL(direct_reclaim_reach_watermark);
> > +
> >  static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> >  {
> >  	struct reclaim_state *reclaim_state = current->reclaim_state;
> > @@ -2802,7 +2830,15 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  			sc->nr_scanned += nr_soft_scanned;
> >  			/* need some check for avoid more shrink_zone() */
> >  		}
> > -
> > +		/*
> > +		 * we maybe have stolen enough pages from soft limit reclaim, so we return
> > +		 * back if we are direct reclaim
> > +		 */
> > +		if (direct_reclaim_reach_watermark(zone->zone_pgdat, sc->nr_reclaimed,
> > +				sc->nr_scanned, sc->gfp_mask, sc->order)) {
> > +			sc->gfp_mask = orig_mask;
> > +			return;
> > +		}
> >  		/* See comment about same check for global reclaim above */
> >  		if (zone->zone_pgdat == last_pgdat)
> >  			continue;
> > --
> > 1.9.1
>
> --
> Michal Hocko
> SUSE Labs