On Tue, Jul 31, 2018 at 7:19 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 31-07-18 19:09:28, Zhaoyang Huang wrote:
> > This patch tries to let direct reclaim finish earlier than it used
> > to. The problem comes from our observation that direct reclaim can
> > take a long time to finish when memcg is enabled. By debugging, we
> > found that the reason is that the soft limit is too low to meet the
> > loop-end criteria. So we add two bail-out checks that judge whether
> > enough memory has been reclaimed, using the same criteria as
> > shrink_lruvec:
> > 1. for each memcg soft limit reclaim.
> > 2. before starting the global reclaim in shrink_zone.
>
> Then I would really recommend not using the soft limit at all. It has
> always been aggressive. I proposed making it less so in the past, but
> we decided against that because we simply do not know whether
> somebody depends on the current behavior. Your changelog doesn't
> really tell the whole story. Why is this a problem all of a sudden?
> Nothing has really changed recently AFAICT. The cgroup v1 interface
> is mostly for backward compatibility; we have much better ways to
> accomplish workload isolation in cgroup v2.
>
> So why does it matter all of a sudden?
>
> Besides that, EXPORT_SYMBOL for such low-level functionality as
> memory reclaim is a big no-no.
>
> So without a much better explanation, and with a low-level symbol
> exported, NAK from me.
>
My test workload is from an Android system, where the multimedia apps
require many pages. We observed one thread of a process getting stuck
in mem_cgroup_soft_limit_reclaim within direct reclaim, which also
blocked other threads in mmap or do_page_fault (blocked on a
semaphore?). Furthermore, we observed other long direct reclaims
related to the soft limit, which presumably cause page thrashing, as
the allocating memcg itself is the rightmost node of the soft limit
rb_tree. Besides, even without the soft limit, shouldn't direct
reclaim check the watermark before calling shrink_node? A concurrent
kswapd may already have reclaimed enough pages for the allocation.
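To make that last point concrete, here is a rough, untested sketch of
the idea. It is illustrative only: the helper name and its call site
are made up and are not part of the patch below; it only reuses
helpers that already exist in mm (zone_watermark_ok(),
low_wmark_pages(), current_is_kswapd()).

/*
 * Illustrative sketch only -- not part of the patch.  The point is
 * just "check the watermark before doing more direct reclaim work".
 */
static bool direct_reclaim_can_stop_early(struct zone *zone,
					  struct scan_control *sc)
{
	/* kswapd keeps doing background work; only short-cut direct reclaim */
	if (current_is_kswapd())
		return false;

	/*
	 * If a concurrent kswapd (or another direct reclaimer) has
	 * already pushed this zone back above the low watermark for
	 * the requested order, the allocation that woke us up can now
	 * succeed, so there is no need to call shrink_node() again.
	 * sc->reclaim_idx is the highest zone index we reclaim for.
	 */
	return zone_watermark_ok(zone, sc->order, low_wmark_pages(zone),
				 sc->reclaim_idx, 0);
}

Something along those lines, called from shrink_zones() before
shrink_node(), is roughly what the second check in the patch tries to
approximate via should_continue_reclaim().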
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxxxxxx>
> > ---
> >  include/linux/memcontrol.h |  3 ++-
> >  mm/memcontrol.c            |  3 +++
> >  mm/vmscan.c                | 38 +++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 42 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 6c6fb11..a7e82c7 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -325,7 +325,8 @@ void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg,
> >  void mem_cgroup_uncharge_list(struct list_head *page_list);
> >
> >  void mem_cgroup_migrate(struct page *oldpage, struct page *newpage);
> > -
> > +bool direct_reclaim_reach_watermark(pg_data_t *pgdat, unsigned long nr_reclaimed,
> > +			unsigned long nr_scanned, gfp_t gfp_mask, int order);
> >  static struct mem_cgroup_per_node *
> >  mem_cgroup_nodeinfo(struct mem_cgroup *memcg, int nid)
> >  {
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 8c0280b..e4efd46 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2577,6 +2577,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> >  		    (next_mz == NULL ||
> >  		    loop > MEM_CGROUP_MAX_SOFT_LIMIT_RECLAIM_LOOPS))
> >  			break;
> > +		if (direct_reclaim_reach_watermark(pgdat, nr_reclaimed,
> > +				*total_scanned, gfp_mask, order))
> > +			break;
> >  	} while (!nr_reclaimed);
> >  	if (next_mz)
> >  		css_put(&next_mz->memcg->css);
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 03822f8..19503f3 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2518,6 +2518,34 @@ static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
> >  		(memcg && memcg_congested(pgdat, memcg));
> >  }
> >
> > +bool direct_reclaim_reach_watermark(pg_data_t *pgdat, unsigned long nr_reclaimed,
> > +			unsigned long nr_scanned, gfp_t gfp_mask,
> > +			int order)
> > +{
> > +	struct scan_control sc = {
> > +		.gfp_mask = gfp_mask,
> > +		.order = order,
> > +		.priority = DEF_PRIORITY,
> > +		.nr_reclaimed = nr_reclaimed,
> > +		.nr_scanned = nr_scanned,
> > +	};
> > +	if (!current_is_kswapd())
> > +		return false;
> > +	if (!IS_ENABLED(CONFIG_COMPACTION))
> > +		return false;
> > +	/*
> > +	 * In fact, we add 1 to nr_reclaimed and nr_scanned to let should_continue_reclaim
> > +	 * NOT return by finding they are zero, which means compaction_suitable()
> > +	 * takes effect here to judge if we have reclaimed enough pages for passing
> > +	 * the watermark and no necessary to check other memcg anymore.
> > +	 */
> > +	if (!should_continue_reclaim(pgdat,
> > +			sc.nr_reclaimed + 1, sc.nr_scanned + 1, &sc))
> > +		return true;
> > +	return false;
> > +}
> > +EXPORT_SYMBOL(direct_reclaim_reach_watermark);
> > +
> >  static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> >  {
> >  	struct reclaim_state *reclaim_state = current->reclaim_state;
> > @@ -2802,7 +2830,15 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  			sc->nr_scanned += nr_soft_scanned;
> >  			/* need some check for avoid more shrink_zone() */
> >  		}
> > -
> > +		/*
> > +		 * we maybe have stolen enough pages from soft limit reclaim, so we return
> > +		 * back if we are direct reclaim
> > +		 */
> > +		if (direct_reclaim_reach_watermark(zone->zone_pgdat, sc->nr_reclaimed,
> > +				sc->nr_scanned, sc->gfp_mask, sc->order)) {
> > +			sc->gfp_mask = orig_mask;
> > +			return;
> > +		}
> >  		/* See comment about same check for global reclaim above */
> >  		if (zone->zone_pgdat == last_pgdat)
> >  			continue;
> > --
> > 1.9.1
>
> --
> Michal Hocko
> SUSE Labs