On Mon, Jun 24, 2019 at 4:53 PM Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>
> On 21.06.2019 13:14, Yafang Shao wrote:
> > There are six different reclaim paths now:
> > - kswapd reclaim path
> > - node reclaim path
> > - hibernate preallocate memory reclaim path
> > - direct reclaim path
> > - memcg reclaim path
> > - memcg softlimit reclaim path
> >
> > The slab caches reclaimed in these paths are only counted in the first
> > three paths.
> >
> > There are some drawbacks if we don't count the reclaimed slab caches:
> > - sc->nr_reclaimed isn't correct if some slab caches are
> >   reclaimed in this path.
> > - The slab caches may be reclaimed almost entirely if there are lots of
> >   reclaimable slab caches and few page caches.
> >   Let's take a simple example of this case.
> >   Suppose one memcg is full of slab caches and its limit is 512M; in
> >   other words, there are approximately 512M of slab caches in this memcg.
> >   When the limit of the memcg is reached, memcg reclaim begins,
> >   and in this memcg reclaim path it will continuously reclaim the
> >   slab caches until sc->priority drops to 0.
> >   After this reclaim stops, you will find there are few slab caches left,
> >   less than 20M in my test case.
> >   With this patch applied, the number is greater than 300M and
> >   sc->priority only drops to 3.
> >
> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > ---
> >  mm/vmscan.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 18a66e5..d6c3fc8 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3164,11 +3164,13 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
> >  	if (throttle_direct_reclaim(sc.gfp_mask, zonelist, nodemask))
> >  		return 1;
> >
> > +	current->reclaim_state = &sc.reclaim_state;
> >  	trace_mm_vmscan_direct_reclaim_begin(order, sc.gfp_mask);
> >
> >  	nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
> >
> >  	trace_mm_vmscan_direct_reclaim_end(nr_reclaimed);
> > +	current->reclaim_state = NULL;
>
> Shouldn't we remove reclaim_state assignment from __perform_reclaim() after this?
>

Oh yes. We should remove it. Thanks for pointing it out.
I will post a fix soon.

Thanks
Yafang

> >  	return nr_reclaimed;
> >  }
> > @@ -3191,6 +3193,7 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
> >  	};
> >  	unsigned long lru_pages;
> >
> > +	current->reclaim_state = &sc.reclaim_state;
> >  	sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) |
> >  			(GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK);
> >
> > @@ -3212,7 +3215,9 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
> >  			cgroup_ino(memcg->css.cgroup),
> >  			sc.nr_reclaimed);
> >
> > +	current->reclaim_state = NULL;
> >  	*nr_scanned = sc.nr_scanned;
> > +
> >  	return sc.nr_reclaimed;
> >  }
> >
> > @@ -3239,6 +3244,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
> >  		.may_shrinkslab = 1,
> >  	};
> >
> > +	current->reclaim_state = &sc.reclaim_state;
> >  	/*
> >  	 * Unlike direct reclaim via alloc_pages(), memcg's reclaim doesn't
> >  	 * take care of from where we get pages. So the node where we start the
> > @@ -3263,6 +3269,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
> >  	trace_mm_vmscan_memcg_reclaim_end(
> >  				cgroup_ino(memcg->css.cgroup),
> >  				nr_reclaimed);
> > +	current->reclaim_state = NULL;
> >
> >  	return nr_reclaimed;
> >  }
> >
>
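The following is a small user-space C model of the accounting effect described in the commit message. It is only a hypothetical sketch and not kernel code. The names struct task, current_task, shrink_slab_model() and memcg_reclaim_model() are made up for illustration. The model also assumes a memcg that holds only slab pages, and its per-priority scan target of (total >> priority) + 1 simplifies how real shrinkers scale their work. The model reproduces the pattern the patch relies on. While current->reclaim_state points at sc.reclaim_state, pages freed by the slab shrinker are credited to reclaimed_slab and folded into sc.nr_reclaimed. That lets the priority loop stop once nr_to_reclaim is met.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of reclaim_state accounting; not kernel code. */

struct reclaim_state {
	unsigned long reclaimed_slab;	/* slab pages freed while reclaiming */
};

struct task {
	struct reclaim_state *reclaim_state;	/* stands in for current->reclaim_state */
};

struct scan_control {
	unsigned long nr_to_reclaim;
	unsigned long nr_reclaimed;
	int priority;
	struct reclaim_state reclaim_state;
};

static struct task current_task;	/* hypothetical stand-in for 'current' */

/* Frees up to 'want' slab pages and credits them to the task's
 * reclaim_state only if one has been installed. */
static unsigned long shrink_slab_model(unsigned long *slab_pages,
				       unsigned long want)
{
	unsigned long freed = want < *slab_pages ? want : *slab_pages;

	*slab_pages -= freed;
	if (current_task.reclaim_state)
		current_task.reclaim_state->reclaimed_slab += freed;
	return freed;
}

/* One memcg reclaim run over a memcg that holds only slab pages.
 * Returns sc.nr_reclaimed and stores the lowest priority scanned. */
static unsigned long memcg_reclaim_model(unsigned long *slab_pages,
					 unsigned long nr_to_reclaim,
					 bool set_reclaim_state,
					 int *lowest_priority)
{
	struct scan_control sc = {
		.nr_to_reclaim = nr_to_reclaim,
		.priority = 12,		/* DEF_PRIORITY */
	};
	unsigned long total;

	assert(slab_pages != NULL && lowest_priority != NULL);
	total = *slab_pages;

	/* Models what the patch adds at the start of the memcg reclaim paths */
	if (set_reclaim_state)
		current_task.reclaim_state = &sc.reclaim_state;

	do {
		/* Simplified scan target that grows as priority drops */
		shrink_slab_model(slab_pages, (total >> sc.priority) + 1);

		/* Fold slab pages credited via reclaim_state into nr_reclaimed */
		sc.nr_reclaimed += sc.reclaim_state.reclaimed_slab;
		sc.reclaim_state.reclaimed_slab = 0;

		if (sc.nr_reclaimed >= sc.nr_to_reclaim)
			break;
	} while (--sc.priority >= 0);

	/* Models what the patch adds at the end */
	current_task.reclaim_state = NULL;

	/* A full run exits with priority == -1; 0 was the last one scanned */
	*lowest_priority = sc.priority < 0 ? 0 : sc.priority;
	return sc.nr_reclaimed;
}
```

In this model, the memcg holds 131072 slab pages (512M of 4K pages) and the target is 32 pages (SWAP_CLUSTER_MAX). With reclaim_state installed, the run frees 33 pages and stops at priority 12. Without it, nr_reclaimed never moves, so the loop walks priority down to 0 and frees every slab page. Real numbers depend on the shrinkers involved. The model only shows the direction of the effect.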