On Sun, Jan 5, 2020 at 5:23 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sat, Jan 04, 2020 at 03:26:13PM +0800, Yafang Shao wrote:
> > On Sat, Jan 4, 2020 at 11:36 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Dec 24, 2019 at 02:53:25AM -0500, Yafang Shao wrote:
> > > > The lru walker isolation function may use this memcg to do something, e.g.
> > > > the inode isolation function will use the memcg to do inode protection in
> > > > a followup patch. So make the memcg visible to the lru walker isolation
> > > > function.
> > > >
> > > > One thing that should be emphasized in this patch is that it replaces
> > > > for_each_memcg_cache_index() with for_each_mem_cgroup() in
> > > > list_lru_walk_node(). There is a gap between these two macros:
> > > > for_each_mem_cgroup() depends on CONFIG_MEMCG while the other one depends
> > > > on CONFIG_MEMCG_KMEM. But as list_lru_memcg_aware() returns false if
> > > > CONFIG_MEMCG_KMEM is not configured, it is safe to do this replacement.
> > > >
> > > > Cc: Dave Chinner <dchinner@xxxxxxxxxx>
> > > > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > >
> > > ....
> > >
> > > > @@ -299,17 +299,15 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
> > > >                                   list_lru_walk_cb isolate, void *cb_arg,
> > > >                                   unsigned long *nr_to_walk)
> > > >  {
> > > > +        struct mem_cgroup *memcg;
> > > >          long isolated = 0;
> > > > -        int memcg_idx;
> > > >
> > > > -        isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > > > -                                      nr_to_walk);
> > > > -        if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
> > > > -                for_each_memcg_cache_index(memcg_idx) {
> > > > +        if (list_lru_memcg_aware(lru)) {
> > > > +                for_each_mem_cgroup(memcg) {
> > > >                          struct list_lru_node *nlru = &lru->node[nid];
> > > >
> > > >                          spin_lock(&nlru->lock);
> > > > -                        isolated += __list_lru_walk_one(nlru, memcg_idx,
> > > > +                        isolated += __list_lru_walk_one(nlru, memcg,
> > > >                                                          isolate, cb_arg,
> > > >                                                          nr_to_walk);
> > > >                          spin_unlock(&nlru->lock);
> > > > @@ -317,7 +315,11 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
> > > >                          if (*nr_to_walk <= 0)
> > > >                                  break;
> > > >                  }
> > > > +        } else {
> > > > +                isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> > > > +                                              nr_to_walk);
> > > >          }
> > > > +
> > >
> > > That's a change of behaviour. The old code always runs per-node
> > > reclaim, then if the LRU is memcg aware it also runs the memcg
> > > aware reclaim. The new code never runs global per-node reclaim
> > > if the list is memcg aware, so shrinkers that are initialised
> > > with the flags SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE seem
> > > likely to have reclaim problems with mixed memcg/global memory
> > > pressure scenarios.
> > >
> > > e.g. if all the memory is in the per-node lists, and the memcg needs
> > > to reclaim memory because of a global shortage, it is now unable to
> > > reclaim global memory.....
> > >
> >
> > Hi Dave,
> >
> > Thanks for your detailed explanation.
> > But I have a different understanding.
> > The difference between for_each_mem_cgroup(memcg) and
> > for_each_memcg_cache_index(memcg_idx) is that for_each_mem_cgroup()
> > includes the root_mem_cgroup while for_each_memcg_cache_index()
> > excludes the root_mem_cgroup because its memcg_idx is -1.
>
> Except that the "root" memcg that for_each_mem_cgroup() iterates is not the
> "global root" memcg - it is whatever memcg is passed down in
> the shrink_control, wherever that sits in the cgroup tree hierarchy.
> do_shrink_slab() only ever passes down the global root memcg to the
> shrinkers when the global root memcg is passed to shrink_slab(), and
> that does not iterate the memcg hierarchy - it just wants to
> reclaim from global caches and non-memcg aware shrinkers.
>
> IOWs, there are multiple changes in behaviour here - memcg-specific
> reclaim won't do global reclaim, and global reclaim will now iterate
> all memcgs instead of just the global root memcg.
>
> > So it can reclaim global memory even if the list is memcg aware.
> > Is that right?
>
> If the memcg passed to this function is the root memcg, then yes,
> it will behave as you suggest. But for the majority of memcg-context
> reclaim, the memcg is not the root memcg and so it will not do
> global reclaim anymore...
>

Thanks for your reply. But I have to clarify that this change is in
list_lru_walk_node(), and the memcg is not passed to this function from
the shrink_control. To make it clearer, I paste the function here.

- The new function:

unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                 list_lru_walk_cb isolate, void *cb_arg,
                                 unsigned long *nr_to_walk)
{
        struct mem_cgroup *memcg;               <<<< a local variable
        long isolated = 0;

        if (list_lru_memcg_aware(lru)) {
                for_each_mem_cgroup(memcg) {    <<<< scan all memcgs, including root_mem_cgroup
                        struct list_lru_node *nlru = &lru->node[nid];

                        spin_lock(&nlru->lock);
                        isolated += __list_lru_walk_one(nlru, memcg,
                                                        isolate, cb_arg,
                                                        nr_to_walk);
                        spin_unlock(&nlru->lock);

                        if (*nr_to_walk <= 0)
                                break;
                }
        } else {        <<<< scan global memory only (root_mem_cgroup)
                isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
                                              nr_to_walk);
        }

        return isolated;
}

- While the original function is:

unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                 list_lru_walk_cb isolate, void *cb_arg,
                                 unsigned long *nr_to_walk)
{
        long isolated = 0;
        int memcg_idx;

        isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
                                      nr_to_walk);      <<<< scan global memory only (root_mem_cgroup)
        if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
                for_each_memcg_cache_index(memcg_idx) {         <<<< scan all memcgs, excluding root_mem_cgroup
                        struct list_lru_node *nlru = &lru->node[nid];

                        spin_lock(&nlru->lock);
                        isolated += __list_lru_walk_one(nlru, memcg_idx,
                                                        isolate, cb_arg,
                                                        nr_to_walk);
                        spin_unlock(&nlru->lock);

                        if (*nr_to_walk <= 0)
                                break;
                }
        }
        return isolated;
}

Is that right?

Thanks
Yafang
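
P.S. For reference, here is roughly how the two iterators differ, as I read
them. This is only a sketch paraphrased from include/linux/memcontrol.h and
mm/memcontrol.c; the exact definitions may differ slightly by version:

/*
 * Only walks kmemcg ids 0..memcg_nr_cache_ids-1. The root memcg's objects
 * live at index -1 (the per-node list), so this loop never visits them.
 * When CONFIG_MEMCG_KMEM is not set, it compiles to an empty loop.
 */
#define for_each_memcg_cache_index(_idx)                        \
        for ((_idx) = 0; (_idx) < memcg_nr_cache_ids; (_idx)++)

/*
 * Walks the whole memcg hierarchy: mem_cgroup_iter(NULL, NULL, NULL)
 * starts the walk at root_mem_cgroup and then visits every other memcg.
 * It depends only on CONFIG_MEMCG.
 */
#define for_each_mem_cgroup(iter)                               \
        for (iter = mem_cgroup_iter(NULL, NULL, NULL);          \
             iter != NULL;                                      \
             iter = mem_cgroup_iter(NULL, iter, NULL))

Since the walk starts at root_mem_cgroup, the memcg-aware branch in the new
function should still cover the global per-node list as well, which is the
point I was trying to make above.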