On Tue, Dec 24, 2019 at 02:53:25AM -0500, Yafang Shao wrote:
> The lru walker isolation function may use this memcg to do something, e.g.
> the inode isolation function will use the memcg to do inode protection in
> a followup patch. So make memcg visible to the lru walker isolation
> function.
>
> Something that should be emphasized in this patch is that it replaces
> for_each_memcg_cache_index() with for_each_mem_cgroup() in
> list_lru_walk_node(). There is a gap between these two macros:
> for_each_mem_cgroup() depends on CONFIG_MEMCG while the other one depends
> on CONFIG_MEMCG_KMEM. But as list_lru_memcg_aware() returns false if
> CONFIG_MEMCG_KMEM is not configured, it is safe to make this replacement.
>
> Cc: Dave Chinner <dchinner@xxxxxxxxxx>
> Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>

....

> @@ -299,17 +299,15 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
>  				 list_lru_walk_cb isolate, void *cb_arg,
>  				 unsigned long *nr_to_walk)
>  {
> +	struct mem_cgroup *memcg;
>  	long isolated = 0;
> -	int memcg_idx;
>
> -	isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> -				      nr_to_walk);
> -	if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
> -		for_each_memcg_cache_index(memcg_idx) {
> +	if (list_lru_memcg_aware(lru)) {
> +		for_each_mem_cgroup(memcg) {
>  			struct list_lru_node *nlru = &lru->node[nid];
>
>  			spin_lock(&nlru->lock);
> -			isolated += __list_lru_walk_one(nlru, memcg_idx,
> +			isolated += __list_lru_walk_one(nlru, memcg,
>  							isolate, cb_arg,
>  							nr_to_walk);
>  			spin_unlock(&nlru->lock);
> @@ -317,7 +315,11 @@ unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
>  			if (*nr_to_walk <= 0)
>  				break;
>  		}
> +	} else {
> +		isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
> +					      nr_to_walk);
>  	}
> +

That's a change of behaviour. The old code always runs per-node reclaim,
then if the LRU is memcg aware it also runs the memcg aware reclaim. The
new code never runs global per-node reclaim if the list is memcg aware,
so shrinkers that are initialised with the flags SHRINKER_NUMA_AWARE |
SHRINKER_MEMCG_AWARE seem likely to have reclaim problems with mixed
memcg/global memory pressure scenarios.

e.g. if all the memory is in the per-node lists, and the memcg needs to
reclaim memory because of a global shortage, it is now unable to reclaim
global memory.....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
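
For illustration, a minimal, untested sketch of a list_lru_walk_node()
that keeps the old semantics Dave describes: the global (non-memcg)
per-node walk always runs, and the memcg-aware walk only runs on top of
it. It is reconstructed purely from the hunks quoted above; all function
and macro names come from that diff, and the locking/iterator details
are assumptions rather than a verified fix.

/*
 * Sketch only -- combines the unconditional global walk from the old
 * code with the for_each_mem_cgroup() loop from the proposed patch,
 * so memcg-aware lists still see global per-node pressure.
 */
unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                 list_lru_walk_cb isolate, void *cb_arg,
                                 unsigned long *nr_to_walk)
{
        struct mem_cgroup *memcg;
        long isolated = 0;

        /* Global (non-memcg) per-node reclaim always runs first. */
        isolated += list_lru_walk_one(lru, nid, NULL, isolate, cb_arg,
                                      nr_to_walk);

        if (*nr_to_walk > 0 && list_lru_memcg_aware(lru)) {
                for_each_mem_cgroup(memcg) {
                        struct list_lru_node *nlru = &lru->node[nid];

                        spin_lock(&nlru->lock);
                        isolated += __list_lru_walk_one(nlru, memcg,
                                                        isolate, cb_arg,
                                                        nr_to_walk);
                        spin_unlock(&nlru->lock);

                        /*
                         * Breaking out of for_each_mem_cgroup() early
                         * likely needs mem_cgroup_iter_break() to drop
                         * the iterator reference; the quoted patch does
                         * not show that detail.
                         */
                        if (*nr_to_walk <= 0)
                                break;
                }
        }
        return isolated;
}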