Re: [PATCH] mm: memcg: fix stale protection of reclaim target memcg

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 22, 2022 at 11:27:21PM +0000, Yosry Ahmed wrote:
> During reclaim, mem_cgroup_calculate_protection() is used to determine
> the effective protection (emin and elow) values of a memcg. The
> protection of the reclaim target is ignored, but we cannot set their
> effective protection to 0 due to a limitation of the current
> implementation (see comment in mem_cgroup_protection()). Instead,
> we leave their effective protection values unchaged, and later ignore it
> in mem_cgroup_protection().
> 
> However, mem_cgroup_protection() is called later in
> shrink_lruvec()->get_scan_count(), which is after the
> mem_cgroup_below_{min/low}() checks in shrink_node_memcgs(). As a
> result, the stale effective protection values of the target memcg may
> lead us to skip reclaiming from the target memcg entirely, before
> calling shrink_lruvec(). This can be even worse with recursive
> protection, where the stale target memcg protection can be higher than
> its standalone protection.
> 
> An example where this can happen is as follows. Consider the following
> hierarchy with memory_recursiveprot:
> ROOT
>  |
>  A (memory.min = 50M)
>  |
>  B (memory.min = 10M, memory.high = 40M)
> 
> Consider the following scenarion:
> - B has memory.current = 35M.
> - The system undergoes global reclaim (target memcg is NULL).
> - B will have an effective min of 50M (all of A's unclaimed protection).
> - B will not be reclaimed from.
> - Now allocate 10M more memory in B, pushing it above it's high limit.
> - The system undergoes memcg reclaim from B (target memcg is B)
> - In shrink_node_memcgs(), we call mem_cgroup_calculate_protection(),
>   which immediately returns for B without doing anything, as B is the
>   target memcg, relying on mem_cgroup_protection() to ignore B's stale
>   effective min (still 50M).
> - Directly after mem_cgroup_calculate_protection(), we will call
>   mem_cgroup_below_min(), which will read the stale effective min for B
>   and skip it (instead of ignoring its protection as intended). In this
>   case, it's really bad because we are not just considering B's
>   standalone protection (10M), but we are reading a much higher stale
>   protection (50M) which will cause us to not reclaim from B at all.
> 
> This is an artifact of commit 45c7f7e1ef17 ("mm, memcg: decouple
> e{low,min} state mutations from protection checks") which made
> mem_cgroup_calculate_protection() only change the state without
> returning any value. Before that commit, we used to return
> MEMCG_PROT_NONE for the target memcg, which would cause us to skip the
> mem_cgroup_below_{min/low}() checks. After that commit we do not return
> anything and we end up checking the min & low effective protections for
> the target memcg, which are stale.
> 
> Add mem_cgroup_ignore_protection() that checks if we are reclaiming from
> the target memcg, and call it in mem_cgroup_below_{min/low}() to ignore
> the stale protection of the target memcg.
> 
> Fixes: 45c7f7e1ef17 ("mm, memcg: decouple e{low,min} state mutations from protection checks")
> Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>

Great catch!
The fix looks good to me, only a couple of cosmetic suggestions.

> ---
>  include/linux/memcontrol.h | 33 +++++++++++++++++++++++++++------
>  mm/vmscan.c                | 11 ++++++-----
>  2 files changed, 33 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index e1644a24009c..22c9c9f9c6b1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -625,18 +625,32 @@ static inline bool mem_cgroup_supports_protection(struct mem_cgroup *memcg)
>  
>  }
>  
> -static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
> +static inline bool mem_cgroup_ignore_protection(struct mem_cgroup *target,
> +						struct mem_cgroup *memcg)
>  {
> -	if (!mem_cgroup_supports_protection(memcg))

How about to merge mem_cgroup_supports_protection() and your new helper into
something like mem_cgroup_possibly_protected()? It seems like they never used
separately and unlikely ever will be used.
Also, I'd swap target and memcg arguments.

Thank you!


PS If it's not too hard, please, consider adding a new kselftest to cover this case.
Thank you!




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux