Re: [PATCH -repost] memcg,vmscan: do not break out targeted reclaim without reclaimed pages

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Thu, 3 Jan 2013 12:24:04 -0800

On Thu, 3 Jan 2013 19:09:01 +0100
Michal Hocko <mhocko@xxxxxxx> wrote:

> Hi,
> I have posted this quite some time ago
> (https://lkml.org/lkml/2012/12/14/102) but it probably slipped through
> ---
> From 28b4e10bc3c18b82bee695b76f4bf25c03baa5f8 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxx>
> Date: Fri, 14 Dec 2012 11:12:43 +0100
> Subject: [PATCH] memcg,vmscan: do not break out targeted reclaim without
>  reclaimed pages
> 
> Targeted (hard or soft limit) reclaim has traditionally tried to scan one
> group with decreasing priority until nr_to_reclaim (SWAP_CLUSTER_MAX
> pages) is reclaimed or all priorities are exhausted. The reclaim is
> then retried until the limit is met.
> 
> This approach, however, doesn't work well with deeper hierarchies where
> groups higher in the hierarchy do not have any or only very few pages
> (this usually happens if those groups do not have any tasks and they
> have only re-parented pages after some of their children are removed).
> Those groups are reclaimed with decreasing priority pointlessly as there
> is nothing to reclaim from them.
> 
> The easiest fix is to break out of the memcg iteration loop in shrink_zone
> only if the whole hierarchy has been visited or sufficient pages have
> been reclaimed. This is also more natural because the reclaimer expects
> that the hierarchy under the given root is reclaimed. As a result we can
> simplify the soft limit reclaim which does its own iteration.
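
For readers following along, the break-out condition described above can
be modelled in userspace. This is only a toy sketch under stated
assumptions, not the code in mm/vmscan.c: toy_shrink_pass(), struct
toy_result and the break_after_first flag are made-up names, and each
array entry stands for the reclaimable pages of one group, listed in
hierarchy walk order with the root first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one pass over the hierarchy. */
struct toy_result {
	size_t groups_visited;      /* how many groups the pass touched */
	unsigned long nr_reclaimed; /* pages taken during the pass */
};

/*
 * Toy model of one targeted-reclaim pass (assumption: not kernel code).
 *
 * pages[i] is the number of reclaimable pages in group i, in walk order.
 * break_after_first != 0 models the old rule: the pass ends after a
 * single group, whether or not anything was reclaimed from it.
 * break_after_first == 0 models the rule from the changelog: the pass
 * ends only when nr_to_reclaim is met or every group has been visited.
 */
static struct toy_result toy_shrink_pass(unsigned long *pages, size_t n,
					 unsigned long nr_to_reclaim,
					 int break_after_first)
{
	struct toy_result res = { 0, 0 };

	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Take what the group has, up to the remaining target. */
		unsigned long want = nr_to_reclaim - res.nr_reclaimed;
		unsigned long got = pages[i] < want ? pages[i] : want;

		pages[i] -= got;
		res.nr_reclaimed += got;
		res.groups_visited++;

		if (break_after_first)
			break;	/* old rule: one group per pass */
		if (res.nr_reclaimed >= nr_to_reclaim)
			break;	/* new rule: stop once the target is met */
	}
	return res;
}
```

With groups holding {0, 0, 100} pages and a target of 32, the old rule
visits only the empty first group and reclaims nothing in that pass,
while the new rule walks on to the third group and reclaims 32 pages.
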
> 
> Reported-by: Ying Han <yinghan@xxxxxxxxxx>

But what was in that report?

My guess would be "excessive CPU consumption", and perhaps "excessive
reclaim in the higher-level memcgs".

IOW, what are the user-visible effects of this change?

(And congrats - you're the first person I've sent that sentence to this
year!  But not, I fear, the last)

I don't really understand what prevents limit reclaim from stealing
lots of pages from the top-level groups.  How do we ensure
balancing/fairness in this case?

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1973,18 +1973,17 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)

shrink_zone() might be getting a bit bloaty for CONFIG_MEMCG=n kernels.
