Re: [PATCH] mm: skip current when memcg reclaim

Michal Hocko <mhocko@xxxxxxxx> · Mon, 18 Oct 2021 14:41:22 +0200

On Mon 18-10-21 17:25:23, Zhaoyang Huang wrote:
> On Mon, Oct 18, 2021 at 4:23 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Fri 15-10-21 14:15:29, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > >
> > > Sibling threads of the same process can refault the reclaimed pages
> > > at the same time, which is typical in non-global reclaim and
> > > introduces thrashing.
> >
> > It is hard to understand what kind of problem you see (ideally along
> > with some numbers) and how the proposed patch addresses that problem.
> >
> > Also you are missing a Signed-off-by tag (please have a look at
> > Documentation/process/submitting-patches.rst which is much more
> > comprehensive about the process).
> Sorry for that, I will fix it.
> >
> > > ---
> > >  mm/vmscan.c | 5 +++++
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 5199b96..ebbdc37 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2841,6 +2841,11 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
> > >                               sc->memcg_low_skipped = 1;
> > >                               continue;
> > >                       }
> > > +                     /*
> > > +                      * Don't bother current when its memcg is below low
> > > +                      */
> > > +                     if (get_mem_cgroup_from_mm(current->mm) == memcg)
> > > +                             continue;
> >
> > This code is executed when none of the memcgs in the reclaimed
> > hierarchy could be reclaimed. The low limit is then ignored and this
> > change is tweaking that behavior without any description of the
> > effect. A very vague note about thrashing would indicate that you have
> > something like the following:
> >
> >         A (hitting hard limit)
> >        / \
> >       B   C
> >
> > Both B and C are low limit protected and the current task is
> > associated with B. As neither of the two could be reclaimed due to
> > soft protection, you prefer to reclaim from C because reclaiming from
> > the current process could take away current's working set. Correct?
> >
> > I would be really curious about more specifics of the used hierarchy.
> What I am facing is a typical scenario on Android: a big
> memory-consuming app (camera etc.) is launched while the background is
> filled by other processes. The hierarchy is like what you describe
> above, where B represents the app and memory.low is set to help warm
> restart. Both kswapd and direct reclaim work together to reclaim pages
> in this scenario, which can cause 20MB of file pages to be deleted from
> the LRU within several seconds. This change helps the current process's
> pages escape reclaim and avoids the resulting page thrashing. We
> observed the result via systrace, which shows that uninterruptible
> sleep (blocked on a page bit) and iowait get smaller than usual.

I still have a hard time understanding the exact setup and why the patch
helps you. If you want to protect B more than the low limit would allow
for by stealing from C, then the same thing can happen from anybody
reclaiming from C, so in the end there is no protection. The same would
apply for any global direct memory reclaim done by a 3rd party. So I
suspect that your patch just happens to work by luck.

Why do both B and C have a low limit set up, and why can neither of them
be reclaimed? Isn't that a weird setup where A's hard limit is too close
to the sum of the low limits of B and C?

In other words, could you share a more detailed configuration you are
using and some more details on why both B and C have been skipped during
the first pass of the reclaim?
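
To make the first pass vs. retry distinction concrete, here is a toy
user-space model of the behavior discussed above. It is only a sketch,
not kernel code: every toy_* name, the fixed batch size and the page
counts are made up for illustration, and the real logic in mm/vmscan.c
is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-memcg batch; the kernel scans proportionally instead. */
#define TOY_BATCH 32UL

/* Toy stand-in for a child memcg (B or C in the diagram above). */
struct toy_memcg {
	const char *name;
	unsigned long usage;     /* pages currently charged */
	unsigned long low;       /* memory.low protection, in pages */
	unsigned long reclaimed; /* pages taken from this memcg so far */
};

/* Toy stand-in for the handful of scan_control fields that matter here. */
struct toy_scan_control {
	unsigned long nr_to_reclaim;
	unsigned long nr_reclaimed;
	bool memcg_low_reclaim;  /* retry pass: ignore the low limit */
	bool memcg_low_skipped;  /* a protected memcg was skipped */
};

static unsigned long toy_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* A memcg is protected while its usage does not exceed its low limit. */
static bool toy_below_low(const struct toy_memcg *m)
{
	return m->usage <= m->low;
}

/* One walk over the children of the memcg that hit its hard limit. */
static void toy_shrink_children(struct toy_memcg *kids, size_t n,
				struct toy_scan_control *sc)
{
	for (size_t i = 0; i < n && sc->nr_reclaimed < sc->nr_to_reclaim; i++) {
		struct toy_memcg *m = &kids[i];
		unsigned long take;

		/* First pass: honor the low limit and remember the skip. */
		if (toy_below_low(m) && !sc->memcg_low_reclaim) {
			sc->memcg_low_skipped = true;
			continue;
		}

		take = toy_min(TOY_BATCH, m->usage);
		take = toy_min(take, sc->nr_to_reclaim - sc->nr_reclaimed);
		m->usage -= take;
		m->reclaimed += take;
		sc->nr_reclaimed += take;
	}
}

/* Retry with the low limit ignored only if the first pass freed nothing. */
static unsigned long toy_try_to_free(struct toy_memcg *kids, size_t n,
				     struct toy_scan_control *sc)
{
	assert(kids != NULL && sc != NULL);
retry:
	toy_shrink_children(kids, n, sc);
	if (sc->nr_reclaimed)
		return sc->nr_reclaimed;

	if (sc->memcg_low_skipped) {
		sc->memcg_low_reclaim = true;
		sc->memcg_low_skipped = false;
		goto retry;
	}
	return 0;
}
```

In this model, with B (usage 100, low 120) and C (usage 80, low 100) and
a target of 64 pages, the first pass skips both and the retry takes 32
pages from each. If C has no low limit, the first pass alone reclaims 32
pages from C and B is never touched.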

-- 
Michal Hocko
SUSE Labs