On Sun, Dec 15, 2024 at 5:54 PM hailong <hailong.liu@xxxxxxxx> wrote:
>
> On Fri, 13. Dec 09:06, T.J. Mercier wrote:
> > On Thu, Dec 12, 2024 at 6:26 PM hailong <hailong.liu@xxxxxxxx> wrote:
> > >
> > > On Thu, 12. Dec 10:22, T.J. Mercier wrote:
> > > > On Thu, Dec 12, 2024 at 1:57 AM hailong <hailong.liu@xxxxxxxx> wrote:
> > > > >
> > > > > From: Hailong Liu <hailong.liu@xxxxxxxx>
> > > > >
> > > > > commit a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard") said:
> > > > > Note that memcg LRU only applies to global reclaim. For memcg reclaim,
> > > > > the eviction will continue, even if it is overshooting. This becomes
> > > > > unconditional due to code simplification.
> > > > >
> > > > > However, if we reclaim the root memcg via the memory.reclaim cgroup interface, the
> > > > > behavior is the same as kswapd or direct reclaim.
> > > >
> > > > Hi Hailong,
> > > >
> > > > Why do you think this is a problem?
> > > >
> > > > > Fix this by removing the mem_cgroup_is_root() condition in
> > > > > root_reclaim().
> > > > > Signed-off-by: Hailong Liu <hailong.liu@xxxxxxxx>
> > > > > ---
> > > > >  mm/vmscan.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 76378bc257e3..1f74f3ba0999 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -216,7 +216,7 @@ static bool cgroup_reclaim(struct scan_control *sc)
> > > > >   */
> > > > >  static bool root_reclaim(struct scan_control *sc)
> > > > >  {
> > > > > -	return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
> > > > > +	return !sc->target_mem_cgroup;
> > > > >  }
> > > > >
> > > > >  /**
> > > > > --
> > > > > Actually we switched to MGLRU on kernel 6.1 and see different behavior for
> > > > > root_mem_cgroup reclaim, so is there any background for this?
> > > >
> > > > Reclaim behavior differs with MGLRU.
> > > > https://lore.kernel.org/lkml/20221201223923.873696-1-yuzhao@xxxxxxxxxx/
> > > >
> > > > On even more recent kernels, regular LRU reclaim has also changed.
> > > > https://lore.kernel.org/lkml/20240514202641.2821494-1-hannes@xxxxxxxxxxx/
> > >
> > > Thanks for the details.
> > >
> > > Take this as an example.
> > >      root
> > >     /  |  \
> > >    /   |   \
> > >   a    b    c
> > >        | \
> > >        |  \
> > >        d   e
> > > IIUC, MGLRU can resolve the direct reclaim latency due to the
> > > sharding. However, for proactive reclaim, if we want to reclaim
> > > b, the order is b->d->e; but if we reclaim the root, the reclaim path
> > > is uncertain. The call stack is as follows:
> > > lru_gen_shrink_node()->shrink_many()->hlist_nulls_for_each_entry_rcu()->shrink_one()
> > >
> > > So, for proactive reclaim of root_memcg, whether it is MGLRU or the
> > > regular LRU, calling shrink_node_memcgs() makes the behavior certain
> > > and reasonable to me.
> >
> > The ordering is uncertain, but ordering has never been specified as
> > part of that interface AFAIK, and you'll still get what you ask for (X
> > bytes from the root or under). Assuming partial reclaim of a cgroup
> > (which I hope is true if you're reclaiming from the root?), if I have
> > the choice I'd rather have the memcg LRU ordering to try to reclaim
> > from colder memcgs first, rather than a static pre-order traversal
> > that always hits the same children first.
> >
> > The reason it's a choice only for the root is because the memcg LRU is
> > maintained at the pgdat level, not at each individual cgroup. So there
> > is no mechanism to get memcg LRU ordering from a subset of cgroups,
> > which would be pretty cool but that sounds expensive.
>
> Got it, thanks for clarifying. From the perspective of memcg, it
> behaves differently. But if we change the perspective to global
> reclaim, it is reasonable, because reclaiming the root memcg is
> another way of doing global reclaim. It makes global reclaim
> consistent. NACK myself :)

Yeah, that's another way to look at it. :)

> >
> > - T.J.
> >
> > > Help you, Help me,
> > > Hailong.
> --
> Help you, Help me,
> Hailong.
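
For context, the proactive reclaim operation discussed in this thread is triggered by writing a size to the root cgroup's memory.reclaim file (cgroup v2). Below is a minimal userspace sketch; it assumes cgroup v2 is mounted at /sys/fs/cgroup and uses an arbitrary 64M request size for illustration.

/*
 * Minimal sketch: ask the kernel to proactively reclaim memory from the
 * root memcg via the cgroup v2 memory.reclaim interface.
 *
 * Assumptions: cgroup v2 mounted at /sys/fs/cgroup, sufficient privilege
 * to write to the root cgroup's files, arbitrary 64M request size.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/fs/cgroup/memory.reclaim";
	const char *request = "64M";	/* amount of memory to try to reclaim */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open memory.reclaim");
		return 1;
	}

	/*
	 * The write succeeds only if the kernel reclaimed the requested
	 * amount; otherwise it fails (e.g. with EAGAIN).
	 */
	if (write(fd, request, strlen(request)) < 0)
		perror("write memory.reclaim");
	else
		printf("reclaimed %s from the root cgroup\n", request);

	close(fd);
	return 0;
}

With the mem_cgroup_is_root() check kept in root_reclaim(), this write follows the global-reclaim path (the per-pgdat memcg LRU under MGLRU); the patch above would instead have routed it through the cgroup-reclaim path, which is what the thread debates and its author ultimately NACKed.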