Re: [RFC v1] memcg: add memcg lru for page reclaiming

Hillf Danton <hdanton@xxxxxxxx> · Wed, 23 Oct 2019 12:44:48 +0800

On Tue, 22 Oct 2019 15:58:32 +0200 Michal Hocko wrote:
> 
> On Tue 22-10-19 21:30:50, Hillf Danton wrote:
> > 
> > On Mon, 21 Oct 2019 14:14:53 +0200 Michal Hocko wrote:
> > > 
> > > On Mon 21-10-19 19:56:54, Hillf Danton wrote:
> > > > 
> > > > Currently soft limit reclaim is frozen, see
> > > > Documentation/admin-guide/cgroup-v2.rst for reasons.
> > > > 
> > > > Borrowing the page lru idea, a memcg lru is added for selecting the
> > > > victim memcg to reclaim pages from under memory pressure. For now it
> > > > works in parallel with soft limit reclaim (SLR), not only because SLR
> > > > needs some time before it can be reaped, but also because the
> > > > coexistence makes it much easier to add the lru in a straightforward
> > > > manner.
> > > 
> > > This doesn't explain which problem you would like to fix or which
> > > feature you would like to achieve. It also doesn't explain the overall design.
> > 
> > 1. memcg lru makes page reclaiming hierarchy-aware
> 
> Is that a problem statement or a design goal?

A problem statement: it is a problem of soft limit reclaim, as described
in cgroup-v2.rst, and the RFC addresses it.

> > While doing the high work, memcgs are currently reclaimed one after
> > another up through the hierarchy;
> 
> That is by design, because it is the memcg where the high limit got
> hit. The hierarchical behavior ensures that the subtree of that memcg is
> reclaimed and we try to spread the reclaim fairly over the tree.

Yeah, that code can hardly escape a standing ovation. None of its merits
is lost in the RFC; the only change is that spiraling up the memcg
hierarchy is broken into two parts: the top half, which rips pages off the
first victim, and the bottom half, which queues the victim's first ancestor
on the lru (the ice box storing the cakes baked for kswapd). See below for
the reasons.
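
A minimal userspace model of that split, assuming a plain FIFO for the
memcg lru and a fixed batch of 32 pages. struct memcg, memcg_lru_add_tail(),
reclaim_pages() and high_work() are all hypothetical names, not kernel
API, and "reclaim" is faked by decrementing a page counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the RFC's high work split.
 * None of these names exist in the kernel.
 */
#define MEMCG_CHARGE_BATCH 32UL	/* assumed batch size, in pages */

struct memcg {
	struct memcg *parent;
	unsigned long usage;	/* pages charged */
	unsigned long high;	/* memory.high, in pages */
	bool on_lru;		/* already queued on the memcg lru? */
	struct memcg *lru_next;
};

/* FIFO of memcgs left for kswapd ("the ice box"). */
static struct memcg *lru_head, *lru_tail;

static void memcg_lru_add_tail(struct memcg *m)
{
	if (m == NULL || m->on_lru)
		return;		/* root reached, or queued already */
	m->on_lru = true;
	m->lru_next = NULL;
	if (lru_tail)
		lru_tail->lru_next = m;
	else
		lru_head = m;
	lru_tail = m;
}

/* Stand-in for real reclaim: just drops usage by up to nr pages. */
static unsigned long reclaim_pages(struct memcg *m, unsigned long nr)
{
	unsigned long got = nr < m->usage ? nr : m->usage;

	m->usage -= got;
	return got;
}

/*
 * High work: the top half reclaims one batch from the victim if it is
 * over its high limit; the bottom half only queues the first ancestor
 * instead of climbing the whole hierarchy right away.
 */
static void high_work(struct memcg *victim)
{
	assert(victim != NULL);
	if (victim->usage > victim->high)
		reclaim_pages(victim, MEMCG_CHARGE_BATCH);	/* top half */
	memcg_lru_add_tail(victim->parent);			/* bottom half */
}
```

With a child over its high limit, high_work() trims one batch off the
child and parks its parent in the FIFO; a second call trims the child
again but does not queue the parent twice.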

> > in this RFC after ripping pages off
> > the first victim, the work finishes with the first ancestor of the victim
> > added to lru.
> > 
> > Reclaiming is deferred until kswapd becomes active.
> 
> This is a wrong assumption because high limit might be configured way
> before kswapd is woken up.

This change was introduced because a high limit breach does not look like
a serious problem in the absence of memory pressure. Let's do the hard
work, reclaiming one memcg at a time up through the hierarchy, when kswapd
becomes active. That is also why the bottom half (BH) was introduced.
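
A rough sketch of the kswapd side under similar assumptions (plain FIFO,
hypothetical names, reclaim modeled as a counter decrement, 32-page
batch): each step takes one memcg off the head, reclaims a batch from it
and queues its parent, and the loop quits as soon as kswapd's target is
met, leaving the remaining ancestors queued:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of deferred memcg reclaim driven by
 * kswapd. kswapd_drain_memcg_lru() is not a kernel function.
 */
#define MEMCG_CHARGE_BATCH 32UL	/* assumed batch size, in pages */

struct memcg {
	struct memcg *parent;
	unsigned long usage;	/* pages charged */
	bool on_lru;
	struct memcg *lru_next;
};

static struct memcg *lru_head, *lru_tail;

static void memcg_lru_add_tail(struct memcg *m)
{
	if (m == NULL || m->on_lru)
		return;
	m->on_lru = true;
	m->lru_next = NULL;
	if (lru_tail)
		lru_tail->lru_next = m;
	else
		lru_head = m;
	lru_tail = m;
}

/* FIFO: the memcg queued first is reclaimed first. */
static struct memcg *memcg_lru_pop_head(void)
{
	struct memcg *m = lru_head;

	if (m == NULL)
		return NULL;
	assert(m->on_lru);
	lru_head = m->lru_next;
	if (lru_head == NULL)
		lru_tail = NULL;
	m->on_lru = false;
	m->lru_next = NULL;
	return m;
}

/*
 * Meant to be called from kswapd only: reclaim one memcg at a time,
 * queue its parent so the climb continues later, and stop climbing as
 * soon as nr_to_reclaim pages have been freed.
 */
static unsigned long kswapd_drain_memcg_lru(unsigned long nr_to_reclaim)
{
	unsigned long reclaimed = 0;
	struct memcg *m;

	while (reclaimed < nr_to_reclaim &&
	       (m = memcg_lru_pop_head()) != NULL) {
		unsigned long got = m->usage < MEMCG_CHARGE_BATCH ?
				    m->usage : MEMCG_CHARGE_BATCH;

		m->usage -= got;
		reclaimed += got;
		memcg_lru_add_tail(m->parent);
	}
	return reclaimed;
}
```

For a queued leaf holding 10 pages under a parent holding 100, a target of
40 pages reclaims 10 from the leaf and 32 from the parent, returns 42, and
leaves the grandparent queued for the next round.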

> > 2. memcg lru tries hard to avoid overreclaim
> 
> Again, is this a problem statement or a design goal?

A problem statement again: overreclaim is another SLR problem described
in cgroup-v2.rst, and the RFC addresses it.

> > Only one memcg is picked off lru in FIFO mode under memory pressure,
> > and MEMCG_CHARGE_BATCH pages are reclaimed one memcg at a time.
> 
> And why is this preferred over SWAP_CLUSTER_MAX

The RFC does not change the current high work behavior with respect to
MEMCG_CHARGE_BATCH; try_to_free_mem_cgroup_pages() takes care of both.
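
For reference, the two batch sizes meet in the reclaim goal:
try_to_free_mem_cgroup_pages() sets nr_to_reclaim in its scan_control to
max(nr_pages, SWAP_CLUSTER_MAX). A sketch of just that arithmetic, with
both constants assumed to be 32 pages; memcg_reclaim_target() is a
hypothetical helper, not kernel code:

```c
#include <assert.h>

/* Assumed values for this model: 32 pages each. */
#define SWAP_CLUSTER_MAX	32UL
#define MEMCG_CHARGE_BATCH	32UL

static_assert(MEMCG_CHARGE_BATCH == SWAP_CLUSTER_MAX,
	      "this model assumes the two batch sizes are equal");

/*
 * Hypothetical helper mirroring how try_to_free_mem_cgroup_pages()
 * rounds its goal: never aim for fewer than SWAP_CLUSTER_MAX pages.
 */
static unsigned long memcg_reclaim_target(unsigned long nr_pages)
{
	return nr_pages > SWAP_CLUSTER_MAX ? nr_pages : SWAP_CLUSTER_MAX;
}
```

So a MEMCG_CHARGE_BATCH request and a SWAP_CLUSTER_MAX request end up with
the same 32-page goal, while a larger request, e.g. 512 pages, keeps its
own size.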

> and whole subtree reclaim that we do currently? 

We stop climbing up the hierarchy as soon as kswapd snaps its fingers: "Cut. Work done."

Thanks
Hillf