Re: [PATCH v5 0/6] workload-specific and memory pressure-driven zswap writeback

Nhat Pham <nphamcs@xxxxxxxxx> · Sat, 18 Nov 2023 13:51:37 -0500

On Fri, Nov 17, 2023 at 11:27 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:

>
> On Fri, Nov 17, 2023 at 8:23 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > On Thu, Nov 16, 2023 at 4:57 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
> > >
> > > Hi Nhat,
> > >
> > > I want to share the high level feedback we discussed here in the
> > > mailing list as well.
> > >
> > > It is my observation that each memcg LRU list can't compare the page
> > > time order with other memcgs.
> > > It works great when the leaf level memcg hits the memory limit and you
> > > want to reclaim from that memcg.
> > > It works less well under global memory pressure, when you need to
> > > reclaim from all memcgs. You kind of have to scan all child memcgs
> > > to find the best page to shrink from. It is less effective at
> > > getting to the most desirable page quickly.
> > >
> > > This can benefit from a design similar to MGLRU. This idea was
> > > suggested by Yu Zhao; credit goes to him, not me.
> > > In other words, the current patch is similar to the memcg page list
> > > in the pre-MGLRU world. We can have an MGLRU-like per-memcg zswap
> > > shrink list.
> >
> > I was gonna summarize the points myself :P But thanks for doing this.
> > It's your idea so you're more qualified to explain this anyway ;)
> >
> > I absolutely agree that a generation-aware, cgroup-aware, NUMA-aware
> > LRU is the way to go in the future. Currently, IIUC, the reclaim logic
> > selects cgroups in a round-robin-ish manner. It's "fair" in this respect,
> > but I also think it's not ideal. As we have discussed, the current list_lru
> > infrastructure only takes into account intra-cgroup relative recency, not
> > inter-cgroup relative recency. The recently proposed time-based zswap
> > reclaim mechanism will provide us with a source of information, but the
> > overhead of using it might be too high, and it's very zswap-specific.
> >
> > Maybe after this, we should improve zswap reclaim (and perhaps all
> > list_lru users) by adding generations to list_lru and then taking
> > generations into account in the vmscan code. This patch series could be
> > merged as-is, and once we make list_lru generation-aware, the zswap
> > shrinker will automagically be improved (along with all other
> > list_lru/shrinker users).
> >
> > I don't know enough about the current design of MGLRU to comment
> > much further, but let me know if this makes sense, and if you have
> > objections/other ideas.
> >
> > And if you have documentation for MGLRU other than its code, could
> > you please let me know? I'm struggling to find more details about it.
> >
>
> This could be a good place to start:
> https://www.youtube.com/watch?v=9HvJfN21H9Y

Ah I think I've seen this talk before.

I'd also like to point out that the current set of heuristics employed by the
shrinker somewhat mimics an active/inactive LRU (i.e., a two-generation
MGLRU). I'm not sure how to generalize this to more than two generations,
though.