Re: [PATCH RFC] mm: mglru: provide a separate list for lazyfree anon folios

Minchan Kim <minchan@xxxxxxxxxx> · Tue, 24 Sep 2024 13:12:43 -0700

On Tue, Sep 24, 2024 at 10:38:37AM +1200, Barry Song wrote:
> On Tue, Sep 24, 2024 at 10:19 AM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> >
> > On Fri, Sep 20, 2024 at 01:23:57PM +1200, Barry Song wrote:
> > > On Wed, Sep 18, 2024 at 12:02 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> > > >
> > > > On 14.09.24 08:37, Barry Song wrote:
> > > > > From: Barry Song <v-songbaohua@xxxxxxxx>
> > > > >
> > > > > This follows up on the discussion regarding Gaoxu's work[1]. It's
> > > > > unclear if there's still interest in implementing a separate LRU
> > > > > list for lazyfree folios, but I decided to explore it out of
> > > > > curiosity.
> > > > >
> > > > > According to Lokesh, MADV_FREE'd anon folios are expected to be
> > > > > released earlier than file folios. One option, as implemented
> > > > > by Gao Xu, is to place lazyfree anon folios at the tail of the
> > > > > file's `min_seq` generation. However, this approach results in
> > > > > lazyfree folios being released in a LIFO manner, which conflicts
> > > > > with LRU behavior, as noted by Michal.
> > > > >
> > > > > To address this, this patch proposes maintaining a separate list
> > > > > for lazyfree anon folios while keeping them classified under the
> > > > > "file" LRU type to minimize code changes. These lazyfree anon
> > > > > folios will still be counted as file folios and share the same
> > > > > generation with regular files. In the eviction path, the lazyfree
> > > > > list will be prioritized for scanning before the actual file
> > > > > LRU list.
> > > > >
> > > >
> > > > What's the downside of another LRU list? Do we have any experience on that?
> > >
> > > Essentially, the goal is to address the downsides of using a single LRU list for
> > > files and lazyfree anonymous pages, namely significantly more file refaults.
> > >
> > > I'm not entirely clear on the downsides of having an additional LRU list.
> > > While it does increase complexity, it doesn't seem to be significant.
> >
> > It's not catastrophic[1]. I prefer the idea of an additional LRU
> > because it offers flexibility for various potential use cases[2].
> >
> > Orthogonal topic (but may be of interest to someone):
> >
> > My main interest in a new LRU list is to enable the system to maintain a
> > quickly reclaimable memory pool and expose its size to the admin, with
> > a knob to decide how large a pool they want.
> >
> > This pool would consist of clean, unmapped pages from the page cache
> > and/or the swap cache. This would allow the system to reclaim memory quickly
> > when free memory is low, at the cost of minor fault overhead.
> 
> My current implementation only handles the MADV_FREE anonymous case. If they
> are placed in a single LRU, they should be able to be reclaimed very quickly,
> simply discarded without needing to be swapped out.
> 
> I've been thinking about the issue of unmapped pagecache recently. These
> unmapped pagecaches can be reclaimed much faster than mapped ones, especially
> when the latter have a high mapcount and incur significant rmap costs.
> However, many pagecaches are inherently unmapped (e.g., from syscall read).
> If they are placed in a single LRU, the challenge would be comparing the age
> of unmapped pagecache with mapped ones. Currently, with the mglru tier
> mechanism, frequently accessed unmapped pagecaches have a chance to be placed
> in a spot where they are harder to reclaim.
> 
> Personally, I am quite interested in putting unmapped pagecache together, as
> right now reclamation could be like this:
> 
> lru list:
> unmapped pagecache(A) - mapped pagecache(B) - unmapped pagecache(C) -
> mapped pagecache with huge mapcount(D)
> 
> A and C can be reclaimed with zero cost but they have to wait for D and B.
> 
> But the question is, if we make two lists:
> 
> list1: A - C
> list2: B - D
> 
> How can we ensure that A and C won't experience many refaults, even though
> reclaiming them would be cost-free? Or what if B and D are actually colder
> than A and C?
> 
> If this isn't an issue, I'd be very interested in implementing it. Any thoughts?

My proposal involves the following:

1. Introduce an "easily reclaimable" LRU list. This list would hold pages
   that can be quickly freed without significant overhead.

2. Implement a parameter to control the size of this list. This allows for
   system tuning based on available memory and performance requirements.

3. Modify kswapd behavior to utilize this list. When kswapd is awakened due
   to memory pressure, it should first drop those pages to refill free
   pages up to the high watermark.

4. Before kswapd goes to sleep, it should scan the tail of the LRU list and
   move cold pages to the easily reclaimable list, unmapping them from the
   page table.

5. Whenever there is a page cache hit, move the page into the evictable LRU.

This approach allows the system to maintain a pool of readily available
memory, mitigating the "aging" problem. The trade-off is the potential for
minor page faults and LRU movement overheads if pages in the ez_reclaimable
LRU are accessed again.
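
To make the five steps above concrete, here is a rough userspace toy model in
Python. It is only a sketch, not kernel code: the class name ToyReclaim, the
ez_target knob, and all method names are made up for illustration, and pages
are plain strings rather than folios.

```python
from collections import OrderedDict


class ToyReclaim:
    """Toy model of the proposal; every name here is hypothetical."""

    def __init__(self, ez_target):
        self.ez_target = ez_target   # step 2: size knob for the ez list
        self.lru = OrderedDict()     # evictable LRU, first item = coldest (tail)
        self.ez = OrderedDict()      # step 1: easily reclaimable list, oldest first
        self.minor_faults = 0        # cost paid when an ez page is touched again

    def access(self, page):
        """Step 5: a hit on an ez page moves it back to the evictable LRU."""
        if page in self.ez:
            del self.ez[page]
            self.minor_faults += 1   # the page was unmapped, so this is a minor fault
        self.lru.pop(page, None)
        self.lru[page] = True        # most recently used goes to the head

    def refill_ez(self):
        """Step 4: before sleeping, demote cold tail pages up to the target."""
        moved = 0
        while len(self.ez) < self.ez_target and self.lru:
            page, _ = self.lru.popitem(last=False)   # coldest page
            self.ez[page] = True     # the real proposal would also unmap it here
            moved += 1
        return moved

    def kswapd_reclaim(self, nr_to_free):
        """Step 3: drop ez pages first, then fall back to the LRU tail."""
        freed = []
        while len(freed) < nr_to_free and self.ez:
            page, _ = self.ez.popitem(last=False)
            freed.append(page)
        while len(freed) < nr_to_free and self.lru:
            page, _ = self.lru.popitem(last=False)
            freed.append(page)
        return freed
```

With ez_target=2 and pages A, B, C, D touched in that order, refill_ez() moves
A and B onto the ez list. Touching A again costs one minor fault and puts A
back at the head of the evictable LRU, so a later kswapd_reclaim(2) frees B
from the ez list and then C from the LRU tail.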

Furthermore, we could put some asynchronous writeback pages (e.g., swap-out
or fs writeback pages) into the list, too.
Currently, we rotate those pages back to the head of the LRU and, once
writeback is done, move them to the tail of the LRU again.
Instead, we could simply put such pages into the ez_reclaimable LRU without
rotating them back and forth.
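
The difference in list movement can be shown with another small Python toy.
Again this is only an illustration under assumptions: rotate_path and ez_path
are hypothetical names, the deque's left end stands for the LRU tail, and the
model ignores the fact that a page still under writeback cannot be freed yet.

```python
from collections import deque


def rotate_path(lru, page):
    """Current behaviour as described above: two list movements per page."""
    moves = 0
    lru.remove(page)
    lru.append(page)        # rotate to the head while writeback is in flight
    moves += 1
    # ... writeback completes here ...
    lru.remove(page)
    lru.appendleft(page)    # move back to the tail so it is reclaimed soon
    moves += 1
    return moves


def ez_path(lru, ez, page):
    """Proposed behaviour: one movement, straight onto the ez list."""
    lru.remove(page)
    ez.append(page)         # assumption: reclaim skips it until writeback ends
    return 1
```

For an LRU holding A, B, C with A at the tail, rotate_path() touches the list
twice and leaves A where it started, while ez_path() moves A once and leaves
the evictable LRU holding only B and C.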