Re: [PATCH 1/2] mm: multi-gen LRU: retry folios written back while isolated

Minchan Kim <minchan@xxxxxxxxxx> · Fri, 18 Nov 2022 14:33:39 -0800

On Fri, Nov 18, 2022 at 02:51:01PM -0700, Yu Zhao wrote:
> On Fri, Nov 18, 2022 at 2:25 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> >
> > On Thu, Nov 17, 2022 at 06:40:05PM -0700, Yu Zhao wrote:
> > > On Thu, Nov 17, 2022 at 6:26 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Nov 17, 2022 at 03:22:42PM -0700, Yu Zhao wrote:
> > > > > On Thu, Nov 17, 2022 at 12:47 AM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Nov 15, 2022 at 06:38:07PM -0700, Yu Zhao wrote:
> > > > > > > The page reclaim isolates a batch of folios from the tail of one of
> > > > > > > the LRU lists and works on those folios one by one. For a suitable
> > > > > > > swap-backed folio, if the swap device is async, it queues that folio
> > > > > > > for writeback. After the page reclaim finishes an entire batch, it
> > > > > > > puts back the folios it queued for writeback to the head of the
> > > > > > > original LRU list.
> > > > > > >
> > > > > > > In the meantime, the page writeback flushes the queued folios also by
> > > > > > > batches. Its batching logic is independent from that of the page
> > > > > > > reclaim. For each of the folios it writes back, the page writeback
> > > > > > > calls folio_rotate_reclaimable() which tries to rotate a folio to the
> > > > > > > tail.
> > > > > > >
> > > > > > > folio_rotate_reclaimable() only works for a folio after the page
> > > > > > > reclaim has put it back. If an async swap device is fast enough, the
> > > > > > > page writeback can finish with that folio while the page reclaim is
> > > > > > > still working on the rest of the batch containing it. In this case,
> > > > > > > that folio will remain at the head and the page reclaim will not retry
> > > > > > > it before reaching there.
> > > > > > >
> > > > > > > This patch adds a retry to evict_folios(). After evict_folios() has
> > > > > > > finished an entire batch and before it puts back folios it cannot free
> > > > > > > immediately, it retries those that may have missed the rotation.
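A user-space toy model of the race described above may help. Everything in it is hypothetical. `struct toy_folio`, `toy_evict_batch()` and the step-based `latency` are stand-ins for kernel state and device speed. `toy_end_writeback()` only mimics the one property of folio_rotate_reclaimable() that matters here: rotation fails while the folio is still isolated.

```c
#include <stdbool.h>

#define TOY_MAX_BATCH 64

/* Hypothetical, heavily simplified folio state; not kernel code. */
struct toy_folio {
    bool isolated;  /* held by reclaim, not on any LRU list */
    bool wb_done;   /* writeback has completed */
    bool at_tail;   /* rotated to the LRU tail, so reclaim reaches it soon */
    bool freed;     /* freed by the one-off retry */
};

struct toy_result {
    int freed_by_retry;  /* freed by the retry at the end of the batch */
    int rotated;         /* put back first, then rotated on completion */
    int stranded;        /* completed while isolated, left at the LRU head */
};

/* Stand-in for folio_rotate_reclaimable(): rotation only succeeds after
 * reclaim has put the folio back on an LRU list. */
static void toy_end_writeback(struct toy_folio *f)
{
    f->wb_done = true;
    if (!f->isolated)
        f->at_tail = true;
}

/*
 * Reclaim one batch of n folios, all of which need writeback. Folio i is
 * queued at step i and its writeback completes at step i + latency, so a
 * smaller latency models a faster async swap device.
 */
static struct toy_result toy_evict_batch(int n, int latency, bool retry)
{
    struct toy_folio batch[TOY_MAX_BATCH] = {0};
    struct toy_result r = {0};

    if (n > TOY_MAX_BATCH)
        n = TOY_MAX_BATCH;
    if (latency < 0)
        latency = 0;

    for (int i = 0; i < n; i++)
        batch[i].isolated = true;

    /* Writeback that completes while reclaim still holds the batch. */
    for (int step = 0; step < n; step++) {
        int done = step - latency;
        if (done >= 0)
            toy_end_writeback(&batch[done]);
    }

    /* End of batch: optional one-off retry, then put back the rest. */
    for (int i = 0; i < n; i++) {
        if (retry && batch[i].wb_done) {
            batch[i].freed = true;
            r.freed_by_retry++;
        } else {
            batch[i].isolated = false;  /* back to the head of the list */
        }
    }

    /* Remaining writeback completes after putback and rotates normally. */
    for (int i = 0; i < n; i++) {
        if (batch[i].freed)
            continue;
        if (!batch[i].wb_done)
            toy_end_writeback(&batch[i]);
        if (batch[i].at_tail)
            r.rotated++;
        else
            r.stranded++;
    }
    return r;
}
```

With a batch of 8 and a latency of 2 steps, six folios finish writeback while isolated. Without the retry they stay stranded at the head. With it they are freed. Once the latency reaches the batch size, the retry has nothing to do, so in this model its benefit depends entirely on device speed relative to how long reclaim holds the batch.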
> > > > > >
> > > > > > Can we make something like this?
> > > > >
> > > > > This works for both the active/inactive LRU and MGLRU.
> > > >
> > > > I hope we fix both together.
> > > >
> > > > >
> > > > > But it's not my preferred way because of these two subtle differences:
> > > > > 1. Folios eligible for retry take an unnecessary round trip below --
> > > > > they are first added to the LRU list and then removed from there for
> > > > > retry. For high speed swap devices, the LRU lock contention is already
> > > > > quite high (>10% in CPU profile under heavy memory pressure). So I'm
> > > > > hoping we can avoid this round trip.
> > > > > 2. The number of retries of a folio on folio_wb_list is unlimited,
> > > > > whereas this patch limits the retry to one. So in theory, we can spin
> > > > > on a bunch of folios that keep failing.
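The two points above can be put into a toy cost model. It rests on assumptions not spelled out in this thread. Under the folio_wb_list approach, each retry is assumed to put a written-back folio back on the LRU list and then isolate it again. Under the patch, isolated folios are assumed to be retried in place, at most once. `toy_retry_cost()` and its parameters are hypothetical. The model counts only modeled LRU lock acquisitions and retry rounds.

```c
#include <stdbool.h>

/* Hypothetical cost counters for the toy model; not kernel code. */
struct toy_cost {
    int lock_hits;  /* times the modeled LRU lock is taken */
    int rounds;     /* retry rounds executed */
};

/*
 * `written` folios finished writeback while isolated; `stuck` of them never
 * become freeable (e.g. they keep failing for some other reason). via_lru
 * selects the round trip from point 1: put back on the LRU list, then
 * isolate again. max_retries < 0 means unlimited (point 2); `cap` exists
 * only so that this demo always terminates.
 */
static struct toy_cost toy_retry_cost(int written, int stuck, bool via_lru,
                                      int max_retries, int cap)
{
    struct toy_cost c = { .lock_hits = 1, .rounds = 0 };  /* initial isolation */
    int pending = written;

    if (stuck > written)
        stuck = written;

    while (pending > 0 && c.rounds < cap &&
           (max_retries < 0 || c.rounds < max_retries)) {
        if (via_lru)
            c.lock_hits += 2;  /* putback to the LRU + re-isolation */
        c.rounds++;
        pending = stuck;       /* everything else was freed this round */
    }

    c.lock_hits++;  /* final putback of whatever could not be freed */
    return c;
}
```

With 6 written-back folios of which 2 keep failing, the in-place one-off retry stops after one round and takes the modeled lock twice. The unlimited round-trip variant runs until the artificial cap (100 rounds, 202 lock acquisitions). The cap is exactly what would be missing in practice.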
> > > > >
> > > > > The most ideal solution would be to have the one-off retry logic in
> > > > > shrink_folio_list(). But right now, that function is very cluttered. I
> > > > > plan to refactor it (low priority at the moment), and probably after
> > > > > that, we can add a generic retry for both the active/inactive LRU and
> > > > > MGLRU. I'll raise its priority if you strongly prefer this. Please
> > > > > feel free to let me know.
> > > >
> > > > Well, my preference for the *ideal solution* is that writeback completion
> > > > drops the page immediately without rotating it on the LRU. IIRC, the concern
> > > > when I tried it was softirq latency and the locking involved in that context.
> > >
> > > Are we good for now or are there other ideas we want to try while we are at it?
> > >
> >
> > Good for now with which solution? The retry logic you suggested? I
> > personally don't like a solution that relies on timing.
> >
> > If you are concerned about the unnecessary round trip, it shouldn't
> > happen frequently, since your assumption is that the swap device is so
> > fast that the second loop would already see their writeback done?
> 
> No, the round trip that hits the LRU lock in the process.

I see what you meant.

> 
> For folios written and ready to be freed, they'll have to go from
> being isolated to the tail of LRU list and then to getting isolated
> again. This requires an extra hit on the LRU lock, which is highly
> contended for fast swap devices under heavy memory pressure.
> 
> > Anyway, I am strongly push my preference. Feel free to go with the way

Oh, sorry for the typo: I meant "I am not strongly pushing my preference".

> > you want if the solution can fix both LRU schemes.
> 
> There is another concern I listed previously:
> 
> > > > > 2. The number of retries of a folio on folio_wb_list is unlimited,
> > > > > whereas this patch limits the retry to one. So in theory, we can spin
> > > > > on a bunch of folios that keep failing.
> 
> If this can happen, it'd be really hard to track it down. Any thoughts on this?

Could you elaborate on why folio_wb_list can keep spinning?

My concern is how we can make sure the timing bet holds for most
workloads in environments with heterogeneous cores and DVFS frequency control.

> 
> I share your desire to fix both. But I don't think we can just dismiss
> the two points I listed above. They are reasonable, aren't they?
>