Re: [PATCH 1/2] mm: multi-gen LRU: retry folios written back while isolated

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Wed, 16 Nov 2022 14:59:52 -0800

On Tue, 15 Nov 2022 18:38:07 -0700 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

> The page reclaim isolates a batch of folios from the tail of one of
> the LRU lists and works on those folios one by one. For a suitable
> swap-backed folio, if the swap device is async, it queues that folio
> for writeback. After the page reclaim finishes an entire batch, it
> puts back the folios it queued for writeback to the head of the
> original LRU list.
> 
> In the meantime, the page writeback also flushes the queued folios in
> batches. Its batching logic is independent of that of the page
> reclaim. For each of the folios it writes back, the page writeback
> calls folio_rotate_reclaimable() which tries to rotate a folio to the
> tail.
> 
> folio_rotate_reclaimable() only works for a folio after the page
> reclaim has put it back. If an async swap device is fast enough, the
> page writeback can finish with that folio while the page reclaim is
> still working on the rest of the batch containing it. In this case,
> that folio will remain at the head and the page reclaim will not retry
> it until it has worked its way back to the head.
> 
> This patch adds a retry to evict_folios(). After evict_folios() has
> finished an entire batch and before it puts back folios it cannot free
> immediately, it retries those that may have missed the rotation.
> 
> Before this patch, ~60% of folios swapped to an Intel Optane missed
> folio_rotate_reclaimable(). After this patch, ~99% of missed folios
> were reclaimed upon retry.
> 
> This problem affects relatively slow async swap devices like the
> Samsung 980 Pro much less and does not affect sync swap devices like
> zram or zswap at all.

As I understand it, this approach implicitly assumes that by the time
evict_folios() has completed its first pass, the write IOs will have
completed and the resulting folios will be available for processing on
evict_folios()'s second pass, yes?

If so, it all kinda works by luck of timing.  If the swap device is
even slower, the number of folios which are unavailable on the second
pass will increase?
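
To make the timing dependence concrete, here is a toy timeline model of
one isolated batch.  It is only a sketch: simulate_batch() and all of its
parameters are hypothetical, the time units are arbitrary, and none of
it is taken from the patch.  It counts how many folios finish writeback
while still isolated (so they miss the rotation and are the ones a
second pass could pick up) and how many are still in flight when the
first pass ends.

```python
def simulate_batch(batch_size, reclaim_cost, writeback_latency):
    """Toy model of one isolated batch; every parameter is hypothetical.

    Folio i is queued for writeback once reclaim has spent
    (i + 1) * reclaim_cost on the batch, and its write completes
    writeback_latency later.  The whole batch is put back only after
    the last folio has been processed.
    """
    putback_time = batch_size * reclaim_cost
    missed = 0     # writeback completed while the folio was still isolated
    in_flight = 0  # writeback still pending when the first pass ended
    for i in range(batch_size):
        queued_at = (i + 1) * reclaim_cost
        done_at = queued_at + writeback_latency
        if done_at < putback_time:
            missed += 1
        else:
            in_flight += 1
    return missed, in_flight


if __name__ == "__main__":
    # A "fast" and a "slow" device, using made-up numbers.
    print(simulate_batch(32, 1, 4))
    print(simulate_batch(32, 1, 100))
```

With these made-up numbers the fast case gives (27, 5) and the slow
case gives (0, 32): in this model a slower device leaves more folios in
flight at the end of the first pass, but fewer of them have missed the
rotation in the first place.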

Can we make this more deterministic?  For example change evict_folios()
to recognize this situation and to then do folio_rotate_reclaimable()'s
work for it?  Or if that isn't practical, do something else?
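
A rough illustration of what I mean by the first option, again as a toy
model rather than kernel code.  The Folio class, its writeback_done
flag and putback() are all hypothetical names invented for this sketch;
how evict_folios() would actually detect the condition is exactly the
open question.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Folio:
    name: str
    # Hypothetical flag: writeback completed while the folio was isolated.
    writeback_done: bool


def putback(lru, batch):
    """Return isolated folios to the list.

    The left end of the deque models the head of the LRU list and the
    right end models the tail, where reclaim looks next.
    """
    for folio in batch:
        if folio.writeback_done:
            # The rotation was missed, so do it here: place at the tail.
            lru.append(folio)
        else:
            # Still under writeback: place at the head and leave the
            # rotation to writeback completion.
            lru.appendleft(folio)


if __name__ == "__main__":
    lru = deque([Folio("x", False)])
    putback(lru, [Folio("a", True), Folio("b", False)])
    print([f.name for f in lru])
```

In this model the decision is made at putback time from the folio's
state rather than from how the first pass happened to line up with the
device's completion times.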

(Is folio_rotate_reclaimable() actually useful?  That concept must be
20 years old.  What breaks if we just delete it and leave the pages
wherever they are?)