On Thu, Nov 17, 2022 at 12:47 AM Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> On Tue, Nov 15, 2022 at 06:38:07PM -0700, Yu Zhao wrote:
> > The page reclaim isolates a batch of folios from the tail of one of
> > the LRU lists and works on those folios one by one. For a suitable
> > swap-backed folio, if the swap device is async, it queues that folio
> > for writeback. After the page reclaim finishes an entire batch, it
> > puts back the folios it queued for writeback to the head of the
> > original LRU list.
> >
> > In the meantime, the page writeback flushes the queued folios also by
> > batches. Its batching logic is independent from that of the page
> > reclaim. For each of the folios it writes back, the page writeback
> > calls folio_rotate_reclaimable() which tries to rotate a folio to the
> > tail.
> >
> > folio_rotate_reclaimable() only works for a folio after the page
> > reclaim has put it back. If an async swap device is fast enough, the
> > page writeback can finish with that folio while the page reclaim is
> > still working on the rest of the batch containing it. In this case,
> > that folio will remain at the head and the page reclaim will not retry
> > it before reaching there.
> >
> > This patch adds a retry to evict_folios(). After evict_folios() has
> > finished an entire batch and before it puts back folios it cannot free
> > immediately, it retries those that may have missed the rotation.
>
> Can we make something like this?

This works for both the active/inactive LRU and MGLRU.

But it's not my preferred way because of these two subtle differences:
1. Folios eligible for retry take an unnecessary round trip below --
   they are first added to the LRU list and then removed from there for
   retry. For high speed swap devices, the LRU lock contention is
   already quite high (>10% in CPU profile under heavy memory
   pressure). So I'm hoping we can avoid this round trip.
2. The number of retries of a folio on folio_wb_list is unlimited,
   whereas this patch limits the retry to one. So in theory, we can
   spin on a bunch of folios that keep failing.

The ideal solution would be to have the one-off retry logic in
shrink_folio_list(). But right now, that function is very cluttered. I
plan to refactor it (low priority at the moment), and probably after
that, we can add a generic retry for both the active/inactive LRU and
MGLRU. I'll raise its priority if you strongly prefer this. Please
feel free to let me know.

Thanks.

> shrink_folio_list(struct list_head *folio_list, struct list_head *folio_wb_list, )
>         pageout
>                 goto keep
>         ..
>         ..
>
> keep:
>         if (folio_test_writeback(folio) &&
>                         folio_test_reclaim(folio))
>                 list_add(&folio->lru, &ret_writeback_folio);
>
> move_folios_to_lru(&folio_list, &folio_wb_list);
>         struct folio *wb_folio = lru_to_folio(folio_wb_list);
>
>         /*
>          * If writeback is already done, move the page into tail.
>          * Otherwise, put the page into head and folio_rotate_reclaimable
>          * will move it to the tail when the writeback is done
>          */
>         if (!folio_test_writeback(wb_folio) &&
>                         folio_test_reclaim(wb_folio))
>                 lruvec_add_folio_tail(lruvec, folio);
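
To make the "retry at most once" point concrete, here is a rough userspace
sketch of the idea. It is not kernel code: struct sim_folio, sim_tick(),
sim_shrink(), sim_evict() and the wb_latency field are all hypothetical
names made up for illustration, and writeback completion is modeled as a
simple countdown rather than real async I/O. The first pass queues dirty
folios for writeback; folios whose writeback finished while the rest of the
batch was still being processed get exactly one more pass, and the rest are
left to be put back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_BATCH 64

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct sim_folio {
	bool dirty;      /* needs writeback before it can be freed */
	bool writeback;  /* writeback currently in flight */
	bool freed;      /* reclaimed */
	int attempts;    /* number of reclaim passes that looked at it */
	int wb_latency;  /* ticks until in-flight writeback completes */
};

/* Advance simulated time by one tick and complete finished writebacks. */
static void sim_tick(struct sim_folio *batch, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (batch[i].writeback && --batch[i].wb_latency <= 0)
			batch[i].writeback = false;
	}
}

/* One reclaim pass over the selected folios; returns how many were freed. */
static size_t sim_shrink(struct sim_folio *batch, size_t n,
			 const bool *selected)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!selected[i])
			continue;
		batch[i].attempts++;
		if (batch[i].dirty) {
			/* Queue for writeback; cannot be freed in this pass. */
			batch[i].dirty = false;
			batch[i].writeback = true;
		} else if (!batch[i].writeback) {
			batch[i].freed = true;
			freed++;
		}
		/* Writeback makes progress while the batch is processed. */
		sim_tick(batch, n);
	}
	return freed;
}

/* Evict a batch with a single, bounded retry of written-back folios. */
static size_t sim_evict(struct sim_folio *batch, size_t n)
{
	bool selected[SIM_MAX_BATCH];
	size_t freed, retry = 0;

	assert(n <= SIM_MAX_BATCH);

	for (size_t i = 0; i < n; i++)
		selected[i] = true;
	freed = sim_shrink(batch, n, selected);

	/* Retry only folios that are now clean and no longer under writeback. */
	for (size_t i = 0; i < n; i++) {
		selected[i] = !batch[i].freed && !batch[i].dirty &&
			      !batch[i].writeback;
		retry += selected[i];
	}
	if (retry)
		freed += sim_shrink(batch, n, selected);

	/* Whatever is still under writeback would be put back at this point. */
	return freed;
}
```

With a batch of one fast-writeback folio, one slow-writeback folio and two
clean folios, the fast one is freed on the retry and the slow one is
examined only once. Since there is no loop around the retry, a folio that
keeps failing cannot be spun on.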