On Mon, Nov 25, 2024 at 2:19 PM chenridong <chenridong@xxxxxxxxxx> wrote:
>
>
>
> On 2024/11/18 12:21, Matthew Wilcox wrote:
> > On Mon, Nov 18, 2024 at 05:14:14PM +1300, Barry Song wrote:
> >> On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> >>>> 2. In the shrink_page_list function, if folioN is a THP (2M), it may be split
> >>>> and added to the swap cache folio by folio. After adding to the swap cache,
> >>>> it will submit I/O to write the folio back to swap, which is asynchronous.
> >>>> When shrink_page_list is finished, the isolated folios list will be
> >>>> moved back to the head of the inactive LRU. The inactive LRU may then look
> >>>> like this, with 512 folios having been moved to the head of the inactive LRU.
> >>>
> >>> I was hoping that we'd be able to stop splitting the folio when adding
> >>> to the swap cache. Ideally, we'd add the whole 2MB and write it back
> >>> as a single unit.
> >>
> >> This is already the case: adding to the swapcache doesn't require splitting
> >> THPs, but failing to allocate 2MB of contiguous swap slots will.
> >
> > Agreed, we need to understand why this is happening. As I've said a few
> > times now, we need to stop requiring contiguity. Real filesystems don't
> > need the contiguity (they become less efficient, but they can scatter a
> > single 2MB folio to multiple places).
> >
> > Maybe Chris has a solution to this in the works?
> >
>
> Hi, Chris, do you have a better idea to solve this issue?

Not Chris, but rereading the code, evict_folios() already has the following
to fix up folios that missed folio_rotate_reclaimable():

	/* retry folios that may have missed folio_rotate_reclaimable() */
	list_move(&folio->lru, &clean);

Doesn't that work for you?
commit 359a5e1416caaf9ce28396a65ed3e386cc5de663
Author: Yu Zhao <yuzhao@xxxxxxxxxx>
Date:   Tue Nov 15 18:38:07 2022 -0700

    mm: multi-gen LRU: retry folios written back while isolated

    The page reclaim isolates a batch of folios from the tail of one of the
    LRU lists and works on those folios one by one. For a suitable
    swap-backed folio, if the swap device is async, it queues that folio for
    writeback. After the page reclaim finishes an entire batch, it puts back
    the folios it queued for writeback to the head of the original LRU list.

    In the meantime, the page writeback flushes the queued folios also by
    batches. Its batching logic is independent from that of the page
    reclaim. For each of the folios it writes back, the page writeback calls
    folio_rotate_reclaimable() which tries to rotate a folio to the tail.

    folio_rotate_reclaimable() only works for a folio after the page reclaim
    has put it back. If an async swap device is fast enough, the page
    writeback can finish with that folio while the page reclaim is still
    working on the rest of the batch containing it. In this case, that folio
    will remain at the head and the page reclaim will not retry it before
    reaching there.

    This patch adds a retry to evict_folios(). After evict_folios() has
    finished an entire batch and before it puts back folios it cannot free
    immediately, it retries those that may have missed the rotation.

    Before this patch, ~60% of folios swapped to an Intel Optane missed
    folio_rotate_reclaimable(). After this patch, ~99% of missed folios were
    reclaimed upon retry.

    This problem affects relatively slow async swap devices like Samsung 980
    Pro much less and does not affect sync swap devices like zram or zswap at
    all.

>
> Best regards,
> Ridong

Thanks
Barry