On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
>
>
> On 2024/8/8 16:51, David Hildenbrand wrote:
> > On 08.08.24 04:36, Baolin Wang wrote:
> > >
> > >
> > > On 2024/8/7 23:53, David Hildenbrand wrote:
> > > > On 07.08.24 09:31, Baolin Wang wrote:
> > > > > Page reclaim will not scan the anon LRU if there is no swap space;
> > > > > however, MADV_PAGEOUT can still split shmem large folios even
> > > > > without a swap device. Thus validate that swap space is available
> > > > > before splitting a shmem large folio, to avoid a redundant split.
> > > > >
> > > > > Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > > > > ---
> > > > >  mm/vmscan.c | 8 ++++++++
> > > > >  1 file changed, 8 insertions(+)
> > > > >
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 31d13462571e..796f65781f4f 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > >  			}
> > > > >  		} else if (folio_test_swapbacked(folio) &&
> > > > >  			   folio_test_large(folio)) {
> > > > > +
> > > > > +			/*
> > > > > +			 * Do not split shmem folio if no swap memory
> > > > > +			 * available.
> > > > > +			 */
> > > > > +			if (!total_swap_pages)
> > > > > +				goto activate_locked;
> > > > > +
> > > > >  			/* Split shmem folio */
> > > > >  			if (split_folio_to_list(folio, folio_list))
> > > > >  				goto keep_locked;
> > > >
> > > > Reminds me of
> > > >
> > > > commit 9a976f0c847b67d22ed694556a3626ed92da0422
> > > > Author: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > > > Date:   Thu Mar 9 15:05:43 2023 -0800
> > > >
> > > >     shmem: skip page split if we're not reclaiming
> > > >
> > > >     In theory when info->flags & VM_LOCKED we should not be getting
> > > >     shem_writepage() called so we should be verifying this with a
> > > >     WARN_ON_ONCE(). Since we should not be swapping then best to ensure we
> > > >     also don't do the folio split earlier too. So just move the check early
> > > >     to avoid folio splits in case its a dubious call.
> > > >
> > > >     We also have a similar early bail when !total_swap_pages so just move that
> > > >     earlier to avoid the possible folio split in the same situation.
> > > >
> > > > But indeed, pageout() -> writepage() is called *after* the split in
> > > > the vmscan path.
> > > >
> > > > In that "noswap" context, I wonder if we also want to skip folios
> > > > that are part of shmem with swapping disabled?
> > >
> > > Yes, I think so.
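
For context: since that commit, shmem_writepage() already bails out early
when swapping cannot happen, before any split would take place. Quoting the
checks from memory (so please double-check against current mm/shmem.c), the
top of the function looks roughly like:

	if (WARN_ON_ONCE(info->flags & VM_LOCKED))
		goto redirty;

	if (!total_swap_pages)
		goto redirty;

So once the split moves into shmem_writepage(), the !total_swap_pages case
from the original patch is naturally covered there.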
> > > >
> > > > But now I am wondering under which circumstances we end up calling
> > > > shmem_writepage() with a large folio. And I think the answer is in
> > > > the comment above the folio_test_large() check: via
> > > > drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > > >
> > > > ... so if shmem_writepage() handles+checks that, could we do
> > > >
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index a332cb80e928..7dfa3d6e8ba7 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > >  					goto activate_locked_split;
> > > >  				}
> > > >  			}
> > > > -		} else if (folio_test_swapbacked(folio) &&
> > > > -			   folio_test_large(folio)) {
> > > > -			/* Split shmem folio */
> > > > -			if (split_folio_to_list(folio, folio_list))
> > > > -				goto keep_locked;
> > > >  		}
> > > >
> > > >  		/*
> > > >
> > > > instead?
> > >
> > > Seems reasonable to me. But we should pass the 'folio_list' to
> > > shmem_writepage() to list the subpages of the large folio. Let me try.
> >
> > Ah, yes, good point. Alternatively, we'd have to split and try writing
> > all subpages in there. I wonder what to do if we fail to write some, and
> > if we could handle that transparently, without the folio_list.
>
> After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> via 'struct writeback_control', which could simplify the logic a lot.
> Otherwise, we would need to handle each subpage's writeback/reclaim/dirty
> state, as well as track each subpage's write result, which seems more
> complicated.
>
> I made a quick change by passing 'folio_list', and it looks simple and
> works as expected.
>
> diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> index 75196b0f894f..10100e22d5c6 100644
> --- a/include/linux/writeback.h
> +++ b/include/linux/writeback.h
> @@ -80,6 +80,9 @@ struct writeback_control {
>  	 */
>  	struct swap_iocb **swap_plug;
>
> +	/* Target list for splitting a large folio */
> +	struct list_head *list;
> +
>  	/* internal fields used by the ->writepages implementation: */
>  	struct folio_batch fbatch;
>  	pgoff_t index;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 05525e9e7423..0a5a68f7d0a0 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	 * and its shmem_writeback() needs them to be split when swapping.
>  	 */
>  	if (wbc->split_large_folio && folio_test_large(folio)) {
> +try_split:
>  		/* Ensure the subpages are still dirty */
>  		folio_test_set_dirty(folio);
> -		if (split_huge_page(page) < 0)
> +		if (split_huge_page_to_list_to_order(page, wbc->list, 0))

We check for split_large_folio, but we still pass wbc->list in the i915
writepage() case. Previously, we were passing a NULL list. Shouldn't we
address that case too?
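
If i915's shmem_writeback() builds its wbc purely with designated
initializers (which zero the unmentioned fields), .list would come out NULL
there, and split_huge_page_to_list_to_order(page, NULL, 0) behaves the same
as the old split_huge_page() call: the tail pages simply go back on the LRU.
So maybe this only needs a comment rather than a code change; untested
sketch:

	if (wbc->split_large_folio && folio_test_large(folio)) {
try_split:
		/* Ensure the subpages are still dirty */
		folio_test_set_dirty(folio);
		/*
		 * Reclaim supplies a target list in wbc->list; other
		 * callers (e.g. i915's shmem_writeback()) leave it NULL,
		 * in which case the split puts the tail pages back on
		 * the LRU, matching the previous split_huge_page() call.
		 */
		if (split_huge_page_to_list_to_order(page, wbc->list, 0))
			goto redirty;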
>  			goto redirty;
>  		folio = page_folio(page);
>  		folio_clear_dirty(folio);
> @@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	}
>
>  	swap = folio_alloc_swap(folio);
> -	if (!swap.val)
> +	if (!swap.val) {
> +		if (nr_pages > 1)
> +			goto try_split;
> +
>  		goto redirty;
> +	}
>
>  	/*
>  	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 277571815789..cf982cf2454f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -628,7 +628,7 @@ typedef enum {
>   * Calls ->writepage().
>   */
>  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> -			 struct swap_iocb **plug)
> +			 struct swap_iocb **plug, struct list_head *folio_list)
>  {
>  	/*
>  	 * If the folio is dirty, only perform writeback if that write
> @@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>  			.swap_plug = plug,
>  		};
>
> +		if (shmem_mapping(mapping)) {
> +			wbc.list = folio_list;
> +			wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
> +		}
> +
>  		folio_set_reclaim(folio);
>  		res = mapping->a_ops->writepage(&folio->page, &wbc);
>  		if (res < 0)
> @@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  					goto activate_locked_split;
>  				}
>  			}
> -		} else if (folio_test_swapbacked(folio) &&
> -			   folio_test_large(folio)) {
> -
> -			/*
> -			 * Do not split shmem folio if no swap memory
> -			 * available.
> -			 */
> -			if (!total_swap_pages)
> -				goto activate_locked;
> -
> -			/*
> -			 * Only split shmem folio when CONFIG_THP_SWAP
> -			 * is not enabled.
> -			 */
> -			if (!IS_ENABLED(CONFIG_THP_SWAP) &&
> -			    split_folio_to_list(folio, folio_list))
> -				goto keep_locked;
>  		}
>
>  		/*
> @@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  			 * starts and then write it out here.
>  			 */
>  			try_to_unmap_flush_dirty();
> -try_pageout:
> -			switch (pageout(folio, mapping, &plug)) {
> +			switch (pageout(folio, mapping, &plug, folio_list)) {
>  			case PAGE_KEEP:
>  				goto keep_locked;
>  			case PAGE_ACTIVATE:
> -				if (shmem_mapping(mapping) && folio_test_large(folio) &&
> -				    !split_folio_to_list(folio, folio_list)) {
> +				/* Shmem can be split when writeback to swap */
> +				if ((nr_pages > 1) && !folio_test_large(folio)) {
> +					sc->nr_scanned -= (nr_pages - 1);
>  					nr_pages = 1;
> -					goto try_pageout;
>  				}
>  				goto activate_locked;
>  			case PAGE_SUCCESS:
> +				if ((nr_pages > 1) && !folio_test_large(folio)) {
> +					sc->nr_scanned -= (nr_pages - 1);
> +					nr_pages = 1;
> +				}
>  				stat->nr_pageout += nr_pages;
>
>  				if (folio_test_writeback(folio)) {
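
One more nit: the PAGE_ACTIVATE and PAGE_SUCCESS arms now duplicate the
"pageout() split the folio under us" accounting. Maybe worth a small helper;
untested sketch with a made-up name:

static void folio_fixup_post_split(struct scan_control *sc,
				   struct folio *folio,
				   unsigned int *nr_pages)
{
	/*
	 * pageout() may have split the large folio: only the now
	 * order-0 head folio is still ours, the tail pages were
	 * queued back onto folio_list for the following iterations.
	 */
	if (*nr_pages > 1 && !folio_test_large(folio)) {
		sc->nr_scanned -= *nr_pages - 1;
		*nr_pages = 1;
	}
}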