On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
> On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
> >
> > On 2024/8/8 16:51, David Hildenbrand wrote:
> > > On 08.08.24 04:36, Baolin Wang wrote:
> > > >
> > > > On 2024/8/7 23:53, David Hildenbrand wrote:
> > > > > On 07.08.24 09:31, Baolin Wang wrote:
> > > > > > Page reclaim will not scan anon LRU if no swap space, however
> > > > > > MADV_PAGEOUT can still split shmem large folios even without a
> > > > > > swap device. Thus add swap available space validation before
> > > > > > splitting a shmem large folio to avoid a redundant split.
> > > > > >
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > >  mm/vmscan.c | 8 ++++++++
> > > > > >  1 file changed, 8 insertions(+)
> > > > > >
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 31d13462571e..796f65781f4f 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > > >  			}
> > > > > >  		} else if (folio_test_swapbacked(folio) &&
> > > > > >  			   folio_test_large(folio)) {
> > > > > > +
> > > > > > +			/*
> > > > > > +			 * Do not split shmem folio if no swap memory
> > > > > > +			 * available.
> > > > > > +			 */
> > > > > > +			if (!total_swap_pages)
> > > > > > +				goto activate_locked;
> > > > > > +
> > > > > >  			/* Split shmem folio */
> > > > > >  			if (split_folio_to_list(folio, folio_list))
> > > > > >  				goto keep_locked;
> > > > >
> > > > > Reminds me of
> > > > >
> > > > > commit 9a976f0c847b67d22ed694556a3626ed92da0422
> > > > > Author: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > > > > Date:   Thu Mar 9 15:05:43 2023 -0800
> > > > >
> > > > >     shmem: skip page split if we're not reclaiming
> > > > >
> > > > >     In theory when info->flags & VM_LOCKED we should not be getting
> > > > >     shem_writepage() called so we should be verifying this with a
> > > > >     WARN_ON_ONCE().
> > > > >     Since we should not be swapping then best to ensure we
> > > > >     also don't do the folio split earlier too. So just move the check
> > > > >     early to avoid folio splits in case its a dubious call.
> > > > >
> > > > >     We also have a similar early bail when !total_swap_pages so just
> > > > >     move that earlier to avoid the possible folio split in the same
> > > > >     situation.
> > > > >
> > > > > But indeed, pageout() -> writepage() is called *after* the split in the
> > > > > vmscan path.
> > > > >
> > > > > In that "noswap" context, I wonder if we also want to skip folios part
> > > > > of shmem with disabled swapping?
> > > >
> > > > Yes, I think so.
> > > >
> > > > > But now I am wondering under which circumstances we end up calling
> > > > > shmem_writepage() with a large folio. And I think the answer is the
> > > > > comment of folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > > > >
> > > > > ... so if shmem_writepage() handles+checks that, could we do
> > > > >
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index a332cb80e928..7dfa3d6e8ba7 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > >  					goto activate_locked_split;
> > > > >  			}
> > > > >  		}
> > > > > -	} else if (folio_test_swapbacked(folio) &&
> > > > > -		   folio_test_large(folio)) {
> > > > > -		/* Split shmem folio */
> > > > > -		if (split_folio_to_list(folio, folio_list))
> > > > > -			goto keep_locked;
> > > > >  	}
> > > > >
> > > > >  	/*
> > > > >
> > > > > instead?
> > > >
> > > > Seems reasonable to me. But we should pass the 'folio_list' to
> > > > shmem_writepage() to list the subpages of the large folio. Let me try.
> > >
> > > Ah, yes, good point. Alternatively, we'd have to split and try writing
> > > all subpages in there.
> > > I wonder what to do if we fail to write some, and
> > > if we could handle that transparently, without the folio_list.
> >
> > After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> > via 'struct writeback_control', which could simplify the logic a lot.
> > Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
> > as well as tracking each subpage's write result, which seems more
> > complicated.
> >
> > I made a quick change by passing 'folio_list', and it looks simple and works
> > as expected.
> >
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index 75196b0f894f..10100e22d5c6 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -80,6 +80,9 @@ struct writeback_control {
> >  	 */
> >  	struct swap_iocb **swap_plug;
> >
> > +	/* Target list for splitting a large folio */
> > +	struct list_head *list;
> > +
> >  	/* internal fields used by the ->writepages implementation: */
> >  	struct folio_batch fbatch;
> >  	pgoff_t index;
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 05525e9e7423..0a5a68f7d0a0 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	 * and its shmem_writeback() needs them to be split when swapping.
> >  	 */
> >  	if (wbc->split_large_folio && folio_test_large(folio)) {
> > +try_split:
> >  		/* Ensure the subpages are still dirty */
> >  		folio_test_set_dirty(folio);
> > -		if (split_huge_page(page) < 0)
> > +		if (split_huge_page_to_list_to_order(page, wbc->list, 0))
>
> We check for split_large_folio, but we still send the wbc->list for i915
> writepage() case. Previously, we were sending a NULL list. Shouldn't we address
> that case too?
I guess I was missing wbc initialization snippet:

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..174b95a9a988 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
 		.range_start = 0,
 		.range_end = LLONG_MAX,
 		.for_reclaim = 1,
+		.list = NULL,
 	};
 	unsigned long i;

> >  			goto redirty;
> >  		folio = page_folio(page);
> >  		folio_clear_dirty(folio);
> > @@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	}
> >
> >  	swap = folio_alloc_swap(folio);
> > -	if (!swap.val)
> > +	if (!swap.val) {
> > +		if (nr_pages > 1)
> > +			goto try_split;
> > +
> >  		goto redirty;
> > +	}
> >
> >  	/*
> >  	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 277571815789..cf982cf2454f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -628,7 +628,7 @@ typedef enum {
> >   * Calls ->writepage().
> >   */
> >  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > -			 struct swap_iocb **plug)
> > +			 struct swap_iocb **plug, struct list_head *folio_list)
> >  {
> >  	/*
> >  	 * If the folio is dirty, only perform writeback if that write
> > @@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> >  			.swap_plug = plug,
> >  		};
> >
> > +		if (shmem_mapping(mapping)) {
> > +			wbc.list = folio_list;
> > +			wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
> > +		}
> > +
> >  		folio_set_reclaim(folio);
> >  		res = mapping->a_ops->writepage(&folio->page, &wbc);
> >  		if (res < 0)
> > @@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >  				goto activate_locked_split;
> >  			}
> >  		}
> > -		} else if (folio_test_swapbacked(folio) &&
> > -			   folio_test_large(folio)) {
> > -
> > -			/*
> > -			 * Do not split shmem folio if no swap memory
> > -			 * available.
> > -			 */
> > -			if (!total_swap_pages)
> > -				goto activate_locked;
> > -
> > -			/*
> > -			 * Only split shmem folio when CONFIG_THP_SWAP
> > -			 * is not enabled.
> > -			 */
> > -			if (!IS_ENABLED(CONFIG_THP_SWAP) &&
> > -			    split_folio_to_list(folio, folio_list))
> > -				goto keep_locked;
> >  		}
> >
> >  		/*
> > @@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >  		 * starts and then write it out here.
> >  		 */
> >  		try_to_unmap_flush_dirty();
> > -try_pageout:
> > -		switch (pageout(folio, mapping, &plug)) {
> > +		switch (pageout(folio, mapping, &plug, folio_list)) {
> >  		case PAGE_KEEP:
> >  			goto keep_locked;
> >  		case PAGE_ACTIVATE:
> > -			if (shmem_mapping(mapping) && folio_test_large(folio) &&
> > -			    !split_folio_to_list(folio, folio_list)) {
> > +			/* Shmem can be split when writeback to swap */
> > +			if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +				sc->nr_scanned -= (nr_pages - 1);
> >  				nr_pages = 1;
> > -				goto try_pageout;
> >  			}
> >  			goto activate_locked;
> >  		case PAGE_SUCCESS:
> > +			if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +				sc->nr_scanned -= (nr_pages - 1);
> > +				nr_pages = 1;
> > +			}
> >  			stat->nr_pageout += nr_pages;
> >
> >  			if (folio_test_writeback(folio)) {