On Wed, Jan 26, 2022 at 2:00 AM David Hildenbrand <david@xxxxxxxxxx> wrote: > > reuse_swap_page() currently indicates if we can write to an anon page > without COW. A COW is required if the page is shared by multiple > processes (either already mapped or via swap entries) or if there is > concurrent writeback that cannot tolerate concurrent page modifications. > > reuse_swap_page() doesn't check for pending references from other > processes that already unmapped the page, however, > is_refcount_suitable() essentially does the same thing in the context of > khugepaged. khugepaged is the last remaining user of reuse_swap_page() and > we want to remove that function. > > In the context of khugepaged, we are not actually going to write to the > page and we don't really care about other processes mapping the page: > for example, without swap, we don't care about shared pages at all. > > The current logic seems to be: > * Writable: -> Not shared, but might be in the swapcache. Nobody can > fault it in from the swapcache as there are no other swap entries. > * Readable and not in the swapcache: Might be shared (but nobody can > fault it in from the swapcache). > * Readable and in the swapcache: Might be shared and someone might be > able to fault it in from the swapcache. Make sure we're the exclusive > owner via reuse_swap_page(). > > Having to guess due to lack of comments and documentation, the current > logic really only wants to make sure that a page that might be shared > cannot be faulted in from the swapcache while khugepaged is active. > It's hard to guess why that is that case and if it's really still required, > but let's try keeping that logic unmodified. I don't think it could be faulted in while khugepaged is active since khugepaged does hold mmap_lock in write mode IIUC. So page fault is serialized against khugepaged. My wild guess is that collapsing shared pages was not supported before v5.8, so we need reuse_swap_page() to tell us if the page in swap cache is shared or not. But it is not true anymore. And khugepaged just allocates a THP then copy the data from base pages to huge page then replace PTEs to PMD, it doesn't change the content of the page, so I failed to see a problem by collapsing a shared page in swap cache. But I'm really not entirely sure, I may miss something... > > Instead of relying on reuse_swap_page(), let's unconditionally > try_to_free_swap(), special casing PageKsm(). try_to_free_swap() will fail > if there are still swap entries targeting the page or if the page is under > writeback. > > After a successful try_to_free_swap() that page cannot be readded to the > swapcache because we're keeping the page locked and removed from the LRU > until we actually perform the copy. So once we succeeded removing a page > from the swapcache, it cannot be re-added until we're done copying. Add a > comment stating that. > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> > --- > mm/khugepaged.c | 16 +++++++++++++--- > 1 file changed, 13 insertions(+), 3 deletions(-) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 35f14d0a00a6..bc0ff598e98f 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -683,10 +683,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, > goto out; > } > if (!pte_write(pteval) && PageSwapCache(page) && > - !reuse_swap_page(page)) { > + (PageKsm(page) || !try_to_free_swap(page))) { > /* > - * Page is in the swap cache and cannot be re-used. > - * It cannot be collapsed into a THP. > + * Possibly shared page cannot be removed from the > + * swapache. It cannot be collapsed into a THP. > */ > unlock_page(page); > result = SCAN_SWAP_CACHE_PAGE; > @@ -702,6 +702,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, > result = SCAN_DEL_PAGE_LRU; > goto out; > } > + > + /* > + * We're holding the page lock and removed the page from the > + * LRU. Once done copying, we'll unlock and readd to the > + * LRU via release_pte_page(). If the page is still in the > + * swapcache, we're the exclusive owner. Due to the page lock > + * the page cannot be added to the swapcache until we're done > + * and consequently it cannot be faulted in from the swapcache > + * into another process. > + */ > mod_node_page_state(page_pgdat(page), > NR_ISOLATED_ANON + page_is_file_lru(page), > compound_nr(page)); > -- > 2.34.1 >