On Tue, May 7, 2024 at 12:05 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 03.05.24 02:50, Barry Song wrote:
> > From: Chuanhua Han <hanchuanhua@xxxxxxxx>
> >
> > When a large folio is found in the swapcache, the current implementation
> > requires calling do_swap_page() nr_pages times, resulting in nr_pages
> > page faults. This patch opts to map the entire large folio at once to
> > minimize page faults. Additionally, redundant checks and early exits
> > for ARM64 MTE restoring are removed.
> >
> > Signed-off-by: Chuanhua Han <hanchuanhua@xxxxxxxx>
> > Co-developed-by: Barry Song <v-songbaohua@xxxxxxxx>
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> >  mm/memory.c | 60 ++++++++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 48 insertions(+), 12 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 22e7c33cc747..940fdbe69fa1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3968,6 +3968,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >          pte_t pte;
> >          vm_fault_t ret = 0;
> >          void *shadow = NULL;
> > +        int nr_pages = 1;
> > +        unsigned long page_idx = 0;
> > +        unsigned long address = vmf->address;
> > +        pte_t *ptep;
> >
> >          if (!pte_unmap_same(vmf))
> >                  goto out;
> > @@ -4166,6 +4170,36 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >                  goto out_nomap;
> >          }
> >
> > +        ptep = vmf->pte;
> > +        if (folio_test_large(folio) && folio_test_swapcache(folio)) {
> > +                int nr = folio_nr_pages(folio);
> > +                unsigned long idx = folio_page_idx(folio, page);
> > +                unsigned long folio_start = vmf->address - idx * PAGE_SIZE;
> > +                unsigned long folio_end = folio_start + nr * PAGE_SIZE;
> > +                pte_t *folio_ptep;
> > +                pte_t folio_pte;
> > +
> > +                if (unlikely(folio_start < max(vmf->address & PMD_MASK, vma->vm_start)))
> > +                        goto check_folio;
> > +                if (unlikely(folio_end > pmd_addr_end(vmf->address, vma->vm_end)))
> > +                        goto check_folio;
> > +
> > +                folio_ptep = vmf->pte - idx;
> > +                folio_pte = ptep_get(folio_ptep);
> > +                if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
> > +                    swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
> > +                        goto check_folio;
> > +
> > +                page_idx = idx;
> > +                address = folio_start;
> > +                ptep = folio_ptep;
> > +                nr_pages = nr;
> > +                entry = folio->swap;
> > +                page = &folio->page;
> > +        }
> > +
> > +check_folio:
> > +
> >          /*
> >           * PG_anon_exclusive reuses PG_mappedtodisk for anon pages. A swap pte
> >           * must never point at an anonymous page in the swapcache that is
> > @@ -4225,12 +4259,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >           * We're already holding a reference on the page but haven't mapped it
> >           * yet.
> >           */
> > -        swap_free_nr(entry, 1);
> > +        swap_free_nr(entry, nr_pages);
> >          if (should_try_to_free_swap(folio, vma, vmf->flags))
> >                  folio_free_swap(folio);
> >
> > -        inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
> > -        dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> > +        folio_ref_add(folio, nr_pages - 1);
> > +        add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > +        add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
> >          pte = mk_pte(page, vma->vm_page_prot);
> >
> >          /*
> > @@ -4240,34 +4275,35 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >           * exclusivity.
> >           */
> >          if (!folio_test_ksm(folio) &&
> > -            (exclusive || folio_ref_count(folio) == 1)) {
> > +            (exclusive || (folio_ref_count(folio) == nr_pages &&
> > +                           folio_nr_pages(folio) == nr_pages))) {
> >                  if (vmf->flags & FAULT_FLAG_WRITE) {
> >                          pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> >                          vmf->flags &= ~FAULT_FLAG_WRITE;
>
> I fail to convince myself that this change is correct, and if it is
> correct, it's confusing (I think there is a dependency on
> folio_free_swap() having been called and succeeding, such that we don't
> have a folio that is in the swapcache at this point).
>
> Why can't we move the folio_ref_add() after this check and just leave
> the check as it is?
>
> "folio_ref_count(folio) == 1" is as clear as it gets: we hold the single
> reference, so we can do with this thing whatever we want: it's certainly
> exclusive. No swapcache, no other people mapping it.

Right. I believe the code works correctly but is a bit confusing. As you
said, we might move folio_ref_add() to after the folio_ref_count(folio) == 1
check and leave the check as it is (rough sketch below).

>
>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
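P.S. just to make the idea concrete, here is a rough, untested sketch of that
reordering: keep the "== 1" check and only take the extra references once the
exclusivity decision has been made. The surrounding context (in particular the
rmap_flags line and the exact placement) is written from memory rather than
copied from a tree, so please treat it as approximate:

        if (!folio_test_ksm(folio) &&
            (exclusive || folio_ref_count(folio) == 1)) {
                if (vmf->flags & FAULT_FLAG_WRITE) {
                        pte = maybe_mkwrite(pte_mkdirty(pte), vma);
                        vmf->flags &= ~FAULT_FLAG_WRITE;
                }
                rmap_flags |= RMAP_EXCLUSIVE;
        }
        /* take the extra nr_pages - 1 references only after deciding exclusivity */
        folio_ref_add(folio, nr_pages - 1);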