[snip] > > >> > > > >> > VM_BUG_ON(!folio_test_anon(folio) || > > >> > (pte_write(pte) && !PageAnonExclusive(page))); > > >> > - set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); > > >> > - arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte); > > >> > + set_ptes(vma->vm_mm, start_address, start_pte, pte, nr_pages); > > >> > + vmf->orig_pte = ptep_get(vmf->pte); > > >> > + arch_do_swap_page(vma->vm_mm, vma, start_address, pte, pte); > > >> > > >> Do we need to call arch_do_swap_page() for each subpage? IIUC, the > > >> corresponding arch_unmap_one() will be called for each subpage. > > > > > > i actually thought about this very carefully, right now, the only one who > > > needs this is sparc and it doesn't support THP_SWAPOUT at all. and > > > there is no proof doing restoration one by one won't really break sparc. > > > so i'd like to defer this to when sparc really needs THP_SWAPOUT. > > > > Let's ask SPARC developer (Cced) for this. > > > > IMHO, even if we cannot get help, we need to change code with our > > understanding instead of deferring it. > > ok. Thanks for Ccing sparc developers. Hi Khalid & Ying (also Cced sparc maillist), SPARC is the only platform which needs arch_do_swap_page(), right now, its THP_SWAPOUT is not enabled. so we will not really hit a large folio in swapcache. just in case you might need THP_SWAPOUT later, i am changing the code as below, @@ -4286,7 +4285,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) VM_BUG_ON(!folio_test_anon(folio) || (pte_write(pte) && !PageAnonExclusive(page))); set_ptes(vma->vm_mm, start_address, start_ptep, pte, nr_pages); - arch_do_swap_page(vma->vm_mm, vma, start_address, pte, pte); + for (int i = 0; i < nr_pages; i++) { + arch_do_swap_page(vma->vm_mm, vma, start_address + i * PAGE_SIZE, + pte, pte); + pte = pte_advance_pfn(pte, 1); + } folio_unlock(folio); if (folio != swapcache && swapcache) { for sparc, nr_pages will always be 1(THP_SWAPOUT not enabled). for arm64/x86/riscv, it seems redundant to do a for loop "for (int i = 0; i < nr_pages; i++)". so another option is adding a helper as below to avoid the idle loop for arm64/x86/riscv etc. diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e2f45e22a6d1..ea314a5f9b5e 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1085,6 +1085,28 @@ static inline void arch_do_swap_page(struct mm_struct *mm, { } + +static inline void arch_do_swap_page_nr(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long addr, + pte_t pte, pte_t oldpte, + int nr) +{ + +} +#else +static inline void arch_do_swap_page_nr(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long addr, + pte_t pte, pte_t oldpte, + int nr) +{ + for (int i = 0; i < nr; i++) { + arch_do_swap_page(vma->vm_mm, vma, addr + i * PAGE_SIZE, + pte_advance_pfn(pte, i), + pte_advance_pfn(oldpte, i)); + } +} #endif Please tell me your preference. BTW, i found oldpte and pte are always same in do_swap_page(), is it something wrong? does arch_do_swap_page() really need two same arguments? vmf->orig_pte = pte; ... arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte); > > > > > > on the other hand, it seems really bad we have both > > > arch_swap_restore - for this, arm64 has moved to using folio > > > and > > > arch_do_swap_page > > > > > > we should somehow unify them later if sparc wants THP_SWPOUT. Thanks Barry