On Sun, Jan 28, 2024 at 9:06 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>
> On Thu, Jan 18, 2024 at 3:12 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
> >
> > From: Chuanhua Han <hanchuanhua@xxxxxxxx>
> >
> > On an embedded system like Android, more than half of anon memory is
> > actually in swap devices such as zRAM. For example, while an app is
> > switched to background, most of its memory might be swapped out.
> >
> > Now that we have the mTHP feature, unfortunately, if we don't support
> > large folio swap-in, once those large folios are swapped out, we
> > immediately lose the performance gain we can get through large folios
> > and hardware optimizations such as CONT-PTE.
> >
> > This patch brings up mTHP swap-in support. Right now, we limit mTHP
> > swap-in to contiguous swap entries which were likely swapped out from
> > an mTHP as a whole.
> >
> > On the other hand, the current implementation only covers the
> > SWAP_SYNCHRONOUS case. It doesn't support swapin_readahead as large
> > folios yet.
> >
> > Right now, we re-fault large folios which are still in the swapcache
> > as a whole; this effectively reduces the extra loops and early exits
> > we added in arch_swap_restore() while supporting MTE restore for
> > folios rather than pages.
> >
> > Signed-off-by: Chuanhua Han <hanchuanhua@xxxxxxxx>
> > Co-developed-by: Barry Song <v-songbaohua@xxxxxxxx>
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> >  mm/memory.c | 108 +++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 94 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index f61a48929ba7..928b3f542932 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -107,6 +107,8 @@ EXPORT_SYMBOL(mem_map);
> >  static vm_fault_t do_fault(struct vm_fault *vmf);
> >  static vm_fault_t do_anonymous_page(struct vm_fault *vmf);
> >  static bool vmf_pte_changed(struct vm_fault *vmf);
> > +static struct folio *alloc_anon_folio(struct vm_fault *vmf,
> > +                                     bool (*pte_range_check)(pte_t *, int));
> >
> >  /*
> >   * Return true if the original pte was a uffd-wp pte marker (so the pte was
> > @@ -3784,6 +3786,34 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
> >         return VM_FAULT_SIGBUS;
> >  }
> >
> > +static bool pte_range_swap(pte_t *pte, int nr_pages)
> > +{
> > +       int i;
> > +       swp_entry_t entry;
> > +       unsigned type;
> > +       pgoff_t start_offset;
> > +
> > +       entry = pte_to_swp_entry(ptep_get_lockless(pte));
> > +       if (non_swap_entry(entry))
> > +               return false;
> > +       start_offset = swp_offset(entry);
> > +       if (start_offset % nr_pages)
> > +               return false;
> > +
> > +       type = swp_type(entry);
> > +       for (i = 1; i < nr_pages; i++) {
> > +               entry = pte_to_swp_entry(ptep_get_lockless(pte + i));
> > +               if (non_swap_entry(entry))
> > +                       return false;
> > +               if (swp_offset(entry) != start_offset + i)
> > +                       return false;
> > +               if (swp_type(entry) != type)
> > +                       return false;
> > +       }
> > +
> > +       return true;
> > +}
> > +
> >  /*
> >   * We enter with non-exclusive mmap_lock (to exclude vma changes,
> >   * but allow concurrent faults), and pte mapped but not yet locked.
> > @@ -3804,6 +3834,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >         pte_t pte;
> >         vm_fault_t ret = 0;
> >         void *shadow = NULL;
> > +       int nr_pages = 1;
> > +       unsigned long start_address;
> > +       pte_t *start_pte;
> >
> >         if (!pte_unmap_same(vmf))
> >                 goto out;
> > @@ -3868,13 +3901,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >         if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
> >             __swap_count(entry) == 1) {
> >                 /* skip swapcache */
> > -               folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
> > -                                       vma, vmf->address, false);
> > +               folio = alloc_anon_folio(vmf, pte_range_swap);
> >                 page = &folio->page;
> >                 if (folio) {
> >                         __folio_set_locked(folio);
> >                         __folio_set_swapbacked(folio);
> >
> > +                       if (folio_test_large(folio)) {
> > +                               unsigned long start_offset;
> > +
> > +                               nr_pages = folio_nr_pages(folio);
> > +                               start_offset = swp_offset(entry) & ~(nr_pages - 1);
> > +                               entry = swp_entry(swp_type(entry), start_offset);
> > +                       }
> > +
> >                         if (mem_cgroup_swapin_charge_folio(folio,
> >                                         vma->vm_mm, GFP_KERNEL,
> >                                         entry)) {
> > @@ -3980,6 +4020,39 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >          */
> >         vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
> >                                        &vmf->ptl);
> > +
> > +       start_address = vmf->address;
> > +       start_pte = vmf->pte;
> > +       if (folio_test_large(folio)) {
> > +               unsigned long nr = folio_nr_pages(folio);
> > +               unsigned long addr = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
> > +               pte_t *pte_t = vmf->pte - (vmf->address - addr) / PAGE_SIZE;
>
> I forgot about one comment here.
> Please use a variable name other than "pte_t"; it is a bit strange to
> use the typedef name as a variable name here.

Makes sense!

> Chris

Thanks
Barry
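For readers following the arithmetic in the patch, the two checks it relies on (an nr_pages-aligned run of consecutive swap entries, and rounding the faulting offset and address down to the start of that run) can be sketched in plain userspace C. The struct and helper names below are illustrative models, not the kernel's packed swp_entry_t or its accessors:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL

/* Illustrative model of a swap entry: the kernel packs a swap device
 * type and an offset into one swp_entry_t word; here they are kept as
 * separate fields for clarity. */
struct entry {
	unsigned int type;
	unsigned long offset;
};

/* Mirrors the logic of pte_range_swap() in the patch: the nr_pages
 * entries must share one swap type, start at an offset aligned to
 * nr_pages, and have strictly consecutive offsets. */
static bool entry_range_contiguous(const struct entry *e, int nr_pages)
{
	if (e[0].offset % nr_pages)
		return false;
	for (int i = 1; i < nr_pages; i++) {
		if (e[i].type != e[0].type ||
		    e[i].offset != e[0].offset + i)
			return false;
	}
	return true;
}

/* Mirrors "swp_offset(entry) & ~(nr_pages - 1)" in do_swap_page():
 * round the faulting offset down to the first offset of the run.
 * Assumes nr_pages is a power of two, as folio sizes are. */
static unsigned long run_start_offset(unsigned long offset, unsigned long nr_pages)
{
	return offset & ~(nr_pages - 1);
}

/* Mirrors ALIGN_DOWN(vmf->address, nr * PAGE_SIZE): round the faulting
 * address down to the first address covered by the large folio. */
static unsigned long run_start_address(unsigned long addr, unsigned long nr_pages)
{
	return addr & ~(nr_pages * PAGE_SIZE - 1);
}
```

With nr_pages = 4, a fault on swap offset 11 resolves to the run starting at offset 8, so offsets 8 through 11 are swapped in together as one folio.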