On Thu, 21 Sep 2023 20:27:51 +0800 Jiexun Wang <wangjiexun@xxxxxxxxxxx> wrote:

> Currently the madvise_cold_or_pageout_pte_range() function exhibits
> significant latency under memory pressure, which can be effectively
> reduced by adding cond_resched() within the loop.
>
> When batch_count reaches SWAP_CLUSTER_MAX, we reschedule the task to
> ensure fairness and to avoid holding the lock for too long.
>
> ...
>
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -354,6 +354,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  	struct folio *folio = NULL;
>  	LIST_HEAD(folio_list);
>  	bool pageout_anon_only_filter;
> +	unsigned int batch_count = 0;
>  
>  	if (fatal_signal_pending(current))
>  		return -EINTR;
> @@ -433,6 +434,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  regular_folio:
>  #endif
>  	tlb_change_page_size(tlb, PAGE_SIZE);
> +restart:
>  	start_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);

The handling of start_pte looks OK.

>  	if (!start_pte)
>  		return 0;
> @@ -441,6 +443,15 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  	for (; addr < end; pte++, addr += PAGE_SIZE) {
>  		ptent = ptep_get(pte);
>  
> +		if (++batch_count == SWAP_CLUSTER_MAX) {
> +			batch_count = 0;
> +			if (need_resched()) {
> +				pte_unmap_unlock(start_pte, ptl);
> +				cond_resched();
> +				goto restart;
> +			}
> +		}
> +
>  		if (pte_none(ptent))
>  			continue;
> 

I think this patch looks OK, but would appreciate careful review from
others, please.
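
For reference, the shape of the change is the classic "drop the lock,
reschedule, reacquire and resume" batching pattern. Below is a minimal
user-space sketch of that pattern, assuming a pthreads environment;
BATCH_MAX, table_lock and walk_entries are illustrative names, not
kernel API, and since user space has no cheap need_resched() equivalent
the sketch yields unconditionally at each batch boundary:

/*
 * Minimal user-space sketch of the batching pattern in the patch.
 * All names are illustrative; this is not kernel code.
 */
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define BATCH_MAX 32	/* stand-in for SWAP_CLUSTER_MAX */

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Walk nr entries, dropping the lock every BATCH_MAX iterations. */
static void walk_entries(int *entries, size_t nr)
{
	unsigned int batch_count = 0;
	size_t i = 0;

restart:
	pthread_mutex_lock(&table_lock);
	for (; i < nr; i++) {
		if (++batch_count == BATCH_MAX) {
			batch_count = 0;
			/*
			 * The kernel patch only drops the lock when
			 * need_resched() is true; user space has no cheap
			 * equivalent, so yield unconditionally here.
			 */
			pthread_mutex_unlock(&table_lock);
			sched_yield();
			goto restart;	/* reacquire, resume at entry i */
		}
		entries[i] += 1;	/* the per-entry work */
	}
	pthread_mutex_unlock(&table_lock);
}

int main(void)
{
	int data[128] = { 0 };

	walk_entries(data, 128);
	return 0;
}

Note that the batch check precedes the per-entry work, so the entry at
which we restart is processed exactly once after the lock is retaken.
The property to check in review is the same: the restart path must
re-read state after reacquiring the lock, which the patch does by
remapping the PTEs via pte_offset_map_lock().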