On 09/04/2024 09:26, Barry Song wrote:
> From: Barry Song <v-songbaohua@xxxxxxxx>
>
> Add a boolean argument named any_shared. If any of the swap entries are
> non-exclusive, set any_shared to true. The function do_swap_page() can
> then utilize this information to determine whether the entire large
> folio can be reused.
>
> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> ---
>  mm/internal.h | 9 ++++++++-
>  mm/madvise.c  | 2 +-
>  mm/memory.c   | 2 +-
>  3 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 9d3250b4a08a..cae39c372bfc 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -238,7 +238,8 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
>   *
>   * Return: the number of table entries in the batch.
>   */
> -static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
> +static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte,
> +                                 bool *any_shared)

Please update the docs in the comment above this for the new param; follow
folio_pte_batch()'s docs as a template.

>  {
>          pte_t expected_pte = pte_next_swp_offset(pte);
>          const pte_t *end_ptep = start_ptep + max_nr;
> @@ -248,12 +249,18 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
>          VM_WARN_ON(!is_swap_pte(pte));
>          VM_WARN_ON(non_swap_entry(pte_to_swp_entry(pte)));
>
> +        if (any_shared)
> +                *any_shared |= !pte_swp_exclusive(pte);

This is different from the approach in folio_pte_batch(). It inits *any_shared
to false and does NOT include the value of the first pte. I think that's odd,
personally, and I prefer your approach. I'm not sure if there was a good
reason that David chose the other approach? Regardless, I think both functions
should follow the same pattern here.

If sticking with your approach, why is this initial flag being ORed? Surely it
should just be initialized, to get rid of any previous guff?

Thanks,
Ryan

> +
>          while (ptep < end_ptep) {
>                  pte = ptep_get(ptep);
>
>                  if (!pte_same(pte, expected_pte))
>                          break;
>
> +                if (any_shared)
> +                        *any_shared |= !pte_swp_exclusive(pte);
> +
>                  expected_pte = pte_next_swp_offset(expected_pte);
>                  ptep++;
>          }
> diff --git a/mm/madvise.c b/mm/madvise.c
> index f59169888b8e..d34ca6983227 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -671,7 +671,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>                          entry = pte_to_swp_entry(ptent);
>                          if (!non_swap_entry(entry)) {
>                                  max_nr = (end - addr) / PAGE_SIZE;
> -                                nr = swap_pte_batch(pte, max_nr, ptent);
> +                                nr = swap_pte_batch(pte, max_nr, ptent, NULL);
>                                  nr_swap -= nr;
>                                  free_swap_and_cache_nr(entry, nr);
>                                  clear_not_present_full_ptes(mm, addr, pte, nr, tlb->fullmm);
> diff --git a/mm/memory.c b/mm/memory.c
> index 2702d449880e..c4a52e8d740a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1638,7 +1638,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>                                  folio_put(folio);
>                          } else if (!non_swap_entry(entry)) {
>                                  max_nr = (end - addr) / PAGE_SIZE;
> -                                nr = swap_pte_batch(pte, max_nr, ptent);
> +                                nr = swap_pte_batch(pte, max_nr, ptent, NULL);
>                                  /* Genuine swap entries, hence a private anon pages */
>                                  if (!should_zap_cows(details))
>                                          continue;
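
PS: a few untested sketches to make the above points concrete.

For the kernel-doc, I'd imagine the new param getting an entry along these
lines (paraphrasing folio_pte_batch()'s style rather than quoting it; exact
wording is of course up to you):

 * @any_shared: Optional pointer to indicate whether any entry in the
 *              batch is non-exclusive (i.e. may be shared). Ignored if
 *              NULL, so callers that don't care can pass NULL.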
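
For the initialization question, what I had in mind for the first pte is a
plain assignment instead of an OR, e.g. (sketch only):

        if (any_shared)
                *any_shared = !pte_swp_exclusive(pte);

That way the caller never has to pre-initialize the flag; the ORs inside the
loop then accumulate the state of the remaining ptes, and any stale value in
the caller's variable is harmless.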
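
And purely for illustration, I'd guess the eventual consumer in
do_swap_page() ends up looking vaguely like this (hypothetical sketch; the
variable names and the surrounding logic are my assumptions, not taken from
this series):

        bool any_shared = false;
        int nr;

        /* Batch the contiguous swap entries backing the large folio. */
        nr = swap_pte_batch(vmf->pte, max_nr, vmf->orig_pte, &any_shared);

        /* Only consider reusing the whole folio if no entry was shared. */
        exclusive = !any_shared;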