On Fri, May 3, 2024 at 5:41 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 03/05/2024 01:50, Barry Song wrote:
> > From: Barry Song <v-songbaohua@xxxxxxxx>
> >
> > There could arise a necessity to obtain the first pte_t from a swap
> > pte_t located in the middle. For instance, this may occur within the
> > context of do_swap_page(), where a page fault can potentially occur in
> > any PTE of a large folio. To address this, the following patch introduces
> > pte_move_swp_offset(), a function capable of bidirectional movement by
> > a specified delta argument. Consequently, pte_increment_swp_offset()
>
> You mean pte_next_swp_offset()?

yes.

> > will directly invoke it with delta = 1.
> >
> > Suggested-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> >  mm/internal.h | 25 +++++++++++++++++++++----
> >  1 file changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index c5552d35d995..cfe4aed66a5c 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -211,18 +211,21 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
> >  }
> >
> >  /**
> > - * pte_next_swp_offset - Increment the swap entry offset field of a swap pte.
> > + * pte_move_swp_offset - Move the swap entry offset field of a swap pte
> > + *      forward or backward by delta
> >   * @pte: The initial pte state; is_swap_pte(pte) must be true and
> >   *      non_swap_entry() must be false.
> > + * @delta: The direction and the offset we are moving; forward if delta
> > + *      is positive; backward if delta is negative
> >   *
> > - * Increments the swap offset, while maintaining all other fields, including
> > + * Moves the swap offset, while maintaining all other fields, including
> >   * swap type, and any swp pte bits. The resulting pte is returned.
> >   */
> > -static inline pte_t pte_next_swp_offset(pte_t pte)
> > +static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
>
> We have equivalent functions for pfn:
>
>   pte_next_pfn()
>   pte_advance_pfn()
>
> Although the latter takes an unsigned long and only moves forward currently. I
> wonder if it makes sense to have their naming and semantics match? i.e. change
> pte_advance_pfn() to pte_move_pfn() and let it move backwards too.
>
> I guess we don't have a need for that and it adds more churn.

We might have a need in the case below. A forks B, so A and B share large
folios. B then unmaps or exits, and the large folios of process A become
single-mapped. Right now, when A writes to those folios, we CoW A's large
folios into many small folios. I believe we can reuse the entire large folio
instead of doing nr_pages CoWs and page faults. In this case, we might want
to get the first PTE from vmf->pte.

Another case might be: A forks B, and we write to either A or B; we might CoW
an entire large folio instead of CoWing nr_pages small folios.

Case 1 seems more useful; I might have a go at it in a few days. Then we might
see pte_move_pfn().

> Anyway:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx>

thanks!

> >  {
> >      swp_entry_t entry = pte_to_swp_entry(pte);
> >      pte_t new = __swp_entry_to_pte(__swp_entry(swp_type(entry),
> > -                           (swp_offset(entry) + 1)));
> > +                           (swp_offset(entry) + delta)));
> >
> >      if (pte_swp_soft_dirty(pte))
> >          new = pte_swp_mksoft_dirty(new);
> > @@ -234,6 +237,20 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
> >      return new;
> >  }
> >
> > +
> > +/**
> > + * pte_next_swp_offset - Increment the swap entry offset field of a swap pte.
> > + * @pte: The initial pte state; is_swap_pte(pte) must be true and
> > + *      non_swap_entry() must be false.
> > + *
> > + * Increments the swap offset, while maintaining all other fields, including
> > + * swap type, and any swp pte bits. The resulting pte is returned.
> > + */
> > +static inline pte_t pte_next_swp_offset(pte_t pte)
> > +{
> > +    return pte_move_swp_offset(pte, 1);
> > +}
> > +
> >  /**
> >   * swap_pte_batch - detect a PTE batch for a set of contiguous swap entries
> >   * @start_ptep: Page table pointer for the first entry.
>

Barry
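
For anyone following along without a kernel tree at hand, here is a rough
userspace sketch of the idea: a signed delta moves the swap offset either way
while the type and the software bits are carried over, which lets a fault in
the middle of a large folio step back to the first entry. The bit layout, the
toy_* names, and toy_first_swap_pte() are all made up for illustration; this
is not the kernel code and does not match any architecture's swap pte format.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy swap pte layout (an assumption for this sketch only):
 *   bit 0      soft-dirty
 *   bit 1      exclusive
 *   bits 2-6   swap type
 *   bits 7-63  swap offset
 */
typedef struct { uint64_t val; } toy_pte_t;

#define TOY_SOFT_DIRTY   (1ULL << 0)
#define TOY_EXCLUSIVE    (1ULL << 1)
#define TOY_TYPE_SHIFT   2
#define TOY_TYPE_MASK    0x1fULL
#define TOY_OFFSET_SHIFT 7

static inline toy_pte_t toy_mk_swap_pte(unsigned type, uint64_t offset,
                                        uint64_t flags)
{
    /* Reject values that do not fit the toy fields, e.g. an offset
     * that wrapped below zero after a backward move. */
    assert(type <= TOY_TYPE_MASK);
    assert(offset < (1ULL << (64 - TOY_OFFSET_SHIFT)));

    toy_pte_t pte = {
        (offset << TOY_OFFSET_SHIFT) |
        ((uint64_t)type << TOY_TYPE_SHIFT) | flags
    };
    return pte;
}

static inline unsigned toy_swp_type(toy_pte_t pte)
{
    return (unsigned)((pte.val >> TOY_TYPE_SHIFT) & TOY_TYPE_MASK);
}

static inline uint64_t toy_swp_offset(toy_pte_t pte)
{
    return pte.val >> TOY_OFFSET_SHIFT;
}

/* Move the offset forward (delta > 0) or backward (delta < 0),
 * keeping the swap type and the software bits unchanged. */
static inline toy_pte_t toy_pte_move_swp_offset(toy_pte_t pte, long delta)
{
    uint64_t flags = pte.val & (TOY_SOFT_DIRTY | TOY_EXCLUSIVE);
    /* Unsigned arithmetic is modulo 2^64, so adding a converted
     * negative delta subtracts from the offset. */
    uint64_t offset = toy_swp_offset(pte) + (uint64_t)delta;

    return toy_mk_swap_pte(toy_swp_type(pte), offset, flags);
}

static inline toy_pte_t toy_pte_next_swp_offset(toy_pte_t pte)
{
    return toy_pte_move_swp_offset(pte, 1);
}

/* Hypothetical helper: given the pte at index idx inside a batch of
 * contiguous swap entries, return the pte of the first entry. */
static inline toy_pte_t toy_first_swap_pte(toy_pte_t pte, unsigned long idx)
{
    return toy_pte_move_swp_offset(pte, -(long)idx);
}
```

With this layout, a fault on the sixth entry of a batch (idx 5) whose swap
offset is 105 would yield a first entry with offset 100, the same type, and
the same soft-dirty/exclusive bits.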