On 13/03/2024 02:21, Chuanhua Han wrote: > hi, Ryan Roberts > > 在 2024/3/12 20:34, Ryan Roberts 写道: >> On 04/03/2024 08:13, Barry Song wrote: >>> From: Chuanhua Han <hanchuanhua@xxxxxxxx> >>> >>> should_try_to_free_swap() works with an assumption that swap-in is always done >>> at normal page granularity, aka, folio_nr_pages = 1. To support large folio >>> swap-in, this patch removes the assumption. >>> >>> Signed-off-by: Chuanhua Han <hanchuanhua@xxxxxxxx> >>> Co-developed-by: Barry Song <v-songbaohua@xxxxxxxx> >>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> >>> Acked-by: Chris Li <chrisl@xxxxxxxxxx> >>> --- >>> mm/memory.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/mm/memory.c b/mm/memory.c >>> index abd4f33d62c9..e0d34d705e07 100644 >>> --- a/mm/memory.c >>> +++ b/mm/memory.c >>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio, >>> * reference only in case it's likely that we'll be the exlusive user. >>> */ >>> return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) && >>> - folio_ref_count(folio) == 2; >>> + folio_ref_count(folio) == (1 + folio_nr_pages(folio)); >> I don't think this is correct; one reference has just been added to the folio in >> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by >> allocating. If it came from the swapcache, it could be a large folio, because we >> swapped out a large folio and never removed it from swapcache. But in that case, >> others may have partially mapped it, so the refcount could legitimately equal >> the number of pages while still not being exclusively mapped. >> >> I'm guessing this logic is trying to estimate when we are likely exclusive so >> that we remove from swapcache (release ref) and can then reuse rather than CoW >> the folio? The main CoW path currently CoWs page-by-page even for large folios, >> and with Barry's recent patch, even the last page gets copied. So not sure what >> this change is really trying to achieve? >> > First, if it is a large folio in the swap cache, then its refcont is at > least folio_nr_pages(folio) : Ahh! Sorry, I had it backwards - was thinking there would be 1 ref for the swap cache, and you were assuming 1 ref per page taken by do_swap_page(). I understand now. On this basis: Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx> > > > For example, in add_to_swap_cache path: > > int add_to_swap_cache(struct folio *folio, swp_entry_t entry, > gfp_t gfp, void **shadowp) > { > struct address_space *address_space = swap_address_space(entry); > pgoff_t idx = swp_offset(entry); > XA_STATE_ORDER(xas, &address_space->i_pages, idx, > folio_order(folio)); > unsigned long i, nr = folio_nr_pages(folio); <--- > void *old; > ... > folio_ref_add(folio, nr); <--- > folio_set_swapcache(folio); > ... > } > > > * > > Then in the do_swap_page path: > > * if (should_try_to_free_swap(folio, vma, vmf->flags)) > folio_free_swap(folio); > * > > * It also indicates that only folio in the swap cache will call > folio_free_swap > * to delete it from the swap cache, So I feel like this patch is > necessary!? 😁 > >>> } >>> >>> static vm_fault_t pte_marker_clear(struct vm_fault *vmf) > > Thanks, > > Chuanhua >