Hi Ryan, Ryan Roberts <ryan.roberts@xxxxxxx> 于2024年3月13日周三 17:10写道: > > On 13/03/2024 02:21, Chuanhua Han wrote: > > hi, Ryan Roberts > > > > 在 2024/3/12 20:34, Ryan Roberts 写道: > >> On 04/03/2024 08:13, Barry Song wrote: > >>> From: Chuanhua Han <hanchuanhua@xxxxxxxx> > >>> > >>> should_try_to_free_swap() works with an assumption that swap-in is always done > >>> at normal page granularity, aka, folio_nr_pages = 1. To support large folio > >>> swap-in, this patch removes the assumption. > >>> > >>> Signed-off-by: Chuanhua Han <hanchuanhua@xxxxxxxx> > >>> Co-developed-by: Barry Song <v-songbaohua@xxxxxxxx> > >>> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > >>> Acked-by: Chris Li <chrisl@xxxxxxxxxx> > >>> --- > >>> mm/memory.c | 2 +- > >>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>> > >>> diff --git a/mm/memory.c b/mm/memory.c > >>> index abd4f33d62c9..e0d34d705e07 100644 > >>> --- a/mm/memory.c > >>> +++ b/mm/memory.c > >>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio, > >>> * reference only in case it's likely that we'll be the exlusive user. > >>> */ > >>> return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) && > >>> - folio_ref_count(folio) == 2; > >>> + folio_ref_count(folio) == (1 + folio_nr_pages(folio)); > >> I don't think this is correct; one reference has just been added to the folio in > >> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by > >> allocating. If it came from the swapcache, it could be a large folio, because we > >> swapped out a large folio and never removed it from swapcache. But in that case, > >> others may have partially mapped it, so the refcount could legitimately equal > >> the number of pages while still not being exclusively mapped. > >> > >> I'm guessing this logic is trying to estimate when we are likely exclusive so > >> that we remove from swapcache (release ref) and can then reuse rather than CoW > >> the folio? The main CoW path currently CoWs page-by-page even for large folios, > >> and with Barry's recent patch, even the last page gets copied. So not sure what > >> this change is really trying to achieve? > >> > > First, if it is a large folio in the swap cache, then its refcont is at > > least folio_nr_pages(folio) : > > Ahh! Sorry, I had it backwards - was thinking there would be 1 ref for the swap > cache, and you were assuming 1 ref per page taken by do_swap_page(). I > understand now. On this basis: > > Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx> Thank you for your review! > > > > > > > For example, in add_to_swap_cache path: > > > > int add_to_swap_cache(struct folio *folio, swp_entry_t entry, > > gfp_t gfp, void **shadowp) > > { > > struct address_space *address_space = swap_address_space(entry); > > pgoff_t idx = swp_offset(entry); > > XA_STATE_ORDER(xas, &address_space->i_pages, idx, > > folio_order(folio)); > > unsigned long i, nr = folio_nr_pages(folio); <--- > > void *old; > > ... > > folio_ref_add(folio, nr); <--- > > folio_set_swapcache(folio); > > ... > > } > > > > > > * > > > > Then in the do_swap_page path: > > > > * if (should_try_to_free_swap(folio, vma, vmf->flags)) > > folio_free_swap(folio); > > * > > > > * It also indicates that only folio in the swap cache will call > > folio_free_swap > > * to delete it from the swap cache, So I feel like this patch is > > necessary!? 😁 > > > >>> } > >>> > >>> static vm_fault_t pte_marker_clear(struct vm_fault *vmf) > > > > Thanks, > > > > Chuanhua > > > > Thanks, Chuanhua