Re: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio

Ryan Roberts <ryan.roberts@xxxxxxx> · Tue, 12 Mar 2024 12:34:27 +0000

On 04/03/2024 08:13, Barry Song wrote:
> From: Chuanhua Han <hanchuanhua@xxxxxxxx>
> 
> should_try_to_free_swap() works with an assumption that swap-in is always done
> at normal page granularity, aka, folio_nr_pages = 1. To support large folio
> swap-in, this patch removes the assumption.
> 
> Signed-off-by: Chuanhua Han <hanchuanhua@xxxxxxxx>
> Co-developed-by: Barry Song <v-songbaohua@xxxxxxxx>
> Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> Acked-by: Chris Li <chrisl@xxxxxxxxxx>
> ---
>  mm/memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index abd4f33d62c9..e0d34d705e07 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
>  	 * reference only in case it's likely that we'll be the exlusive user.
>  	 */
>  	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
> -		folio_ref_count(folio) == 2;
> +		folio_ref_count(folio) == (1 + folio_nr_pages(folio));

I don't think this is correct; one reference has just been added to the folio in
do_swap_page(), either by getting it from the swapcache (swap_cache_get_folio())
or by allocating it. If it came from the swapcache, it could be a large folio,
because we swapped out a large folio and never removed it from the swapcache. But
in that case, others may have partially mapped it, so the refcount could
legitimately equal the number of pages while the folio is still not exclusively
mapped.
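
To make the arithmetic concrete, here is a toy userspace model. It is not kernel
code: every toy_* name is made up for illustration, and the number of references
held by the swapcache and by other mappers is an assumption of the model rather
than a statement about the real swapcache accounting. It only shows that a bare
"refcount == 1 + nr_pages" test cannot tell an exclusive folio from a partially
shared one unless the swapcache's contribution is known.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all fields are hypothetical inputs, not kernel state. */
struct toy_folio {
	int nr_pages;    /* number of pages in the folio */
	int cache_refs;  /* references assumed to be held by the swapcache */
	int other_maps;  /* references assumed to be held by other mappers */
};

/* Refcount seen by the check: swapcache + other mappers + the fault's own ref. */
static int toy_ref_count(const struct toy_folio *f)
{
	return f->cache_refs + f->other_maps + 1;
}

/* The refcount comparison from the patched line, applied to the model. */
static bool toy_check_patched(const struct toy_folio *f)
{
	return toy_ref_count(f) == 1 + f->nr_pages;
}

/* Ground truth inside the model: nobody else holds a mapping reference. */
static bool toy_is_exclusive(const struct toy_folio *f)
{
	return f->other_maps == 0;
}

/* Two 4-page folios with the same refcount but different exclusivity. */
static void toy_self_check(void)
{
	/* Assumption A: the swapcache holds one reference per page. */
	struct toy_folio excl = { .nr_pages = 4, .cache_refs = 4, .other_maps = 0 };
	/* Assumption B: the swapcache holds one reference, three pages mapped elsewhere. */
	struct toy_folio shared = { .nr_pages = 4, .cache_refs = 1, .other_maps = 3 };

	assert(toy_ref_count(&excl) == toy_ref_count(&shared));
	assert(toy_check_patched(&excl) && toy_is_exclusive(&excl));
	assert(toy_check_patched(&shared) && !toy_is_exclusive(&shared));
}
```

Calling toy_self_check() from a main() exercises both cases; which of the two
assumptions matches the kernel is exactly the open question here.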

I'm guessing this logic is trying to estimate when we are likely exclusive, so
that we remove the folio from the swapcache (releasing its ref) and can then
reuse it rather than CoW it? The main CoW path currently CoWs page-by-page even
for large folios, and with Barry's recent patch, even the last page gets copied.
So I'm not sure what this change is really trying to achieve.

>  }
>  
>  static vm_fault_t pte_marker_clear(struct vm_fault *vmf)