Re: [PATCH 6.6 100/396] mm: swap: fix race between free_swap_and_cache() and swapoff()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/04/2024 16:42, Greg Kroah-Hartman wrote:
> 6.6-stable review patch.  If anyone has any objections, please let me know.

LGTM!

> 
> ------------------
> 
> From: Ryan Roberts <ryan.roberts@xxxxxxx>
> 
> [ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ]
> 
> There was previously a theoretical window where swapoff() could run and
> teardown a swap_info_struct while a call to free_swap_and_cache() was
> running in another thread.  This could cause, amongst other bad
> possibilities, swap_page_trans_huge_swapped() (called by
> free_swap_and_cache()) to access the freed memory for swap_map.
> 
> This is a theoretical problem and I haven't been able to provoke it from a
> test case.  But there has been agreement based on code review that this is
> possible (see link below).
> 
> Fix it by using get_swap_device()/put_swap_device(), which will stall
> swapoff().  There was an extra check in _swap_info_get() to confirm that
> the swap entry was not free.  This isn't present in get_swap_device()
> because it doesn't make sense in general due to the race between getting
> the reference and swapoff.  So I've added an equivalent check directly in
> free_swap_and_cache().
> 
> Details of how to provoke one possible issue (thanks to David Hildenbrand
> for deriving this):
> 
> --8<-----
> 
> __swap_entry_free() might be the last user and result in
> "count == SWAP_HAS_CACHE".
> 
> swapoff->try_to_unuse() will stop as soon as soon as si->inuse_pages==0.
> 
> So the question is: could someone reclaim the folio and turn
> si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().
> 
> Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
> still references by swap entries.
> 
> Process 1 still references subpage 0 via swap entry.
> Process 2 still references subpage 1 via swap entry.
> 
> Process 1 quits. Calls free_swap_and_cache().
> -> count == SWAP_HAS_CACHE
> [then, preempted in the hypervisor etc.]
> 
> Process 2 quits. Calls free_swap_and_cache().
> -> count == SWAP_HAS_CACHE
> 
> Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
> __try_to_reclaim_swap().
> 
> __try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
> put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
> swap_entry_free()->swap_range_free()->
> ...
> WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
> 
> What stops swapoff to succeed after process 2 reclaimed the swap cache
> but before process1 finished its call to swap_page_trans_huge_swapped()?
> 
> --8<-----
> 
> Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@xxxxxxx
> Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
> Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@xxxxxxxxxx/
> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: "Huang, Ying" <ying.huang@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
> ---
>  mm/swapfile.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 750314fff0c46..eada1351753e3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1226,6 +1226,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
>   * with get_swap_device() and put_swap_device(), unless the swap
>   * functions call get/put_swap_device() by themselves.
>   *
> + * Note that when only holding the PTL, swapoff might succeed immediately
> + * after freeing a swap entry. Therefore, immediately after
> + * __swap_entry_free(), the swap info might become stale and should not
> + * be touched without a prior get_swap_device().
> + *
>   * Check whether swap entry is valid in the swap device.  If so,
>   * return pointer to swap_info_struct, and keep the swap entry valid
>   * via preventing the swap device from being swapoff, until
> @@ -1603,13 +1608,19 @@ int free_swap_and_cache(swp_entry_t entry)
>  	if (non_swap_entry(entry))
>  		return 1;
>  
> -	p = _swap_info_get(entry);
> +	p = get_swap_device(entry);
>  	if (p) {
> +		if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
> +			put_swap_device(p);
> +			return 0;
> +		}
> +
>  		count = __swap_entry_free(p, entry);
>  		if (count == SWAP_HAS_CACHE &&
>  		    !swap_page_trans_huge_swapped(p, entry))
>  			__try_to_reclaim_swap(p, swp_offset(entry),
>  					      TTRS_UNMAPPED | TTRS_FULL);
> +		put_swap_device(p);
>  	}
>  	return p != NULL;
>  }





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux