On 01/04/2024 16:42, Greg Kroah-Hartman wrote: > 6.6-stable review patch. If anyone has any objections, please let me know. LGTM! > > ------------------ > > From: Ryan Roberts <ryan.roberts@xxxxxxx> > > [ Upstream commit 82b1c07a0af603e3c47b906c8e991dc96f01688e ] > > There was previously a theoretical window where swapoff() could run and > teardown a swap_info_struct while a call to free_swap_and_cache() was > running in another thread. This could cause, amongst other bad > possibilities, swap_page_trans_huge_swapped() (called by > free_swap_and_cache()) to access the freed memory for swap_map. > > This is a theoretical problem and I haven't been able to provoke it from a > test case. But there has been agreement based on code review that this is > possible (see link below). > > Fix it by using get_swap_device()/put_swap_device(), which will stall > swapoff(). There was an extra check in _swap_info_get() to confirm that > the swap entry was not free. This isn't present in get_swap_device() > because it doesn't make sense in general due to the race between getting > the reference and swapoff. So I've added an equivalent check directly in > free_swap_and_cache(). > > Details of how to provoke one possible issue (thanks to David Hildenbrand > for deriving this): > > --8<----- > > __swap_entry_free() might be the last user and result in > "count == SWAP_HAS_CACHE". > > swapoff->try_to_unuse() will stop as soon as soon as si->inuse_pages==0. > > So the question is: could someone reclaim the folio and turn > si->inuse_pages==0, before we completed swap_page_trans_huge_swapped(). > > Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are > still references by swap entries. > > Process 1 still references subpage 0 via swap entry. > Process 2 still references subpage 1 via swap entry. > > Process 1 quits. Calls free_swap_and_cache(). > -> count == SWAP_HAS_CACHE > [then, preempted in the hypervisor etc.] > > Process 2 quits. Calls free_swap_and_cache(). > -> count == SWAP_HAS_CACHE > > Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls > __try_to_reclaim_swap(). > > __try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()-> > put_swap_folio()->free_swap_slot()->swapcache_free_entries()-> > swap_entry_free()->swap_range_free()-> > ... > WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > > What stops swapoff to succeed after process 2 reclaimed the swap cache > but before process1 finished its call to swap_page_trans_huge_swapped()? > > --8<----- > > Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@xxxxxxx > Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch") > Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@xxxxxxxxxx/ > Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> > Cc: David Hildenbrand <david@xxxxxxxxxx> > Cc: "Huang, Ying" <ying.huang@xxxxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> > --- > mm/swapfile.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 750314fff0c46..eada1351753e3 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1226,6 +1226,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p, > * with get_swap_device() and put_swap_device(), unless the swap > * functions call get/put_swap_device() by themselves. > * > + * Note that when only holding the PTL, swapoff might succeed immediately > + * after freeing a swap entry. Therefore, immediately after > + * __swap_entry_free(), the swap info might become stale and should not > + * be touched without a prior get_swap_device(). > + * > * Check whether swap entry is valid in the swap device. If so, > * return pointer to swap_info_struct, and keep the swap entry valid > * via preventing the swap device from being swapoff, until > @@ -1603,13 +1608,19 @@ int free_swap_and_cache(swp_entry_t entry) > if (non_swap_entry(entry)) > return 1; > > - p = _swap_info_get(entry); > + p = get_swap_device(entry); > if (p) { > + if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) { > + put_swap_device(p); > + return 0; > + } > + > count = __swap_entry_free(p, entry); > if (count == SWAP_HAS_CACHE && > !swap_page_trans_huge_swapped(p, entry)) > __try_to_reclaim_swap(p, swp_offset(entry), > TTRS_UNMAPPED | TTRS_FULL); > + put_swap_device(p); > } > return p != NULL; > }