David Hildenbrand <david@xxxxxxxxxx> writes: > On 05.03.24 17:33, Ryan Roberts wrote: >> On 05/03/2024 15:50, David Hildenbrand wrote: >>> On 05.03.24 16:13, Ryan Roberts wrote: >>>> There was previously a theoretical window where swapoff() could run and >>>> teardown a swap_info_struct while a call to free_swap_and_cache() was >>>> running in another thread. This could cause, amongst other bad >>>> possibilities, swap_page_trans_huge_swapped() (called by >>>> free_swap_and_cache()) to access the freed memory for swap_map. >>>> >>>> This is a theoretical problem and I haven't been able to provoke it from >>>> a test case. But there has been agreement based on code review that this >>>> is possible (see link below). >>>> >>>> Fix it by using get_swap_device()/put_swap_device(), which will stall >>>> swapoff(). There was an extra check in _swap_info_get() to confirm that >>>> the swap entry was valid. This wasn't present in get_swap_device() so >>>> I've added it. I couldn't find any existing get_swap_device() call sites >>>> where this extra check would cause any false alarms. >>>> >>>> Details of how to provoke one possible issue (thanks to David Hilenbrand >>>> for deriving this): >>> >>> Almost >>> >>> "s/Hilenbrand/Hildenbrand/" :) >> >> Ahh sorry... I even explicitly checked it against your name on emails... fat >> fingers... > > No need to be sorry. Even your average German person would get it wrong, > because there are other (more common) variants :) > > [...] > >>>> >>> >>> LGTM >>> >>> Are you planning on sending a doc extension for get_swap_device()? >> >> I saw your comment: >> >> We should likely update the documentation of get_swap_device(), that after >> decrementing the refcount, the SI might become stale and should not be touched >> without a prior get_swap_device(). >> >> But when I went to make the changes, I saw the documentation already said: >> >> ...we need to enclose all swap related functions with get_swap_device() and >> put_swap_device()... Notice that swapoff ... can still happen before the >> percpu_ref_tryget_live() in get_swap_device() or after the percpu_ref_put() in >> put_swap_device()... The caller must be prepared for that. >> >> I thought that already covered it? I'm sure as usual, I've misunderstood your >> point. Happy to respin if you have something in mind? > > No need to respin, we could clarify on top, if we decide it makes sense. > > I was thinking about something like this, making it clearer that the PTL > discussion above does not express the corner case we discovered: > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 2b3a2d85e350b..646a436581eee 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1232,6 +1232,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p, > * with get_swap_device() and put_swap_device(), unless the swap > * functions call get/put_swap_device() by themselves. > * > + * Note that when only holding the PTL, swapoff might succeed immediately > + * after freeing a swap entry. Therefore, immediately after > + * __swap_entry_free(), the swap info might become stale and should not > + * be touched without a prior get_swap_device(). > + * > * Check whether swap entry is valid in the swap device. If so, > * return pointer to swap_info_struct, and keep the swap entry valid > * via preventing the swap device from being swapoff, until LGTM. Thanks! -- Best Regards, Huang, Ying