Re: [PATCH v1] mm: swap: Fix race between free_swap_and_cache() and swapoff()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/03/2024 02:19, Johannes Weiner wrote:
> Hi Ryan,
> 
> On Tue, Mar 05, 2024 at 03:13:49PM +0000, Ryan Roberts wrote:
>> There was previously a theoretical window where swapoff() could run and
>> teardown a swap_info_struct while a call to free_swap_and_cache() was
>> running in another thread. This could cause, amongst other bad
>> possibilities, swap_page_trans_huge_swapped() (called by
>> free_swap_and_cache()) to access the freed memory for swap_map.
>>
>> This is a theoretical problem and I haven't been able to provoke it from
>> a test case. But there has been agreement based on code review that this
>> is possible (see link below).
>>
>> Fix it by using get_swap_device()/put_swap_device(), which will stall
>> swapoff(). There was an extra check in _swap_info_get() to confirm that
>> the swap entry was valid. This wasn't present in get_swap_device() so
>> I've added it. I couldn't find any existing get_swap_device() call sites
>> where this extra check would cause any false alarms.
> 
> Unfortunately, I found one, testing current mm-everything:
> 
> [  189.420777] get_swap_device: Unused swap offset entry 000641ae
> [  189.426648] ------------[ cut here ]------------
> [  189.431290] WARNING: CPU: 3 PID: 369 at mm/swapfile.c:1301 get_swap_device+0x2da/0x3f0
> [  189.439242] CPU: 3 PID: 369 Comm: cachish Not tainted 6.8.0-rc5-00527-g19d98776f227-dirty #30
> [  189.447791] Hardware name: Micro-Star International Co., Ltd. MS-7B98/Z390-A PRO (MS-7B98), BIOS 1.80 12/25/2019
> [  189.457998] RIP: 0010:get_swap_device+0x2da/0x3f0
> [  189.462721] Code: a8 03 75 2a 65 48 ff 08 e9 36 ff ff ff 4c 89 e9 48 c7 c2 40 fd 91 83 48 c7 c6 c0 f9 91 83 48 c7 c7 60 ee 91 83 e8 26 2f af ff <0f> 0b eb af 4c 8d 6b 08 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48
> [  189.481497] RSP: 0000:ffffc90000cff8a8 EFLAGS: 00010282
> [  189.486760] RAX: 0000000000000032 RBX: ffff8881262eee00 RCX: 0000000000000000
> [  189.493909] RDX: 0000000000000001 RSI: ffffffff83a1e620 RDI: 0000000000000001
> [  189.501054] RBP: 1ffff9200019ff15 R08: 0000000000000001 R09: fffff5200019fee1
> [  189.508202] R10: ffffc90000cff70f R11: 0000000000000001 R12: ffffc900018d51ae
> [  189.515346] R13: 00000000000641ae R14: 0000000000000000 R15: 00000000000641af
> [  189.522494] FS:  00007f7120263680(0000) GS:ffff88841c380000(0000) knlGS:0000000000000000
> [  189.530591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  189.536373] CR2: 00007f6e659a2ea3 CR3: 0000000046860004 CR4: 00000000003706f0
> [  189.543516] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  189.550661] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  189.557811] Call Trace:
> [  189.560276]  <TASK>
> [  189.562393]  ? __warn+0xc4/0x250
> [  189.565647]  ? get_swap_device+0x2da/0x3f0
> [  189.569761]  ? report_bug+0x348/0x440
> [  189.573444]  ? handle_bug+0x6d/0x90
> [  189.576951]  ? exc_invalid_op+0x13/0x40
> [  189.580810]  ? asm_exc_invalid_op+0x16/0x20
> [  189.585019]  ? get_swap_device+0x2da/0x3f0
> [  189.589142]  ? get_swap_device+0x2da/0x3f0
> [  189.593255]  ? __pfx_get_swap_device+0x10/0x10
> [  189.597717]  __read_swap_cache_async+0x9f/0x630
> [  189.602281]  ? __pfx___read_swap_cache_async+0x10/0x10
> [  189.607439]  ? __mod_memcg_lruvec_state+0x238/0x4f0
> [  189.612344]  ? __pfx_swp_swap_info+0x10/0x10
> [  189.616652]  swap_cluster_readahead+0x2cd/0x510
> [  189.621206]  ? __pfx_swap_cluster_readahead+0x10/0x10
> [  189.626279]  ? swap_cache_get_folio+0xcd/0x360
> [  189.630760]  ? __count_memcg_events+0x10a/0x370
> [  189.635318]  shmem_swapin_folio+0x2f2/0xc60
> [  189.639525]  ? __pfx__raw_spin_lock+0x10/0x10
> [  189.643908]  ? __pte_offset_map+0x19/0x1d0
> [  189.648024]  shmem_get_folio_gfp+0x307/0xe30
> [  189.652323]  ? __schedule+0x9f0/0x1fe0
> [  189.656096]  ? __pfx_shmem_get_folio_gfp+0x10/0x10
> [  189.660923]  ? filemap_map_pages+0x999/0xe60
> [  189.665211]  shmem_fault+0x1d9/0x810
> [  189.668834]  ? __pfx_shmem_fault+0x10/0x10
> [  189.672954]  ? __pfx_filemap_map_pages+0x10/0x10
> [  189.677590]  __do_fault+0xed/0x390
> [  189.681012]  __handle_mm_fault+0x1ba1/0x2e80
> [  189.685297]  ? __pfx___handle_mm_fault+0x10/0x10
> [  189.689933]  ? __pfx_down_read_trylock+0x10/0x10
> [  189.694570]  ? __pfx_hrtimer_nanosleep+0x10/0x10
> [  189.699215]  handle_mm_fault+0xe0/0x560
> [  189.703074]  ? __pfx_restore_fpregs_from_fpstate+0x10/0x10
> [  189.708620]  do_user_addr_fault+0x2ba/0x9d0
> [  189.712828]  exc_page_fault+0x54/0x90
> [  189.716508]  asm_exc_page_fault+0x22/0x30
> [  189.720535] RIP: 0033:0x5640dc2d72b5
> [  189.724131] Code: 98 48 ba 00 00 00 00 03 00 00 00 48 89 d6 48 2b 75 a0 ba 00 00 00 00 48 f7 f6 48 89 d1 48 89 ca 48 8b 45 a0 48 01 d0 48 01 d8 <0f> b6 00 bf e8 03 00 00 e8 1e fe ff ff 48 83 45 a8 01 48 8d 45 d0
> [  189.742922] RSP: 002b:00007ffc227e3f60 EFLAGS: 00010206
> [  189.748165] RAX: 00007f6e659a2ea3 RBX: 00007f6e2007e000 RCX: 0000000045924ea3
> [  189.755311] RDX: 0000000045924ea3 RSI: 0000000300000000 RDI: 00007f71202586a0
> [  189.762483] RBP: 00007ffc227e3fe0 R08: 00007f7120258074 R09: 00007f71202580a0
> [  189.769633] R10: 0000000000019458 R11: 00000000008aa400 R12: 0000000000000000
> [  189.776781] R13: 00007ffc227e4128 R14: 00007f712029d000 R15: 00005640dc2d9dd8
> [  189.783928]  </TASK>
> [  189.786126] ---[ end trace 0000000000000000 ]---
> [  285.827888] get_swap_device: Unused swap offset entry 0018403f
> [  320.699306] get_swap_device: Unused swap offset entry 000b001b
> [  354.031339] get_swap_device: Unused swap offset entry 000681a9
> [  364.958435] get_swap_device: Unused swap offset entry 001f4055
> [  364.976235] get_swap_device: Unused swap offset entry 001f4057
> [  365.530174] get_swap_device: Unused swap offset entry 000d415c
> [  394.223792] get_swap_device: Unused swap offset entry 001540d0
> [  394.317299] get_swap_device: Unused swap offset entry 000341d9
> [  394.341727] get_swap_device: Unused swap offset entry 0006c07e
> [  396.062365] get_swap_device: Unused swap offset entry 000541a4
> [  396.068262] get_swap_device: Unused swap offset entry 000541a7
> [  402.629551] get_swap_device: Unused swap offset entry 00294021
> [  436.740622] get_swap_device: Unused swap offset entry 00334155
> [  436.758527] get_swap_device: Unused swap offset entry 001b417c
> 
> swap_cluster_readahead() calls __read_swap_cache_async() on a range of
> made-up swap entries around the faulting slot. The device and the
> range (si->max) are valid, but the specific entry might not be in
> use. __read_swap_cache_async() instead relies on swapcache_prepare()
> returning -ENOENT to catch this and skip gracefully.

Sorry this got in your way, but thanks for the report! Clearly I didn't review
the call sites thoroughly enough. I'll move the extra check to
free_swap_and_cache(), retest and hopefully repost later today.

> 
> Confirmed that reverting the patch makes the warnings go away.





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux