On Thu, Apr 06, 2023 at 10:04:16PM +0800, Aaron Lu wrote:
> On Tue, Apr 04, 2023 at 11:47:16PM +0800, Rongwei Wang wrote:
> > The si->lock must be held when deleting the si from
> > the available list. Otherwise, another thread can
> > re-add the si to the available list, which can lead
> > to memory corruption. The only place we have found
> > where this happens is in the swapoff path. This case
> > can be described as below:
> >
> > core 0                       core 1
> > swapoff
> >
> > del_from_avail_list(si)      waiting
> >
> > try lock si->lock            acquire swap_avail_lock
> >                              and re-add si into
> >                              swap_avail_head
>
> confused here.
>
> If del_from_avail_list(si) finished in swapoff path, then this si should
> not exist in any of the per-node avail list and core 1 should not be
> able to re-add it.

I think a possible sequence could be like this:

cpuX                         cpuY
swapoff                      put_swap_folio()

del_from_avail_list(si)
                             taken si->lock
spin_lock(&si->lock);
                             swap_range_free()
                             was_full && SWP_WRITEOK -> re-add!
                             drop si->lock

taken si->lock
proceed removing si

End result: si left on avail_list after being swapped off.

The problem is that add_to_avail_list() has no idea this si is being
swapped off. Taking si->lock before calling del_from_avail_list() avoids
this problem, so I think this patch does the right thing, but the
changelog describing how this happens needs updating. After that:

Reviewed-by: Aaron Lu <aaron.lu@xxxxxxxxx>

Thanks,
Aaron

>
> I stared at the code for a while and couldn't figure out how this
> happened, will continue to look at this tomorrow.
>
> >
> > acquire si->lock but
> > missing si already be
> > added again, and continuing
> > to clear SWP_WRITEOK, etc.
> >
> > It can be easily found a massive warning messages can
> > be triggered inside get_swap_pages() by some special
> > cases, for example, we call madvise(MADV_PAGEOUT) on
> > blocks of touched memory concurrently, meanwhile, run
> > much swapon-swapoff operations (e.g. stress-ng-swap).
> >
> > However, in the worst case, panic can be caused by the
> > above scene. In swapoff(), the memory used by si could
> > be kept in swap_info[] after turning off a swap. This
> > means memory corruption will not be caused immediately
> > until allocated and reset for a new swap in the swapon
> > path. A panic message caused:
> > (with CONFIG_PLIST_DEBUG enabled)
> >
> > ------------[ cut here ]------------
> > top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a
> > prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d
> > next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a
> > WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70
> > Modules linked in: rfkill(E) crct10dif_ce(E)...
> > CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+
> > Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015
> > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > pc : plist_check_prev_next_node+0x50/0x70
> > lr : plist_check_prev_next_node+0x50/0x70
> > sp : ffff0018009d3c30
> > x29: ffff0018009d3c40 x28: ffff800011b32a98
> > x27: 0000000000000000 x26: ffff001803908000
> > x25: ffff8000128ea088 x24: ffff800011b32a48
> > x23: 0000000000000028 x22: ffff001800875c00
> > x21: ffff800010f9e520 x20: ffff001800875c00
> > x19: ffff001800fdc6e0 x18: 0000000000000030
> > x17: 0000000000000000 x16: 0000000000000000
> > x15: 0736076307640766 x14: 0730073007380731
> > x13: 0736076307640766 x12: 0730073007380731
> > x11: 000000000004058d x10: 0000000085a85b76
> > x9 : ffff8000101436e4 x8 : ffff800011c8ce08
> > x7 : 0000000000000000 x6 : 0000000000000001
> > x5 : ffff0017df9ed338 x4 : 0000000000000001
> > x3 : ffff8017ce62a000 x2 : ffff0017df9ed340
> > x1 : 0000000000000000 x0 : 0000000000000000
> > Call trace:
> >  plist_check_prev_next_node+0x50/0x70
> >  plist_check_head+0x80/0xf0
> >  plist_add+0x28/0x140
> >  add_to_avail_list+0x9c/0xf0
> >  _enable_swap_info+0x78/0xb4
> >  __do_sys_swapon+0x918/0xa10
> >  __arm64_sys_swapon+0x20/0x30
> >  el0_svc_common+0x8c/0x220
> >  do_el0_svc+0x2c/0x90
> >  el0_svc+0x1c/0x30
> >  el0_sync_handler+0xa8/0xb0
> >  el0_sync+0x148/0x180
> > irq event stamp: 2082270
> >
> > Now, si->lock locked before calling 'del_from_avail_list()'
> > to make sure other thread see the si had been deleted
> > and SWP_WRITEOK cleared together, will not reinsert again.
> >
> > This problem exists in versions after stable 5.10.y.
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Tested-by: Yongchen Yin <wb-yyc939293@xxxxxxxxxxxxxxx>
> > Signed-off-by: Rongwei Wang <rongwei.wang@xxxxxxxxxxxxxxxxx>
> > ---
> >  mm/swapfile.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 62ba2bf577d7..2c718f45745f 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -679,6 +679,7 @@ static void __del_from_avail_list(struct swap_info_struct *p)
> >  {
> >  	int nid;
> >
> > +	assert_spin_locked(&p->lock);
> >  	for_each_node(nid)
> >  		plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]);
> >  }
> > @@ -2434,8 +2435,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
> >  		spin_unlock(&swap_lock);
> >  		goto out_dput;
> >  	}
> > -	del_from_avail_list(p);
> >  	spin_lock(&p->lock);
> > +	del_from_avail_list(p);
> >  	if (p->prio < 0) {
> >  		struct swap_info_struct *si = p;
> >  		int nid;
> > --
> > 2.27.0
> >
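
For anyone trying to follow the cpuX/cpuY sequence, here is a rough user-space model of the two orderings. It is only a sketch and not kernel code: struct si_model and every function name below are made up for illustration, the per-node plists are reduced to a single boolean, and si->lock is modelled by treating each "locked" function as one indivisible step. It also assumes was_full is true, so the free path re-adds whenever it still sees SWP_WRITEOK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved; not the kernel's struct. */
struct si_model {
	bool on_avail_list;	/* stands in for membership in swap_avail_heads[] */
	bool writeok;		/* stands in for SWP_WRITEOK in si->flags */
};

/* cpuY: models the re-add in swap_range_free(), run under si->lock. */
static void free_path_locked(struct si_model *si)
{
	if (si->writeok)	/* was_full assumed true */
		si->on_avail_list = true;
}

/* cpuX: the delete as done before the patch, outside si->lock. */
static void swapoff_del_unlocked(struct si_model *si)
{
	si->on_avail_list = false;
}

/* cpuX: swapoff's si->lock section; the patch moves the delete inside it. */
static void swapoff_locked_section(struct si_model *si, bool del_inside)
{
	if (del_inside)
		si->on_avail_list = false;
	si->writeok = false;
}

/* Old ordering: cpuY can take si->lock between the delete and the lock. */
static bool old_order_leaves_si_on_list(void)
{
	struct si_model si = { .on_avail_list = true, .writeok = true };

	swapoff_del_unlocked(&si);
	free_path_locked(&si);			/* re-add happens here */
	swapoff_locked_section(&si, false);
	return si.on_avail_list;
}

/* New ordering: cpuY runs entirely before or entirely after the section. */
static bool new_order_leaves_si_on_list(bool free_first)
{
	struct si_model si = { .on_avail_list = true, .writeok = true };

	if (free_first)
		free_path_locked(&si);
	swapoff_locked_section(&si, true);
	if (!free_first)
		free_path_locked(&si);		/* sees writeok cleared, no re-add */
	return si.on_avail_list;
}

/* Checks the three interleavings the model can express. */
static void run_model_checks(void)
{
	assert(old_order_leaves_si_on_list());
	assert(!new_order_leaves_si_on_list(true));
	assert(!new_order_leaves_si_on_list(false));
}
```

Calling run_model_checks() from a main() exercises all three cases. The model only covers the interleaving described above; it says nothing about other callers of add_to_avail_list().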