Re: [PATCH 2/2] mm/vmscan: don't try to reclaim hwpoison folio

Jinjiang Tu <tujinjiang@xxxxxxxxxx> · Thu, 20 Mar 2025 11:37:50 +0800

On 2025/3/20 10:50, Miaohe Lin wrote:
On 2025/3/18 16:39, Jinjiang Tu wrote:
Syzkaller reports a bug as follows:
Thanks for your fix.

Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e
memcg:ffff0000dd6d9000
anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
------------[ cut here ]------------
kernel BUG at mm/swap_state.c:184!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Modules linked in:
CPU: 0 PID: 60 Comm: kswapd0 Not tainted 6.6.0-gcb097e7de84e #3
Hardware name: linux,dummy-virt (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : add_to_swap+0xbc/0x158
lr : add_to_swap+0xbc/0x158
sp : ffff800087f37340
x29: ffff800087f37340 x28: fffffc00052c0380 x27: ffff800087f37780
x26: ffff800087f37490 x25: ffff800087f37c78 x24: ffff800087f377a0
x23: ffff800087f37c50 x22: 0000000000000000 x21: fffffc00052c03b4
x20: 0000000000000000 x19: fffffc00052c0380 x18: 0000000000000000
x17: 296f696c6f662865 x16: 7461646f7470755f x15: 747365745f6f696c
x14: 6f6621284f494c4f x13: 0000000000000001 x12: ffff600036d8b97b
x11: 1fffe00036d8b97a x10: ffff600036d8b97a x9 : dfff800000000000
x8 : 00009fffc9274686 x7 : ffff0001b6c5cbd3 x6 : 0000000000000001
x5 : ffff0000c25896c0 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : ffff0000c25896c0 x0 : 0000000000000000
Call trace:
  add_to_swap+0xbc/0x158
  shrink_folio_list+0x12ac/0x2648
  shrink_inactive_list+0x318/0x948
  shrink_lruvec+0x450/0x720
  shrink_node_memcgs+0x280/0x4a8
  shrink_node+0x128/0x978
  balance_pgdat+0x4f0/0xb20
  kswapd+0x228/0x438
  kthread+0x214/0x230
  ret_from_fork+0x10/0x20

There are too many races in memory_failure to handle...

I can reproduce this issue with the following steps:
1) While a dirty swapcache page is isolated by the reclaim process and the
page isn't locked, inject a memory failure for the page. me_swapcache_dirty()
clears the uptodate flag and tries to delete the page from the LRU list, but
fails. The reclaim process then puts the hwpoisoned page back on the LRU list.
Is the hwpoisoned page put back on the LRU list because memory_failure holds the extra page refcnt?

Yes

2) The process that maps the hwpoisoned page exits and the page is unmapped,
but the page will never be freed and stays on the LRU list forever.
Again, memory_failure holds the extra page refcnt so...

3) If reclaim is triggered again and tries to reclaim the page,
add_to_swap() hits VM_BUG_ON_FOLIO because the uptodate flag has been
cleared.
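The three steps can be illustrated with a toy user-space model in C. This is a sketch under stated assumptions, not kernel code. struct toy_folio and the step*() helpers are hypothetical. The model assumes a single mapping holds one reference, and it ignores the LRU isolation reference and folio locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy folio state; field names are illustrative, not kernel API. */
struct toy_folio {
	bool uptodate;
	bool hwpoison;
	bool on_lru;
	int refcount;
	int mapcount;
};

/* Step 1: poison is injected while reclaim has the folio isolated.
 * me_swapcache_dirty() clears uptodate but cannot take the folio off
 * the LRU, and reclaim later puts it back. memory_failure keeps an
 * extra reference. */
static void step1_inject_while_isolated(struct toy_folio *f)
{
	f->hwpoison = true;
	f->uptodate = false;
	f->refcount++;     /* extra reference held by memory_failure */
	f->on_lru = true;  /* reclaim puts the folio back on the LRU */
}

/* Step 2: the mapping process exits and drops its reference, but the
 * memory_failure reference keeps the folio alive on the LRU. */
static void step2_process_exits(struct toy_folio *f)
{
	f->mapcount--;
	f->refcount--;
}

/* Step 3: a later reclaim pass calls add_to_swap(), which requires an
 * uptodate folio. Returns false where VM_BUG_ON_FOLIO would fire. */
static bool step3_add_to_swap_ok(const struct toy_folio *f)
{
	return f->uptodate;
}

/* True if the folio is unmapped yet still referenced and on the LRU. */
static bool folio_stuck_on_lru(const struct toy_folio *f)
{
	return f->on_lru && f->refcount > 0 && f->mapcount == 0;
}
```

Start from { .uptodate = true, .refcount = 1, .mapcount = 1 } and run steps 1 and 2. The result is a hwpoisoned, non-uptodate folio left on the LRU with refcount 1, and step3_add_to_swap_ok() then returns false.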

To fix it, skip the hwpoisoned page in shrink_folio_list(). In addition, the
hwpoison folio may not have been unmapped by hwpoison_user_mappings() yet, so
unmap it in shrink_folio_list(). Otherwise the folio will fail to be unmapped
by hwpoison_user_mappings(), since the folio isn't on the LRU list.
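The proposed fix can be shown with a similarly hypothetical model. The loop below stands in for shrink_folio_list(), and toy_add_to_swap() stands in for the uptodate assertion in add_to_swap(). The model assumes that a skipped hwpoisoned folio is dropped from reclaim rather than put back on the LRU. This matches the note that the folio is no longer on the LRU list. None of these names are kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy folio state; field names are illustrative, not kernel API. */
struct toy_folio {
	bool uptodate;
	bool hwpoison;
	bool mapped;
	bool on_lru;
};

/* Stand-in for add_to_swap(): the real function asserts the folio is
 * uptodate, so a non-uptodate folio counts as a would-be BUG here. */
static bool toy_add_to_swap(const struct toy_folio *f)
{
	return f->uptodate;
}

/*
 * Walk an isolated list the way shrink_folio_list() would. With
 * skip_hwpoison set, hwpoisoned folios are unmapped and dropped from
 * reclaim (not returned to the LRU) before toy_add_to_swap() sees them.
 * Returns the number of folios reclaimed.
 */
static int toy_shrink_folio_list(struct toy_folio *folios, size_t n,
				 bool skip_hwpoison, int *bug_hits)
{
	int reclaimed = 0;

	*bug_hits = 0;
	for (size_t i = 0; i < n; i++) {
		struct toy_folio *f = &folios[i];

		if (skip_hwpoison && f->hwpoison) {
			/* hwpoison_user_mappings() cannot unmap a folio that
			 * is off the LRU, so unmap it here instead. */
			f->mapped = false;
			f->on_lru = false;
			continue;
		}
		if (!toy_add_to_swap(f)) {
			/* VM_BUG_ON_FOLIO would fire here; the model records
			 * the hit and keeps the folio instead of crashing. */
			(*bug_hits)++;
			f->on_lru = true;
			continue;
		}
		f->mapped = false;
		f->on_lru = false;
		reclaimed++;
	}
	return reclaimed;
}
```

Consider a list with one healthy folio and one hwpoisoned, non-uptodate folio. With skip_hwpoison set, the loop reclaims the healthy folio and unmaps the poisoned one, with no would-be BUG. With skip_hwpoison cleared, the poisoned folio reaches toy_add_to_swap() and is counted as a BUG hit.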

Signed-off-by: Jinjiang Tu <tujinjiang@xxxxxxxxxx>
Acked-by: Miaohe Lin <linmiaohe@xxxxxxxxxx>

Thanks for your review.

Thanks.