Re: [PATCH] mm: fix unevictable page reclaim when calling madvise_pageout

David Hildenbrand <david@xxxxxxxxxx> · Mon, 28 Oct 2019 17:15:57 +0100

On 28.10.19 17:07, David Hildenbrand wrote:
On 28.10.19 16:45, zhong jiang wrote:
On 2019/10/28 23:27, David Hildenbrand wrote:
On 28.10.19 16:08, zhong jiang wrote:
Recently, I hit the following issue when running the upstream kernel.

kernel BUG at mm/vmscan.c:1521!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521
Code: de f5 ff ff e8 ab 79 eb ff 4c 89 f7 e8 43 33 0d 00 e9 cc f5 ff ff e8 99 79 eb ff 48 c7 c6 a0 34 2b a0 4c 89 f7 e8 1a 4d 05 00 <0f> 0b e8 83 79 eb ff 48 89 d8 48 c1 e8 03 42 80 3c 38 00 0f 85 74
RSP: 0018:ffff88819a3df5a0 EFLAGS: 00010286
RAX: 0000000000040000 RBX: ffffea00061c3980 RCX: ffffffff814fba36
RDX: 00000000000056f7 RSI: ffffc9000c02c000 RDI: ffff8881f70268cc
RBP: ffff88819a3df898 R08: ffffed103ee05de0 R09: ffffed103ee05de0
R10: 0000000000000001 R11: ffffed103ee05ddf R12: ffff88819a3df6f0
R13: ffff88819a3df6f0 R14: ffffea00061c3980 R15: dffffc0000000000
FS:  00007f21b9d8e700(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2d621000 CR3: 00000001c8c46004 CR4: 00000000007606f0
DR0: 0000000020000140 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
PKRU: 55555554
Call Trace:
    reclaim_pages+0x499/0x800 mm/vmscan.c:2188
    madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453
    walk_pmd_range mm/pagewalk.c:53 [inline]
    walk_pud_range mm/pagewalk.c:112 [inline]
    walk_p4d_range mm/pagewalk.c:139 [inline]
    walk_pgd_range mm/pagewalk.c:166 [inline]
    __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261
    walk_page_range+0x179/0x310 mm/pagewalk.c:349
    madvise_pageout_page_range mm/madvise.c:506 [inline]
    madvise_pageout+0x1f0/0x330 mm/madvise.c:542
    madvise_vma mm/madvise.c:931 [inline]
    __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113
    do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

madvise_pageout walks the specified range of the vma and isolates
its pages, then runs shrink_page_list to reclaim the memory. But it also
isolates unevictable pages for reclaim. Hence, we hit the BUG check
in shrink_page_list.

We can fix it by preventing unevictable pages from being isolated.
Another way to fix the issue is to remove the PageUnevictable(page)
condition from the BUG_ON in shrink_page_list. I think it
is better to use the latter. Because We has taken the unevictable
page and skip it into account in shrink_page_list.
I really don't understand the last sentence. Looks like
something got messed up :)
I mean that we already check page_evictable(page) in shrink_page_list;
if the page is unevictable, we put it back on the correct LRU.

Based on that, I made the choice. It seems simpler. :-)
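
The two options discussed above can be illustrated with a small user-space
toy model. This is only a sketch under stated assumptions, not kernel code:
struct toy_page, toy_isolate_for_pageout() and toy_keep_check_fires() are
hypothetical names invented for illustration, and the two booleans merely
stand in for PageLRU() and PageUnevictable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the two flags discussed in the thread. */
struct toy_page {
	bool lru;         /* models PageLRU() */
	bool unevictable; /* models PageUnevictable() */
};

/*
 * Option 1 (hypothetical helper): skip unevictable pages at isolation
 * time, so they never reach the reclaim list at all.
 * Returns the number of pages placed into out[].
 */
static size_t toy_isolate_for_pageout(struct toy_page *pages, size_t n,
				      struct toy_page **out)
{
	size_t taken = 0;

	assert(pages != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		/* Only pages on an LRU and not unevictable are isolated. */
		if (!pages[i].lru || pages[i].unevictable)
			continue;
		pages[i].lru = false; /* isolation takes the page off the LRU */
		out[taken++] = &pages[i];
	}
	return taken;
}

/*
 * Option 2 (hypothetical helper): model the check at the "keep:" label.
 * strict  == true  models PageLRU(page) || PageUnevictable(page)
 * strict  == false models the relaxed PageLRU(page) from the patch
 * Returns true if the modelled check would fire.
 */
static bool toy_keep_check_fires(const struct toy_page *page, bool strict)
{
	if (strict)
		return page->lru || page->unevictable;
	return page->lru;
}
```

In this model, an isolated page that still carries the unevictable flag
trips the strict check but not the relaxed one, while filtering at
isolation time keeps such a page away from the check entirely.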

Thanks,
zhong jiang

Signed-off-by: zhong jiang <zhongjiang@xxxxxxxxxx>
---
    mm/vmscan.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f7d1301..1c6e959 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1524,7 +1524,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
    		unlock_page(page);
    keep:
    		list_add(&page->lru, &ret_pages);
-		VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
+		VM_BUG_ON_PAGE(PageLRU(page), page);
So, this comes from

commit b291f000393f5a0b679012b39d79fbc85c018233
Author: Nick Piggin <npiggin@xxxxxxx>
Date:   Sat Oct 18 20:26:44 2008 -0700

      mlock: mlocked pages are unevictable
      
      Make sure that mlocked pages also live on the unevictable LRU, so kswapd
      will not scan them over and over again.


That patch is fairly old. How come we can suddenly trigger this?
Which commit is responsible for that? Was it always broken?

I can see that

commit ad6b67041a45497261617d7a28b15159b202cb5a
Author: Minchan Kim <minchan@xxxxxxxxxx>
Date:   Wed May 3 14:54:13 2017 -0700

      mm: remove SWAP_MLOCK in ttu

made some changes in that area. But that was also some time ago.
I think the following patch introduced the issue.

commit 1a4e58cce84ee88129d5d49c064bd2852b481357
Author: Minchan Kim <minchan@xxxxxxxxxx>
Date:   Wed Sep 25 16:49:15 2019 -0700

      mm: introduce MADV_PAGEOUT

      When a process expects no accesses to a certain memory range for a long


CCing Minchan Kim then.

If this is indeed the introducing patch, you should probably reference that
patch in your cover mail somehow. (A Fixes: tag does not apply until the
patch is upstream.)

I am absolutely no expert on vmscan.c, so I'm afraid I can't really
comment on the details.


Oh, and just wondering, is this the same BUG as in

https://lkml.org/lkml/2019/8/2/1506

where a fix has been proposed? The fix does not seem to be in
next/master yet.

(I just realized that it is already upstream, so
"Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT")" applies.)



--

Thanks,

David / dhildenb