The patch titled
     Subject: mm/compaction: no stuck in __reset_isolation_pfn()
has been added to the -mm tree.  Its filename is
     mm-compaction-be-selective-about-what-pageblocks-to-clear-skip-hints-fix.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-compaction-be-selective-about-what-pageblocks-to-clear-skip-hints-fix.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-compaction-be-selective-about-what-pageblocks-to-clear-skip-hints-fix.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Qian Cai <cai@xxxxxx>
Subject: mm/compaction: no stuck in __reset_isolation_pfn()

c68d77911c23 ("mm, compaction: be selective about what pageblocks to clear
skip hints") introduced an infinite loop: if a pfn is invalid, the loop
iterates again without advancing the page counters.  It can be reproduced
by running LTP tests on an arm64 server.

 # oom01 (swapping)
 # hugemmap01
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
mem.c:814: INFO: set nr_hugepages to 128
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Test timeouted, sending SIGKILL!
Cannot kill test processes!
Congratulation, likely test hit a kernel bug.

It also triggers soft lockups.

[  456.232228] watchdog: BUG: soft lockup - CPU#122 stuck for 22s! [kswapd0:1375]
[  456.273354] pstate: 80400009 (Nzcv daif +PAN -UAO)
[  456.278143] pc : pfn_valid+0x54/0xdc
[  456.281713] lr : __reset_isolation_pfn+0x3a8/0x584
[  456.369358] Call trace:
[  456.371798]  pfn_valid+0x54/0xdc
[  456.375019]  __reset_isolation_pfn+0x3a8/0x584
[  456.379455]  __reset_isolation_suitable+0x1bc/0x280
[  456.384325]  reset_isolation_suitable+0xb8/0xe0
[  456.388847]  kswapd+0xd08/0x1048
[  456.392067]  kthread+0x2f4/0x30c
[  456.395289]  ret_from_fork+0x10/0x18

Link: http://lkml.kernel.org/r/20190206034732.75687-1-cai@xxxxxx
Fixes: c68d77911c23 ("mm, compaction: be selective about what pageblocks to clear skip hints")
Signed-off-by: Qian Cai <cai@xxxxxx>
Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: YueHaibing <yuehaibing@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---


--- a/mm/compaction.c~mm-compaction-be-selective-about-what-pageblocks-to-clear-skip-hints-fix
+++ a/mm/compaction.c
@@ -282,17 +282,16 @@ __reset_isolation_pfn(struct zone *zone,
 	end_page = pfn_to_page(pfn);
 
 	do {
-		if (!pfn_valid_within(pfn))
-			continue;
+		if (pfn_valid_within(pfn)) {
+			if (check_source && PageLRU(page)) {
+				clear_pageblock_skip(page);
+				return true;
+			}
 
-		if (check_source && PageLRU(page)) {
-			clear_pageblock_skip(page);
-			return true;
-		}
-
-		if (check_target && PageBuddy(page)) {
-			clear_pageblock_skip(page);
-			return true;
+			if (check_target && PageBuddy(page)) {
+				clear_pageblock_skip(page);
+				return true;
+			}
 		}
 
 		page += (1 << PAGE_ALLOC_COSTLY_ORDER);
_

Patches currently in -mm which might be from cai@xxxxxx are

revert-mm-use-early_pfn_to_nid-in-page_ext_init.patch
page_poison-play-nicely-with-kasan.patch
slab-kmemleak-no-scan-alien-caches.patch
mm-compaction-be-selective-about-what-pageblocks-to-clear-skip-hints-fix.patch
signal-allow-the-null-signal-in-rt_sigqueueinfo.patch
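
For readers unfamiliar with the failure mode: in a C do-while loop, `continue` jumps straight to the loop condition, so any increment placed at the bottom of the body is skipped. The following userspace sketch illustrates that behaviour and the shape of the fix. It is not kernel code: fake_pfn_valid(), scan_buggy(), scan_fixed(), run_demo(), STEP and MAX_ITERS are all hypothetical names invented for this illustration, and the iteration cap exists only so that the buggy variant terminates here.

```c
#include <assert.h>
#include <stdbool.h>

#define STEP      8     /* assumption: stands in for 1 << PAGE_ALLOC_COSTLY_ORDER */
#define MAX_ITERS 1000  /* safety cap so the buggy variant returns in this demo */

/* Hypothetical stand-in for pfn_valid_within(): pfn 16 is treated as a hole. */
bool fake_pfn_valid(unsigned long pfn)
{
	return pfn != 16;
}

/*
 * Pre-fix shape: on an invalid pfn, `continue` jumps to the loop condition
 * and skips the increment, so pfn never advances past the hole.
 * Returns the number of iterations performed (capped at MAX_ITERS).
 */
unsigned long scan_buggy(unsigned long pfn, unsigned long end_pfn)
{
	unsigned long iters = 0;

	do {
		if (++iters >= MAX_ITERS)
			break;		/* demo-only escape hatch */

		if (!fake_pfn_valid(pfn))
			continue;	/* skips the increment below */

		pfn += STEP;
	} while (pfn < end_pfn);

	return iters;
}

/*
 * Post-fix shape: the per-page work is nested under the validity check,
 * and the increment runs unconditionally on every iteration.
 */
unsigned long scan_fixed(unsigned long pfn, unsigned long end_pfn)
{
	unsigned long iters = 0;

	do {
		if (++iters >= MAX_ITERS)
			break;

		if (fake_pfn_valid(pfn)) {
			/* per-page checks would go here */
		}

		pfn += STEP;
	} while (pfn < end_pfn);

	return iters;
}

/* Small self-check over the range [0, 32), which contains the hole at 16. */
void run_demo(void)
{
	assert(scan_fixed(0, 32) == 4);          /* visits pfns 0, 8, 16, 24 */
	assert(scan_buggy(0, 32) == MAX_ITERS);  /* stuck at pfn 16 until the cap */
}
```

Calling run_demo() from a test program exercises both variants. Without the cap, scan_buggy() would spin forever once it reached the hole, which matches the soft lockup reported above.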