[RFC PATCH] mm/swap: fix system stuck due to infinite loop

Stillinux <stillinux@xxxxxxxxx> · Fri, 2 Apr 2021 15:03:37 +0800

Under high memory and load pressure, we ran the LTP tests and found that
the system hung. All tasks in direct memory reclaim were stuck in
io_schedule, waiting on requests that were held in the blk_plug list of
one process. That process had fallen into an infinite loop and never
flushed its plugged requests.

The looping process was in swap_cluster_readahead, which brackets its
reads with blk_start_plug/blk_finish_plug. The call flow is
swap_cluster_readahead->__read_swap_cache_async->swapcache_prepare.
When swapcache_prepare returns -EEXIST, __read_swap_cache_async loops
forever. It does call cond_resched, but in schedule, sched_submit_work
acts based on tsk->state and does not flush the blk_plug requests of a
task that is still runnable. The plugged I/O is therefore never
submitted, and the whole system hangs.
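
The dependency cycle described above can be modeled in user space. The
sketch below is not kernel code: every model_* name, the flush_on_retry
knob and the retry bound are hypothetical and exist only to make the
model terminate. It assumes the other task can only clear its cache
claim once the read held in the looping task's plug has been submitted.

```c
#include <stdbool.h>

#define MODEL_EEXIST 17	/* stand-in for the kernel's EEXIST */

/* Hypothetical model of one task's plug list. */
struct model_plug {
	int pending;	/* requests held back in the plug list */
	int submitted;	/* requests handed to the device */
};

/* Hypothetical model of a swap entry claimed by another task. */
struct model_entry {
	bool has_cache;	/* other task holds the cache claim on the entry */
};

/* Move every held request to the device, as a plug flush would. */
static void model_flush_plug(struct model_plug *plug)
{
	plug->submitted += plug->pending;
	plug->pending = 0;
}

/* Fails with -MODEL_EEXIST while the other task still holds its claim. */
static int model_swapcache_prepare(const struct model_entry *entry)
{
	return entry->has_cache ? -MODEL_EEXIST : 0;
}

/* The other task makes progress only after the plugged read is submitted. */
static void model_other_task_runs(struct model_entry *entry,
				  const struct model_plug *plug)
{
	if (plug->submitted > 0)
		entry->has_cache = false;
}

/*
 * Retry loop in the style of __read_swap_cache_async. Returns the number
 * of retries needed, or -1 if max_retries is reached, which stands in for
 * the infinite loop.
 */
int model_readahead(bool flush_on_retry, int max_retries)
{
	struct model_plug plug = { .pending = 1, .submitted = 0 };
	struct model_entry entry = { .has_cache = true };
	int retries = 0;

	while (model_swapcache_prepare(&entry) == -MODEL_EEXIST) {
		if (retries == max_retries)
			return -1;
		retries++;
		if (flush_on_retry)	/* hypothetical: reschedule flushes the plug */
			model_flush_plug(&plug);
		model_other_task_runs(&entry, &plug);
	}
	return retries;
}
```

In this model, model_readahead(false, 1000) gives up with -1 because the
plug is never flushed, while model_readahead(true, 1000) succeeds after
one retry. The flush_on_retry knob only illustrates the cycle; it is not
what this patch changes.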

This is our first time working on the swap code, and we have not found
a good way to fix the root cause. As a practical workaround, we make
swap_cluster_readahead aware of the memory pressure as early as
possible, so that it does io_schedule and flushes its blk_plug requests.
We do this by changing the allocation flag in swap_readpage to GFP_NOIO,
so that the allocation no longer enters memory reclaim that issues I/O.
With this change the system runs normally, but it is not a fundamental
fix.

Signed-off-by: huangjinhui <huangjinhui@xxxxxxxxxx>
---
 mm/page_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index c493ce9ebcf5..87392ffabb12 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -403,7 +403,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	}
 
 	ret = 0;
-	bio = bio_alloc(GFP_KERNEL, 1);
+	bio = bio_alloc(GFP_NOIO, 1);
 	bio_set_dev(bio, sis->bdev);
 	bio->bi_opf = REQ_OP_READ;
 	bio->bi_iter.bi_sector = swap_page_sector(page);
-- 
2.25.1