While running LTP tests under high memory and load pressure, we found
that the system hung. Every task in direct memory reclaim was stuck in
io_schedule, waiting on requests that were held in the blk_plug list of
a single process. That process had fallen into an infinite loop and
never flushed the plugged requests.

The looping process was in swap_cluster_readahead, which brackets its
I/O with blk_start_plug/blk_finish_plug. The call flow is:

  swap_cluster_readahead
    -> __read_swap_cache_async
      -> swapcache_prepare

When swapcache_prepare returns -EEXIST, __read_swap_cache_async retries
indefinitely. cond_resched is called inside the loop, but on the
schedule path sched_submit_work decides whether to flush based on
tsk->state, so the blk_plug requests are not flushed. The plugged I/O
therefore never completes, and the whole system hangs.

This is our first time working on the swap code, and we have not found
a good way to fix the root cause. As a practical workaround, we make
swap_cluster_readahead notice the memory pressure as early as possible
and call io_schedule, which flushes the blk_plug requests. To do this,
change the allocation flag in swap_readpage to GFP_NOIO so that the
allocation no longer performs memory reclaim that flushes I/O.

With this change the system runs normally, but this is not the most
fundamental fix.

Signed-off-by: huangjinhui <huangjinhui@xxxxxxxxxx>
---
 mm/page_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index c493ce9ebcf5..87392ffabb12 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -403,7 +403,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	}
 
 	ret = 0;
-	bio = bio_alloc(GFP_KERNEL, 1);
+	bio = bio_alloc(GFP_NOIO, 1);
 	bio_set_dev(bio, sis->bdev);
 	bio->bi_opf = REQ_OP_READ;
 	bio->bi_iter.bi_sector = swap_page_sector(page);
-- 
2.25.1