From: Yu Zhao <yuzhao@xxxxxxxxxx> Subject: mm/vmscan.c: fix potential deadlock in reclaim_pages() Theoretically without the protect from memalloc_noreclaim_save() and memalloc_noreclaim_restore(), reclaim_pages() can go into the block I/O layer recursively and deadlock. Querying 'reclaim_pages' in our kernel crash databases didn't yield any results. So the deadlock seems unlikely to happen. A possible explanation is that the only user of reclaim_pages(), i.e., MADV_PAGEOUT, is usually called before memory pressure builds up, e.g., on Android and Chrome OS. Under such a condition, allocations in the block I/O layer can be fulfilled without diverting to direct reclaim and therefore the recursion is avoided. Link: https://lkml.kernel.org/r/20210622074642.785473-1-yuzhao@xxxxxxxxxx Link: https://lkml.kernel.org/r/20210614194727.2684053-1-yuzhao@xxxxxxxxxx Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx> Cc: Minchan Kim <minchan@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/vmscan.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) --- a/mm/vmscan.c~mm-vmscanc-fix-potential-deadlock-in-reclaim_pages +++ a/mm/vmscan.c @@ -1701,6 +1701,7 @@ unsigned int reclaim_clean_pages_from_li unsigned int nr_reclaimed; struct page *page, *next; LIST_HEAD(clean_pages); + unsigned int noreclaim_flag; list_for_each_entry_safe(page, next, page_list, lru) { if (!PageHuge(page) && page_is_file_lru(page) && @@ -1711,8 +1712,17 @@ unsigned int reclaim_clean_pages_from_li } } + /* + * We should be safe here since we are only dealing with file pages and + * we are not kswapd and therefore cannot write dirty file pages. But + * call memalloc_noreclaim_save() anyway, just in case these conditions + * change in the future. + */ + noreclaim_flag = memalloc_noreclaim_save(); nr_reclaimed = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc, &stat, true); + memalloc_noreclaim_restore(noreclaim_flag); + list_splice(&clean_pages, page_list); mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -(long)nr_reclaimed); @@ -2306,6 +2316,7 @@ unsigned long reclaim_pages(struct list_ LIST_HEAD(node_page_list); struct reclaim_stat dummy_stat; struct page *page; + unsigned int noreclaim_flag; struct scan_control sc = { .gfp_mask = GFP_KERNEL, .priority = DEF_PRIORITY, @@ -2314,6 +2325,8 @@ unsigned long reclaim_pages(struct list_ .may_swap = 1, }; + noreclaim_flag = memalloc_noreclaim_save(); + while (!list_empty(page_list)) { page = lru_to_page(page_list); if (nid == NUMA_NO_NODE) { @@ -2350,6 +2363,8 @@ unsigned long reclaim_pages(struct list_ } } + memalloc_noreclaim_restore(noreclaim_flag); + return nr_reclaimed; } _