The recently introduced PG_dropbehind allows for freeing folios immediately after writeback. Unlike PG_reclaim, it does not need vmscan to be involved to get the folio freed. Instead of using folio_set_reclaim(), use folio_set_dropbehind() in shrink_folio_list(). It is safe to leave PG_dropbehind on the folio if, for some reason (bug?), the folio is not in a writeback state after ->writepage(). In these cases, the kernel had to clear PG_reclaim as it shared a page flag bit with PG_readahead. Also use PG_dropbehind instead PG_reclaim to detect I/O congestion. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Acked-by: David Hildenbrand <david@xxxxxxxxxx> --- mm/vmscan.c | 30 ++++++++---------------------- 1 file changed, 8 insertions(+), 22 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index d15f80333d6b..bb5ec22f97b5 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1140,7 +1140,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, * for immediate reclaim are making it to the end of * the LRU a second time. */ - if (writeback && folio_test_reclaim(folio)) + if (writeback && folio_test_dropbehind(folio)) stat->nr_congested += nr_pages; /* @@ -1149,7 +1149,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, * * 1) If reclaim is encountering an excessive number * of folios under writeback and this folio has both - * the writeback and reclaim flags set, then it + * the writeback and dropbehind flags set, then it * indicates that folios are being queued for I/O but * are being recycled through the LRU before the I/O * can complete. Waiting on the folio itself risks an @@ -1174,7 +1174,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, * would probably show more reasons. * * 3) Legacy memcg encounters a folio that already has the - * reclaim flag set. memcg does not have any dirty folio + * dropbehind flag set. memcg does not have any dirty folio * throttling so we could easily OOM just because too many * folios are in writeback and there is nothing else to * reclaim. Wait for the writeback to complete. @@ -1193,31 +1193,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, /* Case 1 above */ if (current_is_kswapd() && - folio_test_reclaim(folio) && + folio_test_dropbehind(folio) && test_bit(PGDAT_WRITEBACK, &pgdat->flags)) { stat->nr_immediate += nr_pages; goto activate_locked; /* Case 2 above */ } else if (writeback_throttling_sane(sc) || - !folio_test_reclaim(folio) || + !folio_test_dropbehind(folio) || !may_enter_fs(folio, sc->gfp_mask) || (mapping && mapping_writeback_indeterminate(mapping))) { - /* - * This is slightly racy - - * folio_end_writeback() might have - * just cleared the reclaim flag, then - * setting the reclaim flag here ends up - * interpreted as the readahead flag - but - * that does not matter enough to care. - * What we do want is for this folio to - * have the reclaim flag set next time - * memcg reclaim reaches the tests above, - * so it will then wait for writeback to - * avoid OOM; and it's also appropriate - * in global reclaim. - */ - folio_set_reclaim(folio); + folio_set_dropbehind(folio); stat->nr_writeback += nr_pages; goto activate_locked; @@ -1372,7 +1358,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, */ if (folio_is_file_lru(folio) && (!current_is_kswapd() || - !folio_test_reclaim(folio) || + !folio_test_dropbehind(folio) || !test_bit(PGDAT_DIRTY, &pgdat->flags))) { /* * Immediately reclaim when written back. @@ -1382,7 +1368,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list, */ node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE, nr_pages); - folio_set_reclaim(folio); + folio_set_dropbehind(folio); goto activate_locked; } -- 2.45.2