The patch titled
     vmscan: stop kswapd waiting on congestion when the min watermark is not being met
has been added to the -mm tree.  Its filename is
     vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: vmscan: stop kswapd waiting on congestion when the min watermark is not being met
From: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>

If reclaim fails to make sufficient progress, the priority is raised.
Once the priority is higher, kswapd starts waiting on congestion.
However, if the zone is below the min watermark then kswapd needs to
continue working without delay as there is a danger of an increased rate
of GFP_ATOMIC allocation failure.

This patch changes the conditions under which kswapd waits on congestion
by only going to sleep if the min watermarks are being met.

[mel@xxxxxxxxx: Add stats to track how relevant the logic is]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
Signed-off-by: Mel Gorman <mel@xxxxxxxxx>
Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/vmstat.h |    1 +
 mm/vmscan.c            |   18 ++++++++++++++++--
 mm/vmstat.c            |    1 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff -puN include/linux/vmstat.h~vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2 include/linux/vmstat.h
--- a/include/linux/vmstat.h~vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2
+++ a/include/linux/vmstat.h
@@ -41,6 +41,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 #endif
 		PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
 		KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW,
+		KSWAPD_SKIP_CONGESTION_WAIT,
 		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff -puN mm/vmscan.c~vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2 mm/vmscan.c
--- a/mm/vmscan.c~vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2
+++ a/mm/vmscan.c
@@ -1983,6 +1983,7 @@ loop_again:
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 		unsigned long lru_pages = 0;
+		int has_under_min_watermark_zone = 0;
 
 		/* The swap token gets in the way of swapout... */
 		if (!priority)
@@ -2089,6 +2090,15 @@ loop_again:
 			if (total_scanned > SWAP_CLUSTER_MAX * 2 &&
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
 				sc.may_writepage = 1;
+
+			/*
+			 * We are still under min water mark. it mean we have
+			 * GFP_ATOMIC allocation failure risk. Hurry up!
+			 */
+			if (!zone_watermark_ok(zone, order, min_wmark_pages(zone),
+					      end_zone, 0))
+				has_under_min_watermark_zone = 1;
+
 		}
 		if (all_zones_ok)
 			break;		/* kswapd: all done */
@@ -2096,8 +2106,12 @@ loop_again:
 		 * OK, kswapd is getting into trouble.  Take a nap, then take
 		 * another pass across the zones.
 		 */
-		if (total_scanned && priority < DEF_PRIORITY - 2)
-			congestion_wait(BLK_RW_ASYNC, HZ/10);
+		if (total_scanned && (priority < DEF_PRIORITY - 2)) {
+			if (has_under_min_watermark_zone)
+				count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
+			else
+				congestion_wait(BLK_RW_ASYNC, HZ/10);
+		}
 
 		/*
 		 * We do this so kswapd doesn't build up large priorities for
diff -puN mm/vmstat.c~vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2 mm/vmstat.c
--- a/mm/vmstat.c~vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2
+++ a/mm/vmstat.c
@@ -685,6 +685,7 @@ static const char * const vmstat_text[]
 	"kswapd_inodesteal",
 	"kswapd_slept_prematurely_fast",
 	"kswapd_slept_prematurely_slow",
+	"kswapd_skip_congestion_wait",
 	"pageoutrun",
 	"allocstall",
 
_

Patches currently in -mm which might be from kosaki.motohiro@xxxxxxxxxxxxxx are

linux-next.patch
oom-dump-stack-and-vm-state-when-oom-killer-panics.patch
readahead-add-blk_run_backing_dev.patch
mmap-dont-return-enomem-when-mapcount-is-temporarily-exceeded-in-munmap.patch
mmap-dont-return-enomem-when-mapcount-is-temporarily-exceeded-in-munmap-checkpatch-fixes.patch
mm-vsmcan-check-shrink_active_list-sc-isolate_pages-return-value.patch
mm-move-inc_zone_page_statenr_isolated-to-just-isolated-place.patch
rmap-simplify-try_to_unmap_file.patch
oom_kill-use-rss-value-instead-of-vm-size-for-badness.patch
oom-kill-show-virtual-size-and-rss-information-of-the-killed-process.patch
oom-kill-show-virtual-size-and-rss-information-of-the-killed-process-fix.patch
oom-kill-fix-numa-consraint-check-with-nodemask-v42.patch
oom-kill-fix-numa-consraint-check-with-nodemask-v42-checkpatch-fixes.patch
page-allocator-wait-on-both-sync-and-async-congestion-after-direct-reclaim.patch
vmscan-have-kswapd-sleep-for-a-short-interval-and-double-check-it-should-be-asleep.patch
vmscan-stop-kswapd-waiting-on-congestion-when-the-min-watermark-is-not-being-met-v2.patch
mm-define-page_mapping_flags.patch
mm-mlocking-in-try_to_unmap_one.patch
mm-config_mmu-for-pg_mlocked.patch
mm-pass-address-down-to-rmap-ones.patch
mm-stop-ptlock-enlarging-struct-page.patch
mm-sigbus-instead-of-abusing-oom.patch
mm-add-numa-node-symlink-for-memory-section-in-sysfs.patch
mm-refactor-register_cpu_under_node.patch
mm-refactor-unregister_cpu_under_node.patch
mm-add-numa-node-symlink-for-cpu-devices-in-sysfs.patch
documentation-abi-sys-devices-system-cpu-cpu-node.patch
vmscan-separate-scswap_cluster_max-and-scnr_max_reclaim.patch
vmscan-kill-hibernation-specific-reclaim-logic-and-unify-it.patch
vmscan-zone_reclaim-dont-use-insane-swap_cluster_max.patch
vmscan-kill-scswap_cluster_max.patch
vmscan-make-consistent-of-reclaim-bale-out-between-do_try_to_free_page-and-shrink_zone.patch
lib-introduce-strim.patch
fs-symlink-write_begin-allocation-context-fix-reiser4-fix.patch
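
Illustrative note (not part of the patch): the end-of-pass decision that the
mm/vmscan.c hunk introduces can be modelled in plain userspace C, which may
help when reasoning about when the new kswapd_skip_congestion_wait counter
would tick. This is only a sketch under stated assumptions. The function
kswapd_pass_action() and the enum kswapd_action are hypothetical names made
up for this note, and DEF_PRIORITY is defined locally as 12 on the assumption
that it matches the kernel's value. The real code calls count_vm_event() and
congestion_wait() rather than returning a value.

```c
#include <stdbool.h>

/* Assumed to match the kernel's DEF_PRIORITY; defined here for the sketch. */
#define DEF_PRIORITY 12

/* Hypothetical labels for what kswapd does at the end of one priority pass. */
enum kswapd_action {
	KSWAPD_CONTINUE,	/* nap condition not reached; go straight on */
	KSWAPD_SKIP_WAIT,	/* would nap, but a zone is under its min watermark */
	KSWAPD_NAP		/* kernel would call congestion_wait(BLK_RW_ASYNC, HZ/10) */
};

/*
 * Userspace model of the patched condition.  total_scanned and priority
 * mirror the variables in the hunk; has_under_min_watermark_zone is the
 * flag the patch sets when zone_watermark_ok() fails against
 * min_wmark_pages(zone) for any zone scanned in this pass.
 */
enum kswapd_action kswapd_pass_action(unsigned long total_scanned,
				      int priority,
				      bool has_under_min_watermark_zone)
{
	/* Same outer test as before the patch: only nap once in trouble. */
	if (!(total_scanned && priority < DEF_PRIORITY - 2))
		return KSWAPD_CONTINUE;

	/* New: keep reclaiming without delay, and count the skipped nap. */
	if (has_under_min_watermark_zone)
		return KSWAPD_SKIP_WAIT;

	return KSWAPD_NAP;
}
```

For example, kswapd_pass_action(64, 9, true) yields KSWAPD_SKIP_WAIT, while
the same call with false yields KSWAPD_NAP, which is the pre-patch behaviour
for every pass that reached the nap condition.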