On 07/20/2016 12:36 AM, David Rientjes wrote: > On Mon, 18 Jul 2016, Vlastimil Babka wrote: > >> After __alloc_pages_slowpath() sets up new alloc_flags and wakes up kswapd, it >> first tries get_page_from_freelist() with the new alloc_flags, as it may >> succeed e.g. due to using min watermark instead of low watermark. It makes >> sense to to do this attempt before adjusting zonelist based on >> alloc_flags/gfp_mask, as it's still relatively a fast path if we just wake up >> kswapd and successfully allocate. >> >> This patch therefore moves the initial attempt above the retry label and >> reorganizes a bit the part below the retry label. We still have to attempt >> get_page_from_freelist() on each retry, as some allocations cannot do that >> as part of direct reclaim or compaction, and yet are not allowed to fail >> (even though they do a WARN_ON_ONCE() and thus should not exist). We can reuse >> the call meant for ALLOC_NO_WATERMARKS attempt and just set alloc_flags to >> ALLOC_NO_WATERMARKS if the context allows it. As a side-effect, the attempts >> from direct reclaim/compaction will also no longer obey watermarks once this >> is set, but there's little harm in that. >> >> Kswapd wakeups are also done on each retry to be safe from potential races >> resulting in kswapd going to sleep while a process (that may not be able to >> reclaim by itself) is still looping. >> >> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx> >> --- >> mm/page_alloc.c | 29 ++++++++++++++++++----------- >> 1 file changed, 18 insertions(+), 11 deletions(-) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index eb1968a1041e..30443804f156 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -3541,35 +3541,42 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, >> */ >> alloc_flags = gfp_to_alloc_flags(gfp_mask); >> >> + if (gfp_mask & __GFP_KSWAPD_RECLAIM) >> + wake_all_kswapds(order, ac); >> + >> + /* >> + * The adjusted alloc_flags might result in immediate success, so try >> + * that first >> + */ >> + page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac); >> + if (page) >> + goto got_pg; > > Any reason to not test gfp_pfmemalloc_allowed() here? For contexts where > it returns true, it seems like the above would be an unneeded failure if > ALLOC_WMARK_MIN would have failed. No strong opinion. Yeah, two reasons: 1 - less overhead (for the test) if we went to slowpath just to wake up kswapd and then succeed on min watermark 2 - try all zones with min watermark before resorting to no watermark (if allowed), so we don't needlessly put below min watermark the first zone in zonelist, while some later zone would still be above watermark > >> + >> + >> retry: >> + /* Ensure kswapd doesn't accidentally go to sleep as long as we loop */ >> if (gfp_mask & __GFP_KSWAPD_RECLAIM) >> wake_all_kswapds(order, ac); >> >> + if (gfp_pfmemalloc_allowed(gfp_mask)) >> + alloc_flags = ALLOC_NO_WATERMARKS; >> + >> /* >> * Reset the zonelist iterators if memory policies can be ignored. >> * These allocations are high priority and system rather than user >> * orientated. >> */ >> - if (!(alloc_flags & ALLOC_CPUSET) || gfp_pfmemalloc_allowed(gfp_mask)) { >> + if (!(alloc_flags & ALLOC_CPUSET) || (alloc_flags & ALLOC_NO_WATERMARKS)) { > > Do we need to test ALLOC_NO_WATERMARKS here, or is it just for clarity? I didn't realize it's redundant, but would keep for clarity and robustness anyway. > > Otherwise looks good! Thanks! >> ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); >> ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, >> ac->high_zoneidx, ac->nodemask); >> } >> >> - /* This is the last chance, in general, before the goto nopage. */ >> + /* Attempt with potentially adjusted zonelist and alloc_flags */ >> page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac); >> if (page) >> goto got_pg; >> >> - /* Allocate without watermarks if the context allows */ >> - if (gfp_pfmemalloc_allowed(gfp_mask)) { >> - >> - page = get_page_from_freelist(gfp_mask, order, >> - ALLOC_NO_WATERMARKS, ac); >> - if (page) >> - goto got_pg; >> - } >> - >> /* Caller is not willing to reclaim, we can't balance anything */ >> if (!can_direct_reclaim) { >> /* -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>