On Wed, Mar 03, 2021 at 08:48:58AM -0800, Dave Hansen wrote:
> On 3/3/21 8:31 AM, Ben Widawsky wrote:
> >> I haven't got to the whole series yet. The real question is whether the
> >> first attempt to enforce the preferred mask is a general win. I would
> >> argue that it resembles the existing single node preferred memory policy
> >> because that one doesn't push heavily on the preferred node either. So
> >> dropping just the direct reclaim mode makes some sense to me.
> >>
> >> IIRC this is something I was recommending in an early proposal of the
> >> feature.
> > My assumption [FWIW] is that the usecases we've outlined for multi-preferred
> > would want more heavy pushing on the preference mask. However, maybe the uapi
> > could dictate how hard to try/not try.
>
> There are two things that I think are important:
>
> 1. MPOL_PREFERRED_MANY fallback away from the preferred nodes should be
>    *temporary*, even in the face of the preferred set being full. That
>    means that _some_ reclaim needs to be done. Kicking off kswapd is
>    fine for this.
> 2. MPOL_PREFERRED_MANY behavior should resemble MPOL_PREFERRED as
>    closely as possible. We're just going to confuse users if they set a
>    single node in a MPOL_PREFERRED_MANY mask and get different behavior
>    from MPOL_PREFERRED.
>
> While it would be nice, short-term, to steer MPOL_PREFERRED_MANY
> behavior toward how we expect it to get used first, I think it's a
> mistake if we do it at the cost of long-term divergence from MPOL_PREFERRED.
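
For anyone following along: the distinction between "some reclaim" and
direct reclaim in point 1 maps onto two separate gfp bits, which the kernel
combines as __GFP_RECLAIM = __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM.
The snippet below is only a user-space sketch of that idea, not kernel code.
The model_* and MODEL_* names and the bit values are made up for
illustration and do not match the kernel's real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gfp_t; the bit values are NOT the kernel's. */
typedef unsigned int model_gfp_t;

#define MODEL_GFP_DIRECT_RECLAIM 0x1u	/* caller may reclaim synchronously */
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u	/* caller may wake background reclaim */
#define MODEL_GFP_RECLAIM \
	(MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)

/* Drop only synchronous reclaim; the kswapd bit is left alone. */
static model_gfp_t model_skip_direct_reclaim(model_gfp_t gfp)
{
	model_gfp_t out = gfp & ~MODEL_GFP_DIRECT_RECLAIM;

	/* The kswapd bit must pass through unchanged. */
	assert((out & MODEL_GFP_KSWAPD_RECLAIM) ==
	       (gfp & MODEL_GFP_KSWAPD_RECLAIM));
	return out;
}

static bool model_may_direct_reclaim(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

static bool model_may_wake_kswapd(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_KSWAPD_RECLAIM) != 0;
}
```

With a mask that starts out as MODEL_GFP_RECLAIM, the result of
model_skip_direct_reclaim() can still wake kswapd but can no longer block
in direct reclaim, which is the "temporary fallback" property asked for
above.
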

Hi All,

Based on the discussion, I updated the patch as below. Please review, thanks.

From ea9e32fa8b6eff4a64d790b856e044adb30f04b5 Mon Sep 17 00:00:00 2001
From: Feng Tang <feng.tang@xxxxxxxxx>
Date: Wed, 10 Mar 2021 12:31:24 +0800
Subject: [PATCH] mm/mempolicy: speedup page alloc for MPOL_PREFERRED_MANY

When doing broader tests, we noticed allocation slowness in one test
case that mallocs memory with a size slightly bigger than the free
memory of the targeted nodes, but much less than the total free memory
of the system.

The reason is that the code enters the slowpath of
__alloc_pages_nodemask(), which takes quite some time. Since
alloc_pages_policy() will give it a 2nd try with a NULL nodemask, we
tried a solution which creates a new gfp_mask bit __GFP_NO_SLOWPATH
for explicitly skipping the slowpath in the first try, which is
brutal and costs one precious gfp mask bit.

Based on discussion with Michal/Ben/Dave [1], only skip entering direct
reclaim while still allowing it to wake up kswapd, which fixes the
slowness and makes MPOL_PREFERRED_MANY closer to the semantics of
MPOL_PREFERRED, while avoiding the creation of a new gfp bit.

[1]. https://lore.kernel.org/lkml/1614766858-90344-15-git-send-email-feng.tang@xxxxxxxxx/

Suggested-by: Michal Hocko <mhocko@xxxxxxxx>
Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
---
 mm/mempolicy.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d66c1c0..00b19f7 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2205,9 +2205,13 @@ static struct page *alloc_pages_policy(struct mempolicy *pol, gfp_t gfp,
 	 * | MPOL_PREFERRED_MANY (round 2) | local         | NULL       |
 	 * +-------------------------------+---------------+------------+
 	 */
-	if (pol->mode == MPOL_PREFERRED_MANY)
+	if (pol->mode == MPOL_PREFERRED_MANY) {
 		gfp_mask |= __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
 
+		/* Skip direct reclaim, as there will be a second try */
+		gfp_mask &= ~__GFP_DIRECT_RECLAIM;
+	}
+
 	page = __alloc_pages_nodemask(gfp_mask, order,
 				      policy_node(gfp, pol, preferred_nid),
 				      policy_nodemask(gfp, pol));
-- 
2.7.4
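
To make the two-round behavior easier to picture, here is a rough
user-space model of it. This is a sketch under loud assumptions, not the
kernel implementation: struct model_node, model_alloc(),
model_alloc_preferred_many() and the MODEL_* constants are all hypothetical,
the "slowpath" is reduced to a counter plus a one-shot reclaim, and waking
kswapd is not modelled at all. It only shows that round 1 never pays for
direct reclaim on the preferred nodes, and that round 2 falls back to any
node.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_NODES 4

/* Illustrative stand-in bit; NOT the kernel's actual value. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u

/* Hypothetical per-node state for the model. */
struct model_node {
	unsigned long free_pages;
	unsigned long reclaimable_pages;
	unsigned long direct_reclaims;	/* slow synchronous reclaim passes */
};

/*
 * Take nr pages from one node allowed by nodemask (bit i = node i).
 * A NULL nodemask is modelled by passing a mask with all bits set.
 * Returns the node index used, or -1 on failure.
 */
static int model_alloc(struct model_node *nodes, unsigned int gfp,
		       unsigned int nodemask, unsigned long nr)
{
	/* Fast path: any allowed node that already has enough free pages. */
	for (int i = 0; i < MODEL_NR_NODES; i++) {
		if ((nodemask & (1u << i)) && nodes[i].free_pages >= nr) {
			nodes[i].free_pages -= nr;
			return i;
		}
	}

	/* Slow path: only entered when the caller allows direct reclaim. */
	if (!(gfp & MODEL_GFP_DIRECT_RECLAIM))
		return -1;

	for (int i = 0; i < MODEL_NR_NODES; i++) {
		if (!(nodemask & (1u << i)))
			continue;
		nodes[i].direct_reclaims++;
		nodes[i].free_pages += nodes[i].reclaimable_pages;
		nodes[i].reclaimable_pages = 0;
		if (nodes[i].free_pages >= nr) {
			nodes[i].free_pages -= nr;
			return i;
		}
	}
	return -1;
}

/* Round 1: preferred mask, no direct reclaim. Round 2: any node. */
static int model_alloc_preferred_many(struct model_node *nodes,
				      unsigned int gfp,
				      unsigned int preferred_mask,
				      unsigned long nr)
{
	int nid;

	assert(preferred_mask != 0);

	nid = model_alloc(nodes, gfp & ~MODEL_GFP_DIRECT_RECLAIM,
			  preferred_mask, nr);
	if (nid < 0)
		nid = model_alloc(nodes, gfp, ~0u, nr);
	return nid;
}
```

In this model, a request slightly larger than the free memory of the
preferred nodes fails round 1 immediately instead of reclaiming, and is
then served from a non-preferred node that has free pages, which is the
slow test case described in the commit message.
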