On Wed, 17 Dec 2014 16:51:21 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> > > The page allocator slowpath is always called from the fastpath if the
> > > first allocation didn't succeed, so we don't know which of the two we
> > > allocated the page from at this tracepoint.
> >
> > True, but the idea is that when we call trace_mm_page_alloc(), local
> > var `mask' holds the gfp_t which was used in the most recent allocation
> > attempt.
> >
>
> So if the fastpath succeeds, which should be the majority of the time,
> then we get a tracepoint here that says we allocated with
> __GFP_FS | __GFP_IO even though we may have PF_MEMALLOC_NOIO set.  So if
> page != NULL, all we know is that either the fastpath succeeded or we
> don't have PF_MEMALLOC_NOIO set and were allowed to reclaim.  Not sure
> that's very helpful.
>
> The easiest thing to do would be to just clear __GFP_FS and __GFP_IO when
> we clear everything not in gfp_allowed_mask, but that's pointless if the
> fastpath succeeds.  I'm not sure it's worth restructuring the code, with
> a possible performance overhead, for the benefit of a tracepoint.
>
> And then there's the call to lockdep_trace_alloc() which does care about
> __GFP_FS.  That looks broken because we need to clear __GFP_FS with
> PF_MEMALLOC_NOIO.

<head spinning>

I'm not particularly concerned about the tracepoint and we can change
that later.  The main intent here is to restore the allocation mask when
__alloc_pages_nodemask() does the "goto retry_cpuset".

(I renamed `mask' to `alloc_mask' and documented it a bit)
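For reference, the helper everything here hinges on is roughly the
following -- a sketch of the include/linux/sched.h implementation as I
recall it in current trees, so check your own tree before relying on the
details:

	/* Strip the I/O-capable bits when the task must not do I/O */
	static inline gfp_t memalloc_noio_flags(gfp_t flags)
	{
		if (unlikely(current->flags & PF_MEMALLOC_NOIO))
			flags &= ~(__GFP_IO | __GFP_FS);
		return flags;
	}

Since it clears both __GFP_IO and __GFP_FS for PF_MEMALLOC_NOIO tasks,
its result has to be recomputed from the caller's original gfp_mask on
each pass rather than written back into it.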
From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Subject: mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask

__alloc_pages_nodemask() strips __GFP_IO when retrying the page
allocation.  But it does this by altering the function-wide variable
gfp_mask.  This will cause subsequent allocation attempts to
inadvertently use the modified gfp_mask.

Also, pass the correct mask (the mask we actually used) into
trace_mm_page_alloc().

Cc: Ming Lei <ming.lei@xxxxxxxxxxxxx>
Cc: Mel Gorman <mel@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@xxxxxxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/page_alloc.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff -puN mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask
+++ a/mm/page_alloc.c
@@ -2865,6 +2865,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
 	unsigned int cpuset_mems_cookie;
 	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
 	int classzone_idx;
+	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 
 	gfp_mask &= gfp_allowed_mask;
 
@@ -2898,22 +2899,24 @@ retry_cpuset:
 	classzone_idx = zonelist_zone_idx(preferred_zoneref);
 
 	/* First allocation attempt */
-	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
-			zonelist, high_zoneidx, alloc_flags,
-			preferred_zone, classzone_idx, migratetype);
+	alloc_mask = gfp_mask|__GFP_HARDWALL;
+	page = get_page_from_freelist(alloc_mask, nodemask, order, zonelist,
+			high_zoneidx, alloc_flags, preferred_zone,
+			classzone_idx, migratetype);
 	if (unlikely(!page)) {
 		/*
 		 * Runtime PM, block IO and its error handling path
 		 * can deadlock because I/O on the device might not
 		 * complete.
 		 */
-		gfp_mask = memalloc_noio_flags(gfp_mask);
-		page = __alloc_pages_slowpath(gfp_mask, order,
+		alloc_mask = memalloc_noio_flags(gfp_mask);
+
+		page = __alloc_pages_slowpath(alloc_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, classzone_idx, migratetype);
 	}
 
-	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
+	trace_mm_page_alloc(page, order, alloc_mask, migratetype);
 
 out:
 	/*
_
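To spell out the retry_cpuset concern, here is a stripped-down sketch of
the control flow after this patch.  Illustration only, not the real
function: fastpath() and slowpath() are invented stand-ins for
get_page_from_freelist() and __alloc_pages_slowpath().

	struct page *sketch(gfp_t gfp_mask, unsigned int order, int migratetype)
	{
		unsigned int cpuset_mems_cookie;
		gfp_t alloc_mask;
		struct page *page;

	retry_cpuset:
		cpuset_mems_cookie = read_mems_allowed_begin();

		/* First attempt: the caller's mask plus __GFP_HARDWALL */
		alloc_mask = gfp_mask | __GFP_HARDWALL;
		page = fastpath(alloc_mask);
		if (unlikely(!page)) {
			/*
			 * Derive the slowpath mask from the *unmodified*
			 * gfp_mask.  The old code wrote the result back into
			 * gfp_mask itself, so after "goto retry_cpuset" every
			 * subsequent attempt ran with __GFP_IO/__GFP_FS
			 * already stripped.
			 */
			alloc_mask = memalloc_noio_flags(gfp_mask);
			page = slowpath(alloc_mask);
		}

		/* Report the mask we actually allocated with */
		trace_mm_page_alloc(page, order, alloc_mask, migratetype);

		if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie)))
			goto retry_cpuset;	/* gfp_mask is still pristine */

		return page;
	}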