On Mon, 15 Dec 2014, Andrew Morton wrote:

> Well it was already wrong because the first allocation attempt uses
> gfp_mask|__GFP_HARDWALL, but we only trace gfp_mask.
>
> This?
>
> --- a/mm/page_alloc.c~mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask-fix
> +++ a/mm/page_alloc.c
> @@ -2877,6 +2877,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, u
>  	unsigned int cpuset_mems_cookie;
>  	int alloc_flags = ALLOC_WMARK_LOW|ALLOC_CPUSET|ALLOC_FAIR;
>  	int classzone_idx;
> +	gfp_t mask;
>
>  	gfp_mask &= gfp_allowed_mask;
>
> @@ -2910,23 +2911,24 @@ retry_cpuset:
>  	classzone_idx = zonelist_zone_idx(preferred_zoneref);
>
>  	/* First allocation attempt */
> -	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order,
> -			zonelist, high_zoneidx, alloc_flags,
> -			preferred_zone, classzone_idx, migratetype);
> +	mask = gfp_mask|__GFP_HARDWALL;
> +	page = get_page_from_freelist(mask, nodemask, order, zonelist,
> +			high_zoneidx, alloc_flags, preferred_zone,
> +			classzone_idx, migratetype);
>  	if (unlikely(!page)) {
>  		/*
>  		 * Runtime PM, block IO and its error handling path
>  		 * can deadlock because I/O on the device might not
>  		 * complete.
>  		 */
> -		gfp_t mask = memalloc_noio_flags(gfp_mask);
> +		mask = memalloc_noio_flags(gfp_mask);
>
>  		page = __alloc_pages_slowpath(mask, order,
>  				zonelist, high_zoneidx, nodemask,
>  				preferred_zone, classzone_idx, migratetype);
>  	}
>
> -	trace_mm_page_alloc(page, order, gfp_mask, migratetype);
> +	trace_mm_page_alloc(page, order, mask, migratetype);
>
>  out:
>  	/*

I'm not sure I understand why we need a local variable to hold the context
mask as opposed to just using what was passed to the function. We should be
allocating with the single gfp_mask that is passed to the function, modifying
it as necessary, and that modified mask becomes the context mask that can be
traced.

The above is wrong because it unconditionally sets __GFP_HARDWALL as the gfp
mask for __alloc_pages_slowpath() when we actually only want it for the first
allocation attempt; it's needed there for the implementation of
__cpuset_node_allowed(). The page allocator slowpath is always called from
the fastpath if the first allocation attempt didn't succeed, so at this
tracepoint we don't know which of the two paths the page was allocated from.

I'm afraid the original code, before either of these patches, was more
correct. The use of memalloc_noio_flags() for "subsequent allocation
attempts" doesn't really matter, since neither __GFP_FS nor __GFP_IO matters
for the fastpath allocation (we aren't reclaiming).
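To make the single-mask flow concrete, here is an untested sketch against the
same 3.18-era signatures quoted in the diff above. It is only an illustration
of the idea, not a tested patch, and it deliberately modifies the gfp_mask
argument in place (the very thing the -fix above was trying to avoid):

	/*
	 * Untested sketch: keep a single gfp_mask, OR in __GFP_HARDWALL
	 * only at the fastpath callsite that needs it for
	 * __cpuset_node_allowed(), and fold the noio restriction into the
	 * mask itself before the slowpath so the tracepoint reports the
	 * context the allocation ran under.
	 */
	gfp_mask &= gfp_allowed_mask;

	/* ... cpuset and zonelist setup unchanged ... */

	/* First allocation attempt: hardwall applies only here */
	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask,
			order, zonelist, high_zoneidx, alloc_flags,
			preferred_zone, classzone_idx, migratetype);
	if (unlikely(!page)) {
		/*
		 * Runtime PM, block IO and its error handling path
		 * can deadlock because I/O on the device might not
		 * complete.
		 */
		gfp_mask = memalloc_noio_flags(gfp_mask);

		page = __alloc_pages_slowpath(gfp_mask, order,
				zonelist, high_zoneidx, nodemask,
				preferred_zone, classzone_idx, migratetype);
	}

	trace_mm_page_alloc(page, order, gfp_mask, migratetype);

This still leaves the tracepoint unable to distinguish the two paths; the
sketch only shows one mask flowing through the function, which is close to
what the original pre-patch code did (modulo tracing the noio-adjusted mask,
which per the above doesn't really matter).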