+ mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The patch titled
     Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
has been added to the -mm tree.  Its filename is
     mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Vlastimil Babka <vbabka@xxxxxxx>
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset

In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies.  The zonelist is obtained
from current CPU's node.  This is a problem for __GFP_THISNODE allocations
that want to allocate on a different node, e.g.  because the allocating
thread has been migrated to a different CPU.

This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended.  If a slab page is
put on wrong node's list, then further list manipulations may corrupt the
list because page_to_nid() is used to determine which node's list_lock
should be locked and thus we may take a wrong lock and race.

Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.

We can fix it by simply removing the zonelist reset completely.  There is
actually no reason to reset it, because memory policies and cpusets don't
affect the zonelist choice in the first place.  This was different when
commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.

We might consider this for 4.17 although I don't know if there's anything
currently broken.  Stable backports should be more important, but will
have to be reviewed carefully, as the code went through many changes.  BTW
I think that also the ac->preferred_zoneref reset is currently useless if
we don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.

Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@xxxxxxx
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/page_alloc.c |    1 -
 1 file changed, 1 deletion(-)

diff -puN mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset
+++ a/mm/page_alloc.c
@@ -4169,7 +4169,6 @@ retry:
 	 * orientated.
 	 */
 	if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
-		ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
 		ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
 					ac->high_zoneidx, ac->nodemask);
 	}
_

Patches currently in -mm which might be from vbabka@xxxxxxx are

mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux