On 02/14/2017 01:58 PM, Vlastimil Babka wrote:
> On 02/10/2017 11:06 AM, Anshuman Khandual wrote:
>> This implements allocation isolation for CDM nodes in buddy allocator by
>> discarding CDM memory zones all the time except in the cases where the gfp
>> flag has got __GFP_THISNODE or the nodemask contains CDM nodes in cases
>> where it is non NULL (explicit allocation request in the kernel or user
>> process MPOL_BIND policy based requests).
>>
>> Signed-off-by: Anshuman Khandual <khandual@xxxxxxxxxxxxxxxxxx>
>> ---
>>  mm/page_alloc.c | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 84d61bb..392c24a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -64,6 +64,7 @@
>>  #include <linux/page_owner.h>
>>  #include <linux/kthread.h>
>>  #include <linux/memcontrol.h>
>> +#include <linux/node.h>
>>
>>  #include <asm/sections.h>
>>  #include <asm/tlbflush.h>
>> @@ -2908,6 +2909,21 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>>  		struct page *page;
>>  		unsigned long mark;
>>
>> +		/*
>> +		 * CDM nodes get skipped if the requested gfp flag
>> +		 * does not have __GFP_THISNODE set or the nodemask
>> +		 * does not have any CDM nodes in case the nodemask
>> +		 * is non NULL (explicit allocation requests from
>> +		 * kernel or user process MPOL_BIND policy which has
>> +		 * CDM nodes).
>> +		 */
>> +		if (is_cdm_node(zone->zone_pgdat->node_id)) {
>> +			if (!(gfp_mask & __GFP_THISNODE)) {
>> +				if (!ac->nodemask)
>> +					continue;
>> +			}
>> +		}
>
> With the current cpuset implementation, this will have a subtle corner
> case when allocating from a cpuset that allows the cdm node, and there
> is no (task or vma) mempolicy applied for the allocation. In the fast
> path (__alloc_pages_nodemask()) we'll set ac->nodemask to
> current->mems_allowed, so your code will wrongly assume that this
> ac->nodemask is a policy that allows the CDM node. Probably not what you
> want?
You are right, it's a problem and not what we want. We can make
get_page_from_freelist() take an additional parameter, "orig_nodemask",
holding the nodemask that was originally passed into
__alloc_pages_nodemask(). Inside the zonelist iterator we can then compare
orig_nodemask with the current ac->nodemask to detect whether the cpuset
code swapped in current->mems_allowed, and skip the CDM node if necessary.
That's a viable solution IMHO.

>
> This might change if we decide to fix the cpuset vs mempolicy issues [1]
> so your input on that topic with your recent experience with all the
> alternative CDM isolation implementations would be useful. Thanks.
>
> [1] http://www.spinics.net/lists/linux-mm/msg121760.html

Sure, will look into the details.