Re: [PATCH] page_alloc: skip cpuset enforcement for lower zone allocations (v4)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 29 May 2014, Marcelo Tosatti wrote:

> Zone specific allocations, such as GFP_DMA32, should not be restricted
> to cpusets allowed node list: the zones which such allocations demand
> might be contained in particular nodes outside the cpuset node list.
> 
> Necessary for the following usecase:
> - driver which requires zone specific memory (such as KVM, which
> requires root pagetable at paddr < 4GB).
> - user wants to limit allocations of application to nodeX, and nodeX has
> no memory < 4GB.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> 
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 3d54c41..3bbc23f 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -2374,6 +2374,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
>   * variable 'wait' is not set, and the bit ALLOC_CPUSET is not set
>   * in alloc_flags.  That logic and the checks below have the combined
>   * affect that:
> + *	gfp_zone(mask) < policy_zone - any node ok
>   *	in_interrupt - any node ok (current task context irrelevant)
>   *	GFP_ATOMIC   - any node ok
>   *	TIF_MEMDIE   - any node ok
> @@ -2392,6 +2393,10 @@ int __cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)
>  
>  	if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
>  		return 1;
> +#ifdef CONFIG_NUMA
> +	if (gfp_zone(gfp_mask) < policy_zone)
> +		return 1;
> +#endif
>  	might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
>  	if (node_isset(node, current->mems_allowed))
>  		return 1;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..a0ce1ba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2726,6 +2726,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  retry_cpuset:
>  	cpuset_mems_cookie = read_mems_allowed_begin();
>  
> +#ifdef CONFIG_NUMA
> +	if (gfp_zone(gfp_mask) < policy_zone)
> +		nodemask = &node_states[N_ONLINE];
> +#endif
> +
>  	/* The preferred zone is used for statistics later */
>  	first_zones_zonelist(zonelist, high_zoneidx,
>  				nodemask ? : &cpuset_current_mems_allowed,

There are still three issues with this, two of which are only minor and 
one that needs more thought:

 (1) this doesn't affect only cpusets which the changelog indicates, it 
     also bypasses mempolicies for GFP_DMA and GFP_DMA32 allocations since
     the nodemask != NULL in the page allocator when there is an effective
     mempolicy.  That may be precisely what you're trying to do (do the
     same for mempolicies as you're doing for cpusets), but the comment 
     now in the code specifically refers to cpusets.  Can you make a case
     for the mempolicies exception as well?  Otherwise, we'll need to do

	if (!nodemask && gfp_zone(gfp_mask) < policy_zone)
		nodemask = &node_states[N_ONLINE];

And the two minors:

 (2) this should be &node_states[N_MEMORY], not &node_states[N_ONLINE] 
     since memoryless nodes should not be included.  Note that
     guarantee_online_mems() looks at N_MEMORY and
     cpuset_current_mems_allowed is defined for N_MEMORY without
     cpusets.

 (3) it's unnecessary for this to be after the "retry_cpuset" label and
     check the gfp mask again if we need to relook at the allowed cpuset
     mask.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]