Re: [patch] mm: default to node zonelist ordering when nodes have only lowmem

On Thu, Mar 25, 2010 at 03:33:08PM -0700, David Rientjes wrote:
> There are two types of zonelist ordering methodologies:
> 
>  - node order, preferring allocations on a node to stay local to and
> 
>  - zone order, preferring allocations come from a higher zone to avoid
>    allocating in lowmem zones even though they may not be local.
> 
> The ordering technique used by the kernel is configurable on the command
> line, but also has some logic to determine what the default should be.
> 
> This logic currently lacks knowledge of systems where a node may only
> have lowmem.  For such systems, it is necessary to use node order so that
> GFP_KERNEL allocations may be satisfied by nodes consisting of only
> lowmem.
> 
> If zone order is used, GFP_KERNEL allocations to such nodes are actually
> allocated on a node with local affinity that includes ZONE_NORMAL.
> 
> This change defaults to node zonelist ordering if any node lacks
> ZONE_NORMAL.
> 
> To force zone order, append 'numa_zonelist_order=zone' to the kernel
> command line.
> 
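
To make the difference between the two orderings concrete, here is a toy
userspace sketch. It is not kernel code: the toy_* names, the populated[][]
table and the near[] distance list are all made up for illustration, and it
only assumes that both orderings walk zones from highest to lowest.

```c
#include <assert.h>
#include <stddef.h>

/* Toy zone indices, lowest first. Hypothetical stand-ins, not kernel types. */
enum toy_zone { TOY_DMA, TOY_DMA32, TOY_NORMAL, TOY_NR_ZONES };

struct toy_ref {
	int nid;		/* node id */
	enum toy_zone zone;	/* zone within that node */
};

/*
 * populated[nid][zone] is nonzero when that zone has pages.
 * near[] lists node ids from closest to farthest from the allocating node.
 * out[] must have room for n_nodes * TOY_NR_ZONES entries.
 */
static size_t build_node_order(int populated[][TOY_NR_ZONES], const int *near,
			       size_t n_nodes, struct toy_ref *out)
{
	size_t i, len = 0;
	int z;

	assert(populated && near && out);
	/* Exhaust every zone of the nearest node before moving to the next. */
	for (i = 0; i < n_nodes; i++) {
		for (z = TOY_NR_ZONES - 1; z >= 0; z--) {
			if (populated[near[i]][z]) {
				out[len].nid = near[i];
				out[len].zone = (enum toy_zone)z;
				len++;
			}
		}
	}
	return len;
}

static size_t build_zone_order(int populated[][TOY_NR_ZONES], const int *near,
			       size_t n_nodes, struct toy_ref *out)
{
	size_t i, len = 0;
	int z;

	assert(populated && near && out);
	/* Exhaust the highest zone on every node before touching a lower one. */
	for (z = TOY_NR_ZONES - 1; z >= 0; z--) {
		for (i = 0; i < n_nodes; i++) {
			if (populated[near[i]][z]) {
				out[len].nid = near[i];
				out[len].zone = (enum toy_zone)z;
				len++;
			}
		}
	}
	return len;
}
```

With node 0 holding DMA, DMA32 and NORMAL, node 1 holding only DMA32, and an
allocation issued from node 1 (near = {1, 0}), the node-order list starts
with node 1's DMA32, while the zone-order list starts with node 0's NORMAL.
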
> Cc: Mel Gorman <mel@xxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  mm/page_alloc.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2582,7 +2582,7 @@ static int default_zonelist_order(void)
>           * ZONE_DMA and ZONE_DMA32 can be very small area in the sytem.
>  	 * If they are really small and used heavily, the system can fall
>  	 * into OOM very easily.
> -	 * This function detect ZONE_DMA/DMA32 size and confgigures zone order.
> +	 * This function detect ZONE_DMA/DMA32 size and configures zone order.
>  	 */

Spurious change here but it's not very important.

>  	/* Is there ZONE_NORMAL ? (ex. ppc has only DMA zone..) */
>  	low_kmem_size = 0;
> @@ -2594,6 +2594,15 @@ static int default_zonelist_order(void)
>  				if (zone_type < ZONE_NORMAL)
>  					low_kmem_size += z->present_pages;
>  				total_size += z->present_pages;
> +			} else if (zone_type == ZONE_NORMAL) {
> +				/*

What if it was ZONE_DMA32?

> +				 * If any node has only lowmem, then node order
> +				 * is preferred to allow kernel allocations
> +				 * locally; otherwise, they can easily infringe
> +				 * on other nodes when there is an abundance of
> +				 * lowmem available to allocate from.
> +				 */
> +				return ZONELIST_ORDER_NODE;

It might be clearer if this were done as part of the similar check later:

		if (low_kmem_size &&
		    total_size > average_size && /* ignore small node */
		    low_kmem_size > total_size * 70/100)
			return ZONELIST_ORDER_NODE;

This says that if low memory is more than 70% of a node's total (and the
node is not small), node order is used. To take your case into account,
it would look something like:

if (low_kmem_size && total_size > average_size) {
	if (low_kmem_size == total_size)
		return ZONELIST_ORDER_ZONE;

	if (low_kmem_size > total_size * 70/100)
		return ZONELIST_ORDER_NODE;
}
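
For anyone who wants to poke at the thresholds, the existing 70% test quoted
above can be tried in userspace. This is only a sketch: the function name
node_prefers_node_order() is made up, sizes are plain page counts passed in
by the caller, and the overflow assert is my addition.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical userspace copy of the existing per-node condition.
 * Returns 1 when the node would trigger node ordering, 0 otherwise.
 */
static int node_prefers_node_order(unsigned long low_kmem_size,
				   unsigned long total_size,
				   unsigned long average_size)
{
	/* Guard the multiplication below against overflow (not in the kernel). */
	assert(total_size <= ULONG_MAX / 70);

	return low_kmem_size &&
	       total_size > average_size &&	/* ignore small node */
	       low_kmem_size > total_size * 70 / 100;
}
```

A node with 800 of 1000 pages in lowmem and an average size of 500 returns 1.
Exactly 700 of 1000 returns 0 because the comparison is strict, and any node
no larger than the average returns 0 regardless of its lowmem share.
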

>  			}
>  		}
>  	}
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
