On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote: >On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote: >> >> We noticed a huge amount of slab memory consumed on a large ppc64 box: >> >> Slab: 2094336 kB >> >> Almost 2GB. This box is not balanced and some nodes do not have local >> memory, causing slub to be very inefficient in its slab usage. >> >> Each time we call kmem_cache_alloc_node slub checks the per cpu slab, >> sees it isn't node local, deactivates it and tries to allocate a new >> slab. On empty nodes we will allocate a new remote slab and use the >> first slot, but as explained above when we get called a second time >> we will just deactivate that slab and retry. >> >> As such we end up only using 1 entry in each slab: >> >> slab mem objects >> used active >> ------------------------------------ >> kmalloc-16384 1404 MB 4.90% >> task_struct 668 MB 2.90% >> kmalloc-128 193 MB 3.61% >> kmalloc-192 152 MB 5.23% >> kmalloc-8192 72 MB 23.40% >> kmalloc-16 64 MB 7.43% >> kmalloc-512 33 MB 22.41% >> >> The patch below checks that a node is not empty before deactivating a >> slab and trying to allocate it again. With this patch applied we now >> use about 352MB: >> >> Slab: 360192 kB >> >> And our efficiency is much better: >> >> slab mem objects >> used active >> ------------------------------------ >> kmalloc-16384 92 MB 74.27% >> task_struct 23 MB 83.46% >> idr_layer_cache 18 MB 100.00% >> pgtable-2^12 17 MB 100.00% >> kmalloc-65536 15 MB 100.00% >> inode_cache 14 MB 100.00% >> kmalloc-256 14 MB 97.81% >> kmalloc-8192 14 MB 85.71% >> >> Signed-off-by: Anton Blanchard <anton@xxxxxxxxx> >> --- >> >> Thoughts? It seems like we could hit a similar situation if a machine >> is balanced but we run out of memory on a single node. >> >> Index: b/mm/slub.c >> =================================================================== >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -2278,10 +2278,17 @@ redo: >> >> if (unlikely(!node_match(page, node))) { >> stat(s, ALLOC_NODE_MISMATCH); >> - deactivate_slab(s, page, c->freelist); >> - c->page = NULL; >> - c->freelist = NULL; >> - goto new_slab; >> + >> + /* >> + * If the node contains no memory there is no point in trying >> + * to allocate a new node local slab >> + */ >> + if (node_spanned_pages(node)) { >> + deactivate_slab(s, page, c->freelist); >> + c->page = NULL; >> + c->freelist = NULL; >> + goto new_slab; >> + } >> } >> >> /* > >Hello, > >I think that we need more efforts to solve unbalanced node problem. > >With this patch, even if node of current cpu slab is not favorable to >unbalanced node, allocation would proceed and we would get the unintended memory. > >And there is one more problem. Even if we have some partial slabs on >compatible node, we would allocate new slab, because get_partial() cannot handle >this unbalance node case. > >To fix this correctly, how about following patch? > >Thanks. > >------------->8-------------------- >diff --git a/mm/slub.c b/mm/slub.c >index c3eb3d3..a1f6dfa 100644 >--- a/mm/slub.c >+++ b/mm/slub.c >@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node, > { > void *object; > int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node; >+ struct zonelist *zonelist; >+ struct zoneref *z; >+ struct zone *zone; >+ enum zone_type high_zoneidx = gfp_zone(flags); > >+ if (!node_present_pages(searchnode)) { >+ zonelist = node_zonelist(searchnode, flags); >+ for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { >+ searchnode = zone_to_nid(zone); >+ if (node_present_pages(searchnode)) >+ break; >+ } >+ } Why change searchnode instead of depending on fallback zones/nodes in get_any_partial() to allocate partial slabs? Regards, Wanpeng Li > object = get_partial_node(s, get_node(s, searchnode), c, flags); > if (object || node != NUMA_NO_NODE) > return object; > >-- >To unsubscribe, send a message with 'unsubscribe linux-mm' in >the body to majordomo@xxxxxxxxx. For more info on Linux MM, >see: http://www.linux-mm.org/ . >Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>