The patch titled
     slab: fix two issues in kmalloc_node / __cache_alloc_node
has been added to the -mm tree.  Its filename is
     slab-fixup-two-issues-in-kmalloc_node--__cache_alloc_node.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: slab: fix two issues in kmalloc_node / __cache_alloc_node
From: Christoph Lameter <clameter@xxxxxxx>

This addresses two issues:

1. kmalloc_node() may intermittently return NULL if we are allocating from
   the current node and are unable to obtain memory for the current node
   from the page allocator.  This is because we call ____cache_alloc() when
   nodeid == numa_node_id(), and ____cache_alloc() is not able to fall back
   to other nodes.

   This was introduced in the 2.6.19 development cycle.  Kernels <= 2.6.18
   do not do a restricted allocation in that case: they blindly trust the
   page allocator to have given us memory from the indicated node, and they
   insert the page into the queues for the current node regardless of the
   node it actually came from.

2. If kmalloc_node() is used on a node that has not been bootstrapped yet,
   we may pass an invalid node number to ____cache_alloc_node(), triggering
   a BUG().  Change the function to call fallback_alloc() instead, and only
   call fallback_alloc() if we are allowed to fall back at all.  The need
   to handle a node that has not been bootstrapped yet also first surfaced
   in the 2.6.19 cycle.

Also update the comments, since they were still describing the old
kmalloc_node() from 2.6.12.

Signed-off-by: Christoph Lameter <clameter@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 mm/slab.c |   40 ++++++++++++++++++++++++++++------------
 1 files changed, 28 insertions(+), 12 deletions(-)

diff -puN mm/slab.c~slab-fixup-two-issues-in-kmalloc_node--__cache_alloc_node mm/slab.c
--- a/mm/slab.c~slab-fixup-two-issues-in-kmalloc_node--__cache_alloc_node
+++ a/mm/slab.c
@@ -3536,29 +3536,45 @@ out:
  * @flags: See kmalloc().
  * @nodeid: node number of the target node.
  *
- * Identical to kmem_cache_alloc, except that this function is slow
- * and can sleep. And it will allocate memory on the given node, which
- * can improve the performance for cpu bound structures.
- * New and improved: it will now make sure that the object gets
- * put on the correct node list so that there is no false sharing.
+ * Identical to kmem_cache_alloc but it will allocate memory on the given
+ * node, which can improve the performance for cpu bound structures.
+ *
+ * Fallback to other node is possible if __GFP_THISNODE is not set.
  */
 static __always_inline void *
 __cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 		void *caller)
 {
 	unsigned long save_flags;
-	void *ptr;
+	void *ptr = NULL;
 
 	cache_alloc_debugcheck_before(cachep, flags);
 	local_irq_save(save_flags);
-	if (nodeid == -1 || nodeid == numa_node_id() ||
-			!cachep->nodelists[nodeid])
-		ptr = ____cache_alloc(cachep, flags);
-	else
-		ptr = ____cache_alloc_node(cachep, flags, nodeid);
-	local_irq_restore(save_flags);
+
+	if (unlikely(nodeid == -1))
+		nodeid = numa_node_id();
+
+	if (likely(cachep->nodelists[nodeid])) {
+		if (nodeid == numa_node_id()) {
+			/*
+			 * Use the locally cached objects if possible.
+			 * However ____cache_alloc does not allow fallback
+			 * to other nodes. It may fail while we still have
+			 * objects on other nodes available.
+			 */
+			ptr = ____cache_alloc(cachep, flags);
+		}
+		if (!ptr) {
+			/* ___cache_alloc_node can fall back to other nodes */
+			ptr = ____cache_alloc_node(cachep, flags, nodeid);
+		}
+	} else {
+		/* Node not bootstrapped yet */
+		if (!(flags & __GFP_THISNODE))
+			ptr = fallback_alloc(cachep, flags);
+	}
+	local_irq_restore(save_flags);
 
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
 	return ptr;
_

Patches currently in -mm which might be from clameter@xxxxxxx are

slab-fixup-two-issues-in-kmalloc_node--__cache_alloc_node.patch
memory-page-alloc-minor-cleanups.patch
memory-page-alloc-minor-cleanups-fix.patch
get-rid-of-zone_table.patch
deal-with-cases-of-zone_dma-meaning-the-first-zone.patch
get-rid-of-zone_table-fix-3.patch
introduce-config_zone_dma.patch
optional-zone_dma-in-the-vm.patch
optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set.patch
optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set-reduce-config_zone_dma-ifdefs.patch
optional-zone_dma-for-ia64.patch
remove-zone_dma-remains-from-parisc.patch
remove-zone_dma-remains-from-sh-sh64.patch
set-config_zone_dma-for-arches-with-generic_isa_dma.patch
zoneid-fix-up-calculations-for-zoneid_pgshift.patch
remove-bio_cachep-from-slabh.patch
move-sighand_cachep-to-include-signalh.patch
move-vm_area_cachep-to-include-mmh.patch
move-files_cachep-to-include-fileh.patch
move-filep_cachep-to-include-fileh.patch
move-fs_cachep-to-linux-fs_structh.patch
move-names_cachep-to-linux-fsh.patch
remove-uses-of-kmem_cache_t-from-mm-and-include-linux-slabh.patch
drain_node_page-drain-pages-in-batch-units.patch
slab-remove-slab_no_grow.patch
slab-remove-slab_level_mask.patch
slab-remove-slab_noio.patch
slab-remove-slab_nofs.patch
slab-remove-slab_user.patch
slab-remove-slab_atomic.patch
slab-remove-slab_kernel.patch
slab-remove-slab_dma.patch
slab-remove-kmem_cache_t.patch
slab-deprecate-kmem_cache-t.patch
radix-tree-rcu-lockless-readside.patch
sched-domain-move-sched-group-allocations-to-percpu-area.patch
sched-avoid-taking-rq-lock-in-wake_priority_sleeper.patch
sched-remove-staggering-of-load-balancing.patch
sched-disable-interrupts-for-locking-in-load_balance.patch
sched-extract-load-calculation-from-rebalance_tick.patch
sched-move-idle-status-calculation-into-rebalance_tick.patch
sched-use-softirq-for-load-balancing.patch
sched-call-tasklet-less-frequently.patch
sched-add-option-to-serialize-load-balancing.patch
sched-add-option-to-serialize-load-balancing-fix.patch
mm-only-sched-add-a-few-scheduler-event-counters.patch
zvc-support-nr_slab_reclaimable--nr_slab_unreclaimable-swap_prefetch.patch
reduce-max_nr_zones-swap_prefetch-remove-incorrect-use-of-zone_highmem.patch
numa-add-zone_to_nid-function-swap_prefetch.patch
remove-uses-of-kmem_cache_t-from-mm-and-include-linux-slabh-prefetch.patch
readahead-state-based-method-aging-accounting.patch
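To make the caller-visible semantics of the fix concrete, here is a
minimal sketch (hypothetical code, not part of the patch; the function
name alloc_example() and the allocation size are made up).  Without
__GFP_THISNODE the fixed path may fall back to other nodes instead of
intermittently returning NULL; with __GFP_THISNODE set it returns NULL
rather than falling back, including when the target node has not been
bootstrapped yet:

#include <linux/slab.h>
#include <linux/gfp.h>

/* Hypothetical illustration only, not part of the patch. */
static void *alloc_example(int node)
{
	/*
	 * No __GFP_THISNODE: after the fix this may fall back to
	 * another node if 'node' cannot supply memory, instead of
	 * intermittently returning NULL.
	 */
	void *relaxed = kmalloc_node(128, GFP_KERNEL, node);

	/*
	 * __GFP_THISNODE: no fallback is allowed, so this returns
	 * NULL if 'node' cannot satisfy the allocation (including
	 * a node that has not been bootstrapped yet).
	 */
	void *strict = kmalloc_node(128, GFP_KERNEL | __GFP_THISNODE, node);

	kfree(strict);		/* kfree(NULL) is safe */
	return relaxed;
}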