The patch titled
     GFP_THISNODE must not trigger global reclaim
has been added to the -mm tree.  Its filename is
     gfp_thisnode-must-not-trigger-global-reclaim.patch

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: GFP_THISNODE must not trigger global reclaim
From: Christoph Lameter <clameter@xxxxxxx>

The intent of GFP_THISNODE is to make sure that an allocation occurs on a
particular node.  If this is not possible, NULL must be returned so that the
caller can decide on its own what to do next (the slab allocator depends on
this).

However, GFP_THISNODE currently triggers reclaim before returning a failure.
GFP_THISNODE includes __GFP_NORETRY, so the allocator does not loop, but it
still wakes kswapd and reclaims once before giving up.  So if a node is
over-allocated, we currently do some reclaim before returning NULL, even
though the caller may want to try memory on other nodes before any reclaim is
triggered.  (A caller that does want reclaim can use __GFP_THISNODE directly
instead.)

The page allocator has no flag to suppress reclaim, and adding yet another
GFP_xx flag would be difficult because we are out of available flag bits.

So instead, check whether all the bits that make up GFP_THISNODE
(__GFP_THISNODE, __GFP_NORETRY and __GFP_NOWARN) are set.  If they are, return
NULL before waking up kswapd.

Signed-off-by: Christoph Lameter <clameter@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxx>
---

 mm/page_alloc.c |   11 +++++++++++
 1 files changed, 11 insertions(+)

diff -puN mm/page_alloc.c~gfp_thisnode-must-not-trigger-global-reclaim mm/page_alloc.c
--- a/mm/page_alloc.c~gfp_thisnode-must-not-trigger-global-reclaim
+++ a/mm/page_alloc.c
@@ -1155,6 +1155,17 @@ restart:
 	if (page)
 		goto got_pg;
 
+	/*
+	 * GFP_THISNODE (meaning __GFP_THISNODE, __GFP_NORETRY and
+	 * __GFP_NOWARN set) should not cause reclaim since the subsystem
+	 * (f.e. slab) using GFP_THISNODE may choose to trigger reclaim
+	 * using a larger set of nodes after it has established that the
+	 * allowed per node queues are empty and that nodes are
+	 * over allocated.
+	 */
+	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
+		goto nopage;
+
 	for (z = zonelist->zones; *z; z++)
 		wakeup_kswapd(*z, order);
 
_

Patches currently in -mm which might be from clameter@xxxxxxx are

memory-page-alloc-minor-cleanups.patch
memory-page-alloc-minor-cleanups-fix.patch
get-rid-of-zone_table.patch
deal-with-cases-of-zone_dma-meaning-the-first-zone.patch
get-rid-of-zone_table-fix-3.patch
introduce-config_zone_dma.patch
optional-zone_dma-in-the-vm.patch
optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set.patch
optional-zone_dma-in-the-vm-no-gfp_dma-check-in-the-slab-if-no-config_zone_dma-is-set-reduce-config_zone_dma-ifdefs.patch
optional-zone_dma-for-ia64.patch
remove-zone_dma-remains-from-parisc.patch
remove-zone_dma-remains-from-sh-sh64.patch
set-config_zone_dma-for-arches-with-generic_isa_dma.patch
zoneid-fix-up-calculations-for-zoneid_pgshift.patch
remove-bio_cachep-from-slabh.patch
move-sighand_cachep-to-include-signalh.patch
move-vm_area_cachep-to-include-mmh.patch
move-files_cachep-to-include-fileh.patch
move-filep_cachep-to-include-fileh.patch
move-fs_cachep-to-linux-fs_structh.patch
move-names_cachep-to-linux-fsh.patch
remove-uses-of-kmem_cache_t-from-mm-and-include-linux-slabh.patch
drain_node_page-drain-pages-in-batch-units.patch
slab-remove-slab_no_grow.patch
slab-remove-slab_level_mask.patch
slab-remove-slab_noio.patch
slab-remove-slab_nofs.patch
slab-remove-slab_user.patch
slab-remove-slab_atomic.patch
slab-remove-slab_kernel.patch
slab-remove-slab_dma.patch
slab-remove-kmem_cache_t.patch
slab-deprecate-kmem_cache-t.patch
slab-fixup-two-issues-in-kmalloc_node--__cache_alloc_node.patch
gfp_thisnode-must-not-trigger-global-reclaim.patch
radix-tree-rcu-lockless-readside.patch
sched-domain-move-sched-group-allocations-to-percpu-area.patch
sched-avoid-taking-rq-lock-in-wake_priority_sleeper.patch
sched-remove-staggering-of-load-balancing.patch
sched-disable-interrupts-for-locking-in-load_balance.patch
sched-extract-load-calculation-from-rebalance_tick.patch
sched-move-idle-status-calculation-into-rebalance_tick.patch
sched-use-softirq-for-load-balancing.patch
sched-call-tasklet-less-frequently.patch
sched-add-option-to-serialize-load-balancing.patch
sched-add-option-to-serialize-load-balancing-fix.patch
mm-only-sched-add-a-few-scheduler-event-counters.patch
zvc-support-nr_slab_reclaimable--nr_slab_unreclaimable-swap_prefetch.patch
reduce-max_nr_zones-swap_prefetch-remove-incorrect-use-of-zone_highmem.patch
numa-add-zone_to_nid-function-swap_prefetch.patch
remove-uses-of-kmem_cache_t-from-mm-and-include-linux-slabh-prefetch.patch
readahead-state-based-method-aging-accounting.patch