On Tue 26-02-13 16:46:08, David Rientjes wrote:
> On large systems with a lot of memory, walking all RAM to determine page
> types may take a half second or even more.
>
> In non-blockable contexts, the page allocator will emit a page allocation
> failure warning unless __GFP_NOWARN is specified. In such contexts, irqs
> are typically disabled and such a lengthy delay may result in soft
> lockups.

But we are trying to prevent soft lockups by calling touch_nmi_watchdog
every now and then when iterating over pages, so the lockup detector
shouldn't trigger.

Anyway, I think that the additional information (which can be really
costly, as you are describing) is not that useful. Most of the useful
information is already printed by show_free_areas. Or does it help when
we know how much memory is shared/reserved/etc. when the allocation
fails?

So I do agree with dropping the additional information for the
allocation failure path (sysrq+m might still show it), but I fail to see
how the lockup detector plays any role here. Can we just drop it because
it is not that interesting and it is costly, so it is not worth
bothering?

> To fix this, suppress the page walk in such contexts when printing the
> page allocation failure warning.
>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> ---
>  arch/arm/mm/init.c       | 3 +++
>  arch/ia64/mm/contig.c    | 2 ++
>  arch/ia64/mm/discontig.c | 2 ++
>  arch/parisc/mm/init.c    | 2 ++
>  arch/unicore32/mm/init.c | 3 +++
>  include/linux/mm.h       | 3 ++-
>  lib/show_mem.c           | 3 +++
>  mm/page_alloc.c          | 7 +++++++
>  8 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -99,6 +99,9 @@ void show_mem(unsigned int filter)
>  	printk("Mem-info:\n");
>  	show_free_areas(filter);
>
> +	if (filter & SHOW_MEM_FILTER_PAGE_COUNT)
> +		return;
> +
>  	for_each_bank (i, mi) {
>  		struct membank *bank = &mi->bank[i];
>  		unsigned int pfn1, pfn2;
> diff --git a/arch/ia64/mm/contig.c b/arch/ia64/mm/contig.c
> --- a/arch/ia64/mm/contig.c
> +++ b/arch/ia64/mm/contig.c
> @@ -47,6 +47,8 @@ void show_mem(unsigned int filter)
>  	printk(KERN_INFO "Mem-info:\n");
>  	show_free_areas(filter);
>  	printk(KERN_INFO "Node memory in pages:\n");
> +	if (filter & SHOW_MEM_FILTER_PAGE_COUNT)
> +		return;
>  	for_each_online_pgdat(pgdat) {
>  		unsigned long present;
>  		unsigned long flags;
> diff --git a/arch/ia64/mm/discontig.c b/arch/ia64/mm/discontig.c
> --- a/arch/ia64/mm/discontig.c
> +++ b/arch/ia64/mm/discontig.c
> @@ -623,6 +623,8 @@ void show_mem(unsigned int filter)
>
>  	printk(KERN_INFO "Mem-info:\n");
>  	show_free_areas(filter);
> +	if (filter & SHOW_MEM_FILTER_PAGE_COUNT)
> +		return;
>  	printk(KERN_INFO "Node memory in pages:\n");
>  	for_each_online_pgdat(pgdat) {
>  		unsigned long present;
> diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c
> --- a/arch/parisc/mm/init.c
> +++ b/arch/parisc/mm/init.c
> @@ -697,6 +697,8 @@ void show_mem(unsigned int filter)
>
>  	printk(KERN_INFO "Mem-info:\n");
>  	show_free_areas(filter);
> +	if (filter & SHOW_MEM_FILTER_PAGE_COUNT)
> +		return;
>  #ifndef CONFIG_DISCONTIGMEM
>  	i = max_mapnr;
>  	while (i-- > 0) {
> diff --git a/arch/unicore32/mm/init.c b/arch/unicore32/mm/init.c
> --- a/arch/unicore32/mm/init.c
> +++ b/arch/unicore32/mm/init.c
> @@ -66,6 +66,9 @@ void show_mem(unsigned int filter)
>  	printk(KERN_DEFAULT "Mem-info:\n");
>  	show_free_areas(filter);
>
> +	if (filter & SHOW_MEM_FILTER_PAGE_COUNT)
> +		return;
> +
>  	for_each_bank(i, mi) {
>  		struct membank *bank = &mi->bank[i];
>  		unsigned int pfn1, pfn2;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -898,7 +898,8 @@ extern void pagefault_out_of_memory(void);
>   * Flags passed to show_mem() and show_free_areas() to suppress output in
>   * various contexts.
>   */
> -#define SHOW_MEM_FILTER_NODES	(0x0001u)	/* filter disallowed nodes */
> +#define SHOW_MEM_FILTER_NODES		(0x0001u)	/* disallowed nodes */
> +#define SHOW_MEM_FILTER_PAGE_COUNT	(0x0002u)	/* page type count */
>
>  extern void show_free_areas(unsigned int flags);
>  extern bool skip_free_areas_node(unsigned int flags, int nid);
> diff --git a/lib/show_mem.c b/lib/show_mem.c
> --- a/lib/show_mem.c
> +++ b/lib/show_mem.c
> @@ -18,6 +18,9 @@ void show_mem(unsigned int filter)
>  	printk("Mem-Info:\n");
>  	show_free_areas(filter);
>
> +	if (filter & SHOW_MEM_FILTER_PAGE_COUNT)
> +		return;
> +
>  	for_each_online_pgdat(pgdat) {
>  		unsigned long i, flags;
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2009,6 +2009,13 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...)
>  		return;
>
>  	/*
> +	 * Walking all memory to count page types is very expensive and should
> +	 * be inhibited in non-blockable contexts.
> +	 */
> +	if (!(gfp_mask & __GFP_WAIT))
> +		filter |= SHOW_MEM_FILTER_PAGE_COUNT;
> +
> +	/*
>  	 * This documents exceptions given to allocations in certain
>  	 * contexts that are allowed to allocate outside current's set
>  	 * of allowed nodes.
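
For reference, the effect of the new flag can be sketched in plain user-space C. This is only a model of the logic in the hunks above, not kernel code: MODEL_GFP_WAIT, MODEL_TOUCH_INTERVAL, model_warn_filter(), model_show_mem() and the two counters are hypothetical names made up for illustration, and the node filtering done by warn_alloc_failed() is left out. Only the two SHOW_MEM_FILTER_* values are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Flag values as defined in the quoted include/linux/mm.h hunk. */
#define SHOW_MEM_FILTER_NODES		(0x0001u)
#define SHOW_MEM_FILTER_PAGE_COUNT	(0x0002u)

/* Hypothetical stand-in for __GFP_WAIT; not the real kernel value. */
#define MODEL_GFP_WAIT			(0x10u)

/* Hypothetical number of pages between two watchdog touches. */
#define MODEL_TOUCH_INTERVAL		(1024ul)

/* Counters that stand in for the cost of the walk (hypothetical). */
static unsigned long model_pages_walked;
static unsigned long model_watchdog_touches;

/*
 * Models the check added to warn_alloc_failed(): an allocation that
 * cannot block gets the page count suppressed.  Node filtering is
 * deliberately omitted from this sketch.
 */
static unsigned int model_warn_filter(unsigned int gfp_mask)
{
	unsigned int filter = 0;

	if (!(gfp_mask & MODEL_GFP_WAIT))
		filter |= SHOW_MEM_FILTER_PAGE_COUNT;
	return filter;
}

/*
 * Models show_mem(): print the header, then either return early or
 * walk every page, touching the (modelled) watchdog periodically.
 */
static void model_show_mem(unsigned int filter, unsigned long nr_pages)
{
	unsigned long i;

	assert(nr_pages > 0);	/* sketch precondition, not a kernel rule */

	printf("Mem-Info:\n");

	if (filter & SHOW_MEM_FILTER_PAGE_COUNT)
		return;

	for (i = 0; i < nr_pages; i++) {
		if (!(i % MODEL_TOUCH_INTERVAL))
			model_watchdog_touches++;
		model_pages_walked++;
	}
	printf("%lu pages walked\n", model_pages_walked);
}
```

Calling model_show_mem(model_warn_filter(0), 4096) returns right after the header, while model_show_mem(model_warn_filter(MODEL_GFP_WAIT), 4096) walks all of the modelled pages and bumps the watchdog counter along the way. Either way the walk is skipped because of the flag alone, independent of whether the watchdog would have fired.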
--
Michal Hocko
SUSE Labs