On Thu, Aug 04, 2016 at 07:12:45PM +0530, Srikar Dronamraju wrote:
> Fadump kernel reserves large chunks of memory even before the pages are
> initialized. This could mean memory that corresponds to several nodes might
> fall in memblock reserved regions.
>
> Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialize
> only certain size memory per node. The certain size takes into account
> the dentry and inode cache sizes. Currently the cache sizes are
> calculated based on the total system memory including the reserved
> memory. However such a kernel when booting the same kernel as fadump
> kernel will not be able to allocate the required amount of memory to
> suffice for the dentry and inode caches. This results in crashes like
> the below on large systems such as 32 TB systems.
>
> Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
> vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
> swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
> Call Trace:
> [c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable)
> [c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160
> [c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340
> [c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90
> [c00000000108fd40] [c000000000aecfb0]
> alloc_large_system_hash+0x1b8/0x2c0
> [c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4
> [c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c
> [c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578
> [c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8
>
> Register the memory reserved by fadump, so that the cache sizes are
> calculated based on the free memory (i.e Total memory - reserved
> memory).
>
> Suggested-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

I didn't suggest this specifically.
While it happens to be safe on ppc64, it potentially overwrites any future
caller of set_dma_reserve. While the only other one is for the e820 map, it
may be better to change the API to inc_dma_reserve?

It's also unfortunate that it's called dma_reserve because it has nothing
to do with DMA or ZONE_DMA. inc_kernel_reserve may be more appropriate?

--
Mel Gorman
SUSE Labs
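
To illustrate the overwrite concern with set_dma_reserve, here is a minimal userland sketch, not kernel code. It assumes set_dma_reserve() simply assigns its argument to a single counter. inc_dma_reserve() does not exist as far as this thread shows; it is only the name proposed above, and the report_with_*() helpers are hypothetical, added purely for the comparison.

```c
/* Userland model of the single reserve counter (an assumption for
 * illustration; the real variable lives inside the page allocator). */
static unsigned long dma_reserve;

/* Setter semantics: each call replaces the previous value. */
static void set_dma_reserve(unsigned long new_dma_reserve)
{
	dma_reserve = new_dma_reserve;
}

/* Hypothetical accumulating variant, using the name proposed above. */
static void inc_dma_reserve(unsigned long nr_pages)
{
	dma_reserve += nr_pages;
}

/* Hypothetical helper: two independent callers report reservations
 * of a and b pages through the setter. The second call wins. */
static unsigned long report_with_set(unsigned long a, unsigned long b)
{
	dma_reserve = 0;
	set_dma_reserve(a);
	set_dma_reserve(b);	/* a is lost here */
	return dma_reserve;
}

/* Hypothetical helper: the same two callers use the accumulating
 * variant, so both reservations are counted. */
static unsigned long report_with_inc(unsigned long a, unsigned long b)
{
	dma_reserve = 0;
	inc_dma_reserve(a);
	inc_dma_reserve(b);
	return dma_reserve;
}
```

With reservations of 100 and 25 pages, report_with_set() ends up with only the second caller's 25 pages, while report_with_inc() accounts for all 125. That is the difference an accumulating API would make once there is more than one caller.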