The patch titled
     mm: make mem_map allocation continuous
has been added to the -mm tree.  Its filename is
     mm-make-mem_map-allocation-continuous-v2.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: mm: make mem_map allocation continuous
From: Yinghai Lu <yhlu.kernel.send@xxxxxxxxx>

The vmemmap allocation currently produces

 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001c00000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810002000000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810002400000 on node 0
 ...

There is a 2M hole between each pair of consecutive chunks.

The root cause is that a usemap (24 bytes) is allocated after every 2M
mem_map, which pushes the next vmemmap chunk (2M) up to the next
2M-aligned address.

Solution: try to allocate the mem_map contiguously.

After the patch, we get

 [ffffe20000000000-ffffe200001fffff] PMD ->ffff810001400000 on node 0
 [ffffe20000200000-ffffe200003fffff] PMD ->ffff810001600000 on node 0
 [ffffe20000400000-ffffe200005fffff] PMD ->ffff810001800000 on node 0
 [ffffe20000600000-ffffe200007fffff] PMD ->ffff810001a00000 on node 0
 [ffffe20000800000-ffffe200009fffff] PMD ->ffff810001c00000 on node 0
 ...

and the usemaps share a page, because they are allocated contiguously too.

 sparse_early_usemap_alloc: usemap = ffff810024e00000 size = 24
 sparse_early_usemap_alloc: usemap = ffff810024e00080 size = 24
 sparse_early_usemap_alloc: usemap = ffff810024e00100 size = 24
 sparse_early_usemap_alloc: usemap = ffff810024e00180 size = 24
 ...

So we make the bootmem allocation more compact and use less memory for
the usemaps.

For powerpc, Badari Pulavarty <pbadari@xxxxxxxxxx> wrote:

> You have to call sparse_init_one_section() on each pmap and usemap
> as we allocate - since valid_section() depends on it (which is needed
> by vmemmap_populate() to check if the section is populated or not).
> On ppc, we need to call htab_bolted_mapping() on each section and
> we need to skip existing sections.

So allocate all of the usemaps first, before any mem_map.

Signed-off-by: Yinghai Lu <yhlu.kernel@xxxxxxxxx>
Cc: Andy Whitcroft <apw@xxxxxxxxxxxx>
Cc: Yasunori Goto <y-goto@xxxxxxxxxxxxxx>
Cc: Dave Hansen <haveblue@xxxxxxxxxx>
Cc: Bob Picco <bob.picco@xxxxxx>
Cc: Christoph Lameter <clameter@xxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: Badari Pulavarty <pbadari@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/sparse.c |   32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff -puN mm/sparse.c~mm-make-mem_map-allocation-continuous-v2 mm/sparse.c
--- a/mm/sparse.c~mm-make-mem_map-allocation-continuous-v2
+++ a/mm/sparse.c
@@ -295,22 +295,48 @@ void __init sparse_init(void)
 	unsigned long pnum;
 	struct page *map;
 	unsigned long *usemap;
+	unsigned long **usemap_map;
+	int size;
+
+	/*
+	 * map is using big page (aka 2M in x86 64 bit)
+	 * usemap is less one page (aka 24 bytes)
+	 * so alloc 2M (with 2M align) and 24 bytes in turn will
+	 * make next 2M slip to one more 2M later.
+	 * then in big system, the memory will have a lot of holes...
+	 * here try to allocate 2M pages continously.
+	 *
+	 * powerpc need to call sparse_init_one_section right after each
+	 * sparse_early_mem_map_alloc, so allocate usemap_map at first.
+	 */
+	size = sizeof(unsigned long *) * NR_MEM_SECTIONS;
+	usemap_map = alloc_bootmem(size);
+	if (!usemap_map)
+		panic("can not allocate usemap_map\n");
 
 	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
 		if (!present_section_nr(pnum))
 			continue;
+		usemap_map[pnum] = sparse_early_usemap_alloc(pnum);
+	}
 
-		map = sparse_early_mem_map_alloc(pnum);
-		if (!map)
+	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+		if (!present_section_nr(pnum))
 			continue;
 
-		usemap = sparse_early_usemap_alloc(pnum);
+		usemap = usemap_map[pnum];
 		if (!usemap)
 			continue;
 
+		map = sparse_early_mem_map_alloc(pnum);
+		if (!map)
+			continue;
+
 		sparse_init_one_section(__nr_to_section(pnum), pnum, map,
 								usemap);
 	}
+
+	free_bootmem(__pa(usemap_map), size);
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
_

Patches currently in -mm which might be from yhlu.kernel.send@xxxxxxxxx are

mm-make-mem_map-allocation-continuous-v2.patch
mm-offset-align-in-alloc_bootmem.patch
mm-make-reserve_bootmem-can-crossed-the-nodes.patch
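
The 2M holes described in the changelog can be reproduced with a toy
model. The sketch below is not kernel code. It assumes a plain bump
allocator (the real bootmem allocator is bitmap based), and the names
bump_alloc, stride_interleaved and stride_batched are hypothetical. The
128-byte usemap alignment is also an assumption, taken from the 0x80
spacing in the usemap log.

```c
#include <stdint.h>

#define MAP_SIZE     (UINT64_C(2) << 20) /* one 2M mem_map chunk, 2M aligned */
#define USEMAP_SIZE  UINT64_C(24)        /* usemap size from the changelog */
#define USEMAP_ALIGN UINT64_C(128)       /* assumed: matches the 0x80 log spacing */

/* Toy bump allocator: hands out addresses in increasing order only. */
struct bump {
	uint64_t cursor;
};

/* Round the cursor up to 'align' (a power of two), then reserve 'size' bytes. */
static uint64_t bump_alloc(struct bump *b, uint64_t size, uint64_t align)
{
	uint64_t addr = (b->cursor + align - 1) & ~(align - 1);

	b->cursor = addr + size;
	return addr;
}

/*
 * Old order: mem_map, usemap, mem_map, usemap, ...
 * Returns the average distance between consecutive mem_map chunks.
 * Requires nsections >= 2.
 */
static uint64_t stride_interleaved(uint64_t start, int nsections)
{
	struct bump b = { start };
	uint64_t first = 0, last = 0;

	for (int i = 0; i < nsections; i++) {
		uint64_t map = bump_alloc(&b, MAP_SIZE, MAP_SIZE);

		/* The 24-byte usemap moves the cursor just past a 2M boundary. */
		bump_alloc(&b, USEMAP_SIZE, USEMAP_ALIGN);
		if (i == 0)
			first = map;
		last = map;
	}
	return (last - first) / (uint64_t)(nsections - 1);
}

/*
 * New order: all usemaps first, then all mem_maps.
 * Returns the average distance between consecutive mem_map chunks.
 * Requires nsections >= 2.
 */
static uint64_t stride_batched(uint64_t start, int nsections)
{
	struct bump b = { start };
	uint64_t first = 0, last = 0;

	for (int i = 0; i < nsections; i++)
		bump_alloc(&b, USEMAP_SIZE, USEMAP_ALIGN);

	for (int i = 0; i < nsections; i++) {
		uint64_t map = bump_alloc(&b, MAP_SIZE, MAP_SIZE);

		if (i == 0)
			first = map;
		last = map;
	}
	return (last - first) / (uint64_t)(nsections - 1);
}
```

With a start address of 0x1400000 and five sections, the interleaved
order gives a 4M stride between mem_map chunks, which matches the
pre-patch log, and the batched order gives a 2M stride. Only the stride
is meaningful here: the absolute addresses of the toy model do not
correspond to the post-patch log.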