
The patch titled
     Subject: memblock: introduce a for_each_reserved_mem_region iterator
has been added to the -mm tree.  Its filename is
     memblock-introduce-a-for_each_reserved_mem_region-iterator.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/memblock-introduce-a-for_each_reserved_mem_region-iterator.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Robin Holt <holt@xxxxxxx>
Subject: memblock: introduce a for_each_reserved_mem_region iterator

Struct page initialisation had been identified as one of the reasons why
large machines take a long time to boot.  Patches were posted a long time ago
to defer initialisation until they were first used.  This was rejected on
the grounds it should not be necessary to hurt the fast paths.  This series
reuses much of the work from that time but defers the initialisation of
memory to kswapd so that one thread per node initialises memory local to
that node.

After applying the series and setting the appropriate Kconfig variable I
see this in the boot log on a 64G machine

[    7.383764] kswapd 0 initialised deferred memory in 188ms
[    7.404253] kswapd 1 initialised deferred memory in 208ms
[    7.411044] kswapd 3 initialised deferred memory in 216ms
[    7.411551] kswapd 2 initialised deferred memory in 216ms

On a 1TB machine, I see

[    8.406511] kswapd 3 initialised deferred memory in 1116ms
[    8.428518] kswapd 1 initialised deferred memory in 1140ms
[    8.435977] kswapd 0 initialised deferred memory in 1148ms
[    8.437416] kswapd 2 initialised deferred memory in 1148ms

Once booted the machine appears to work as normal.  Boot times were measured
from the time shutdown was called until ssh was available again.  In the
64G case, the boot time savings are negligible.  On the 1TB machine, the
savings were 16 seconds.

This patch (of 13):

As part of initializing struct page's in 2MiB chunks, we noticed that at
the end of free_all_bootmem(), there was nothing which had forced the
reserved/allocated 4KiB pages to be initialized.

This helper function will be used for that expansion.

Signed-off-by: Robin Holt <holt@xxxxxxx>
Signed-off-by: Nate Zimmer <nzimmer@xxxxxxx>
Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: Waiman Long <waiman.long@xxxxxx>
Cc: Scott Norton <scott.norton@xxxxxx>
Cc: Daniel J Blueman <daniel@xxxxxxxxxxxxx>
Cc: "Luck, Tony" <tony.luck@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/memblock.h |   18 ++++++++++++++++++
 mm/memblock.c            |   32 ++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff -puN include/linux/memblock.h~memblock-introduce-a-for_each_reserved_mem_region-iterator include/linux/memblock.h
--- a/include/linux/memblock.h~memblock-introduce-a-for_each_reserved_mem_region-iterator
+++ a/include/linux/memblock.h
@@ -93,6 +93,9 @@ void __next_mem_range_rev(u64 *idx, int
 			  struct memblock_type *type_b, phys_addr_t *out_start,
 			  phys_addr_t *out_end, int *out_nid);
 
+void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
+				phys_addr_t *out_end);
+
 /**
  * for_each_mem_range - iterate through memblock areas from type_a and not
  * included in type_b. Or just type_a if type_b is NULL.
@@ -132,6 +135,21 @@ void __next_mem_range_rev(u64 *idx, int
 	     __next_mem_range_rev(&i, nid, type_a, type_b,	\
 				  p_start, p_end, p_nid))
 
+/**
+ * for_each_reserved_mem_region - iterate over all reserved memblock areas
+ * @i: u64 used as loop variable
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over reserved areas of memblock. Available as soon as memblock
+ * is initialized.
+ */
+#define for_each_reserved_mem_region(i, p_start, p_end)			\
+	for (i = 0UL,							\
+	     __next_reserved_mem_region(&i, p_start, p_end);		\
+	     i != (u64)ULLONG_MAX;					\
+	     __next_reserved_mem_region(&i, p_start, p_end))
+
 #ifdef CONFIG_MOVABLE_NODE
 static inline bool memblock_is_hotpluggable(struct memblock_region *m)
 {
diff -puN mm/memblock.c~memblock-introduce-a-for_each_reserved_mem_region-iterator mm/memblock.c
--- a/mm/memblock.c~memblock-introduce-a-for_each_reserved_mem_region-iterator
+++ a/mm/memblock.c
@@ -779,6 +779,38 @@ int __init_memblock memblock_clear_hotpl
 }
 
 /**
+ * __next_reserved_mem_region - next function for for_each_reserved_mem_region()
+ * @idx: pointer to u64 loop variable
+ * @out_start: ptr to phys_addr_t for start address of the region, can be %NULL
+ * @out_end: ptr to phys_addr_t for end address of the region, can be %NULL
+ *
+ * Iterate over all reserved memory regions.
+ */
+void __init_memblock __next_reserved_mem_region(u64 *idx,
+					   phys_addr_t *out_start,
+					   phys_addr_t *out_end)
+{
+	struct memblock_type *rsv = &memblock.reserved;
+
+	if (*idx < rsv->cnt) {
+		struct memblock_region *r = &rsv->regions[*idx];
+		phys_addr_t base = r->base;
+		phys_addr_t size = r->size;
+
+		if (out_start)
+			*out_start = base;
+		if (out_end)
+			*out_end = base + size - 1;
+
+		*idx += 1;
+		return;
+	}
+
+	/* signal end of iteration */
+	*idx = ULLONG_MAX;
+}
+
+/**
  * __next__mem_range - next function for for_each_free_mem_range() etc.
  * @idx: pointer to u64 loop variable
  * @nid: node selector, %NUMA_NO_NODE for all nodes
_

Patches currently in -mm which might be from holt@xxxxxxx are

memblock-introduce-a-for_each_reserved_mem_region-iterator.patch
mm-meminit-move-page-initialization-into-a-separate-function.patch
mm-meminit-only-set-page-reserved-in-the-memblock-region.patch
mm-page_alloc-pass-pfn-to-__free_pages_bootmem.patch
mm-meminit-make-__early_pfn_to_nid-smp-safe-and-introduce-meminit_pfn_in_nid.patch
mm-meminit-inline-some-helper-functions.patch
mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set.patch
mm-meminit-initialise-a-subset-of-struct-pages-if-config_deferred_struct_page_init-is-set-fix.patch
mm-meminit-initialise-remaining-struct-pages-in-parallel-with-kswapd.patch
mm-meminit-minimise-number-of-pfn-page-lookups-during-initialisation.patch
x86-mm-enable-deferred-struct-page-initialisation-on-x86-64.patch
mm-meminit-free-pages-in-large-chunks-where-possible.patch
mm-meminit-reduce-number-of-times-pageblocks-are-set-during-struct-page-init.patch
mm-meminit-remove-mminit_verify_page_links.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
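
For readers who want to see the iteration pattern of the patch in isolation, the following is a minimal user-space sketch, not kernel code. It mirrors the shape of for_each_reserved_mem_region() and __next_reserved_mem_region() from the diff above against a made-up region table. Everything named demo_* or *_demo_*, plus next_reserved() and the region values themselves, is hypothetical and exists only for illustration; phys_addr_t is assumed to be 64 bits wide here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint64_t phys_addr_t;	/* assumption: 64-bit physical addresses */

struct demo_region {
	phys_addr_t base;
	phys_addr_t size;
};

/* Hypothetical table standing in for memblock.reserved. */
static const struct demo_region demo_reserved[] = {
	{ 0x000000, 0x1000 },
	{ 0x100000, 0x4000 },
	{ 0x800000, 0x2000 },
};
#define DEMO_RESERVED_CNT (sizeof(demo_reserved) / sizeof(demo_reserved[0]))

/* Same logic as the patch: report region *idx, then advance the index. */
static void next_reserved(u64 *idx, phys_addr_t *out_start,
			  phys_addr_t *out_end)
{
	if (*idx < DEMO_RESERVED_CNT) {
		const struct demo_region *r = &demo_reserved[*idx];

		if (out_start)
			*out_start = r->base;
		if (out_end)
			*out_end = r->base + r->size - 1;	/* inclusive end */

		*idx += 1;
		return;
	}

	/* signal end of iteration */
	*idx = ULLONG_MAX;
}

/* Inside the loop body, i is already one past the region being visited. */
#define for_each_demo_reserved(i, p_start, p_end)		\
	for (i = 0UL, next_reserved(&i, p_start, p_end);	\
	     i != (u64)ULLONG_MAX;				\
	     next_reserved(&i, p_start, p_end))

/* Sum the sizes of all regions from their inclusive [start, end] bounds. */
u64 demo_reserved_bytes(void)
{
	u64 i, total = 0;
	phys_addr_t start, end;

	for_each_demo_reserved(i, &start, &end) {
		assert(start <= end);
		total += end - start + 1;
	}
	return total;
}

/* Count regions; both output pointers may be NULL, as in the patch. */
u64 demo_reserved_count(void)
{
	u64 i, n = 0;

	for_each_demo_reserved(i, NULL, NULL)
		n++;
	return n;
}
```

Two details of the patch show up here. The end address handed back is inclusive (base + size - 1), so a caller computing a length has to add one. The loop variable is advanced before the body runs, so it cannot be used directly as an index into the region array.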