Hi All,

On 3/31/20 7:31 AM, Baoquan He wrote:
> On 03/31/20 at 04:21pm, Michal Hocko wrote:
>> On Tue 31-03-20 22:03:32, Baoquan He wrote:
>>> Hi Michal,
>>>
>>> On 03/31/20 at 10:55am, Michal Hocko wrote:
>>>> On Tue 31-03-20 11:14:23, Mike Rapoport wrote:
>>>>> Maybe I mis-read the code, but I don't see how this could happen.
>>>>> In the HAVE_MEMBLOCK_NODE_MAP=y case, free_area_init_node() calls
>>>>> calculate_node_totalpages() that ensures that node->node_zones are
>>>>> entirely within the node, because this is checked in
>>>>> zone_spanned_pages_in_node().
>>>>
>>>> zone_spanned_pages_in_node does check that the zone boundaries are
>>>> within the node boundaries. But that doesn't really tell anything
>>>> about other potential zones interleaving with the physical memory
>>>> range. zone->spanned_pages simply gives the physical range for the
>>>> zone, including holes. Interleaving nodes are essentially a hole
>>>> (__absent_pages_in_range is going to skip those). That means that
>>>> free_area_init_core simply goes over the whole physical zone range,
>>>> including holes, and that is why we need to check both for physical
>>>> and logical holes (aka other nodes). Life would be so much easier if
>>>> the whole thing would simply iterate over memblocks...
>>>
>>> Iterating over memblock sounds like a great idea. I tried putting the
>>> memblock iteration in the upper layer, memmap_init(), which is used
>>> for boot memory only anyway. Do you think it's doable and OK? If yes,
>>> I can work out a formal patch to make this simpler as you said. The
>>> draft code is as below. This way it uses the existing code and
>>> involves little change.
>>
>> Doing this would be a step in the right direction! I haven't checked
>> the code very closely though. The below sounds way too simple to be
>> true, I am afraid. First, for_each_mem_pfn_range is available only for
>> CONFIG_HAVE_MEMBLOCK_NODE_MAP (which is one of the reasons why I keep
>> saying that I really hate that being conditional). Also, I haven't
>> really checked the deferred initialization path - I have a very vague
>> recollection that it has been converted to the memblock api, but I
>> have happily dropped all that memory.
>
> Thanks for your quick response and for pointing out the remaining
> suspect aspects. I will investigate what you mentioned and see whether
> they have an impact.
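[Editor's illustration, not part of the thread: the point above that
"interleaving nodes are essentially a hole" can be shown with a small
userspace sketch. The node layout, struct name, and values below are
invented for illustration; only the spanned-vs-present distinction
mirrors the kernel's accounting.]

/*
 * Toy userspace model (not kernel code): node 0 owns two PFN ranges and
 * node 1's memory sits between them.  Node 0's zone spans the whole
 * [0x0000, 0x4000) range, but the node-1 piece in the middle counts as
 * absent, exactly like a physical hole would.
 */
#include <stdio.h>

struct pfn_range { unsigned long start, end; int nid; };

/* Made-up layout: node 1 interleaves with node 0. */
static const struct pfn_range ranges[] = {
	{ 0x0000, 0x1000, 0 },
	{ 0x1000, 0x3000, 1 },	/* from node 0's point of view: a hole */
	{ 0x3000, 0x4000, 0 },
};

int main(void)
{
	unsigned long zone_start = 0x0000, zone_end = 0x4000; /* node 0's zone */
	unsigned long spanned = zone_end - zone_start;
	unsigned long present = 0;
	unsigned int i;

	/* Count only the PFNs that really belong to node 0. */
	for (i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (ranges[i].nid == 0)
			present += ranges[i].end - ranges[i].start;

	printf("node 0 zone: spanned 0x%lx, present 0x%lx, holes 0x%lx\n",
	       spanned, present, spanned - present);
	return 0;
}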
I would like to check: do we still move forward with my patch to remove
CONFIG_NODES_SPAN_OTHER_NODES, and have this as another patch on top of it?
Thanks,
Hoan
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 138a56c0f48f..558d421f294b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6007,14 +6007,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 	 * function.  They do not exist on hotplugged memory.
 	 */
 	if (context == MEMMAP_EARLY) {
-		if (!early_pfn_valid(pfn)) {
-			pfn = next_pfn(pfn);
-			continue;
-		}
-		if (!early_pfn_in_nid(pfn, nid)) {
-			pfn++;
-			continue;
-		}
 		if (overlap_memmap_init(zone, &pfn))
 			continue;
 		if (defer_init(nid, pfn, end_pfn))
@@ -6130,9 +6122,17 @@ static void __meminit zone_init_free_lists(struct zone *zone)
 }
 
 void __meminit __weak memmap_init(unsigned long size, int nid,
-				  unsigned long zone, unsigned long start_pfn)
+				  unsigned long zone, unsigned long range_start_pfn)
 {
-	memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
+	unsigned long start_pfn, end_pfn;
+	unsigned long range_end_pfn = range_start_pfn + size;
+	int i;
+
+	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
+		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
+		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
+		if (end_pfn > start_pfn)
+			memmap_init_zone(end_pfn - start_pfn, nid, zone, start_pfn,
+					 MEMMAP_EARLY, NULL);
+	}
 }
 
 static int zone_batchsize(struct zone *zone)

-- 
Michal Hocko
SUSE Labs
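[Editor's sketch, not part of the thread: to make the intent of the loop
in the draft memmap_init() concrete, here is a minimal standalone model
of the same clamp-and-intersect pattern. The range table, the CLAMP
macro, and all values are made up; only the logic of intersecting each
per-node range with the zone span and skipping empty intersections
follows the draft above.]

/*
 * Standalone model: walk the PFN ranges owned by one node, clamp each
 * to the zone span [range_start_pfn, range_end_pfn), and keep only the
 * non-empty pieces - those are the ranges the draft would hand to
 * memmap_init_zone().
 */
#include <stdio.h>

#define CLAMP(v, lo, hi) ((v) < (lo) ? (lo) : (v) > (hi) ? (hi) : (v))

struct pfn_range { unsigned long start, end; };

/* Hypothetical memblock ranges owned by the node being initialized. */
static const struct pfn_range node_ranges[] = {
	{ 0x0000, 0x1000 },
	{ 0x3000, 0x4000 },
	{ 0x9000, 0xa000 },	/* outside this zone, must be skipped */
};

int main(void)
{
	unsigned long range_start_pfn = 0x0800, range_end_pfn = 0x4000; /* zone span */
	unsigned int i;

	for (i = 0; i < sizeof(node_ranges) / sizeof(node_ranges[0]); i++) {
		unsigned long start_pfn = CLAMP(node_ranges[i].start,
						range_start_pfn, range_end_pfn);
		unsigned long end_pfn = CLAMP(node_ranges[i].end,
					      range_start_pfn, range_end_pfn);

		if (end_pfn > start_pfn)	/* would call memmap_init_zone() here */
			printf("init [0x%lx, 0x%lx), %lu pages\n",
			       start_pfn, end_pfn, end_pfn - start_pfn);
	}
	return 0;
}

Note that each call covers only one clamped piece, so the length passed
per call is end_pfn - start_pfn for that piece, not the zone-wide size.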