On Tue, Feb 07, 2017 at 10:45:57AM +0100, Michal Hocko wrote:
>On Mon 06-02-17 23:43:14, Wei Yang wrote:
>> The whole memory space is divided into several zones and nodes may have no
>> page in some zones. In this case, the __absent_pages_in_range() would
>> return 0, since the range it is searching for is an empty range.
>>
>> Also this happens more often to those nodes with higher memory range when
>> there are more nodes, which is a trend for future architectures.
>
>I do not understand this part. Why would we see more zones with zero pfn
>range in higher memory ranges.
>

Based on my understanding, zone boundaries are fixed addresses. For example,
on x86_64, ZONE_DMA is < 16M and ZONE_DMA32 is < 4G. Similar rules apply to
sparc, ia64 and s390, as shown in the comment above the ZONE definitions.

For example, we currently see servers with 8 NUMA nodes and 4T of memory.
Those zone boundaries may all sit within the first node's range, so the nodes
with higher memory ranges may all sit in the last zone, which is ZONE_NORMAL
I think.

During memory initialization, for each node we still iterate over every zone
and calculate the memory range in each zone. As a result, the nodes with
higher memory ranges will see several empty zones.

>> This patch checks the zone range after clamp and adjustment, return 0 if
>> the range is an empty range.
>
>I assume the whole point of this patch is to save
>__absent_pages_in_range which iterates over all memblock regions, right?

Yes, you are right. Since we know there is no overlap, it is not necessary to
iterate over memblock.

>Is there any reason why for_each_mem_pfn_range cannot be changed to
>honor the given start/end pfns instead? I can imagine that a small zone
>would see a similar pointless iterations...
>

Hmm... No special reason, I just had not thought about that implementation.
Actually, I just did the same thing as zone_spanned_pages_in_node(), which
also returns 0 when there is no overlap.

BTW, I don't get your point. Do you wish to put the check in the
for_each_mem_pfn_range() definition?

>> Signed-off-by: Wei Yang <richard.weiyang@xxxxxxxxx>
>> ---
>>  mm/page_alloc.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 6de9440e3ae2..51c60c0eadcb 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5521,6 +5521,11 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid,
>>  	adjust_zone_range_for_zone_movable(nid, zone_type,
>>  			node_start_pfn, node_end_pfn,
>>  			&zone_start_pfn, &zone_end_pfn);
>> +
>> +	/* If this node has no page within this zone, return 0. */
>> +	if (zone_start_pfn == zone_end_pfn)
>> +		return 0;
>> +
>>  	nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
>>
>>  	/*
>> --
>> 2.11.0
>>
>
>--
>Michal Hocko
>SUSE Labs

--
Wei Yang
Help you, Help me
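
To illustrate the empty-zone argument above, here is a minimal user-space
sketch in C. It is not the kernel code: the 4 KiB page size, the zone limit
table and the helper names (clamp_pfn, zone_pages_in_node, count_empty_zones)
are assumptions made up for this example. It only models how fixed zone
boundaries, clamped to a node's pfn range, collapse to an empty range on the
higher nodes.

```c
#include <assert.h>

/* Hypothetical user-space model for illustration; not kernel code. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Exclusive upper zone boundaries in pfns, x86_64 style: 16M, 4G, rest. */
static const unsigned long long zone_limit_pfn[MAX_NR_ZONES] = {
	(16ULL << 20) >> PAGE_SHIFT,
	(4ULL << 30) >> PAGE_SHIFT,
	~0ULL,
};

/* Clamp v into [lo, hi]. */
static unsigned long long clamp_pfn(unsigned long long v,
				    unsigned long long lo,
				    unsigned long long hi)
{
	assert(lo <= hi);
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Pages of 'zone' that fall inside the node range [node_start, node_end). */
static unsigned long long zone_pages_in_node(enum zone_type zone,
					     unsigned long long node_start,
					     unsigned long long node_end)
{
	unsigned long long zone_start =
		(zone == ZONE_DMA) ? 0 : zone_limit_pfn[zone - 1];
	unsigned long long zone_end = zone_limit_pfn[zone];

	zone_start = clamp_pfn(zone_start, node_start, node_end);
	zone_end = clamp_pfn(zone_end, node_start, node_end);

	/* Empty range after clamping: nothing to scan, return early. */
	if (zone_start == zone_end)
		return 0;

	return zone_end - zone_start;
}

/* Count the zones that are empty for the given node range. */
static int count_empty_zones(unsigned long long node_start,
			     unsigned long long node_end)
{
	int zone, empty = 0;

	for (zone = 0; zone < MAX_NR_ZONES; zone++)
		if (zone_pages_in_node((enum zone_type)zone,
				       node_start, node_end) == 0)
			empty++;
	return empty;
}
```

Assuming 4T is split evenly over 8 nodes (512G each), the first node covers
all three zones in this model, while every later node lies entirely above the
4G boundary and sees ZONE_DMA and ZONE_DMA32 as empty ranges. Those are the
cases where the early return skips the memblock walk.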