On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
>From: Michal Hocko <mhocko@xxxxxxxx>
>
>Heiko Carstens has noticed that he can generate overlapping zones for
>ZONE_DMA and ZONE_NORMAL:
>
>DMA      [mem 0x0000000000000000-0x000000007fffffff]
>Normal   [mem 0x0000000080000000-0x000000017fffffff]
>
>$ cat /sys/devices/system/memory/block_size_bytes
>10000000
>$ cat /sys/devices/system/memory/memory5/valid_zones
>DMA
>$ echo 0 > /sys/devices/system/memory/memory5/online
>$ cat /sys/devices/system/memory/memory5/valid_zones
>Normal
>$ echo 1 > /sys/devices/system/memory/memory5/online
>Normal
>
>$ cat /proc/zoneinfo
>Node 0, zone      DMA
>  spanned  524288        <-----
>  present  458752
>  managed  455078
>  start_pfn:           0 <-----
>
>Node 0, zone   Normal
>  spanned  720896
>  present  589824
>  managed  571648
>  start_pfn:      327680 <-----
>
>The reason is that we assume that the default zone for kernel onlining
>is ZONE_NORMAL. This was a simplification introduced by the memory
>hotplug rework and it is easily fixable by checking the range overlap in
>the zone order and considering the first matching zone as the default
>one. If there is no such zone then assume ZONE_NORMAL as we have been
>doing so far.
>
>Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
>Reported-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
>Tested-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
>Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>---
> drivers/base/memory.c          |  2 +-
> include/linux/memory_hotplug.h |  2 ++
> mm/memory_hotplug.c            | 27 ++++++++++++++++++++++++---
> 3 files changed, 27 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index b86fda30ce62..c7c4e0325cdb 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
>
> 	nid = pfn_to_nid(start_pfn);
> 	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
>-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
>+		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
> 		append = true;
> 	}
>
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 9e0249d0f5e4..ed167541e4fc 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> 					  unsigned long pnum);
> extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 		int online_type);
>+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
>+		unsigned long nr_pages);
> #endif /* __LINUX_MEMORY_HOTPLUG_H */
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b3895fd609f4..a0348de3e18c 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
> 	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-	struct zone *normal_zone = &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
>
> 	/*
> 	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
>@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 			return true;
> 		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
>-		return zone_end_pfn(normal_zone) <= pfn;
>+		return zone_end_pfn(default_zone) <= pfn;
> 	}
>
> 	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
>@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> }
>
> /*
>+ * Returns a default kernel memory zone for the given pfn range.
>+ * If no kernel zone covers this pfn range it will automatically go
>+ * to the ZONE_NORMAL.
>+ */
>+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>+		unsigned long nr_pages)
>+{
>+	struct pglist_data *pgdat = NODE_DATA(nid);
>+	int zid;
>+
>+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
>+		struct zone *zone = &pgdat->node_zones[zid];
>+
>+		if (zone_intersects(zone, start_pfn, nr_pages))
>+			return zone;
>+	}
>+
>+	return &pgdat->node_zones[ZONE_NORMAL];
>+}

Hmm... a corner case jumped into my mind which may invalidate this
calculation.

The case is:

    Zone:     |  DMA  |        DMA32        |   NORMAL   |
              v       v                     v            v
    Phy mem:            [       ]   [      ]
                        ^       ^   ^      ^
    Node:               | Node0 |   | Node1|
                        A       B   C      D

The key points are:
1. There is a hole between Node0 and Node1
2. The hole sits in a non-normal zone

Let's mark the boundaries as A, B, C, D. Then we would have

    node0->zone[dma32] = [A, B]
    node1->zone[dma32] = [C, D]

If we want to hotplug a range in [B, C] on node0, it looks not that bad.
But if we want to hotplug a range in [B, C] on node1, it will introduce
an overlapping zone, because the range [B, C] intersects none of the
existing zones on node1.

Do you think this is possible?
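To make the case concrete, here is a minimal userspace sketch of the same
first-intersecting-zone walk that default_zone_for_pfn() does, applied to a
node1 that only has DMA32 populated. All pfn values, the fake_zone struct
and the helper names are invented for illustration only; this is not kernel
code, just the lookup logic in isolation.

/*
 * Userspace sketch only: mimics the first-intersecting-zone walk of
 * default_zone_for_pfn() with made-up zone layouts and pfn numbers.
 */
#include <stdbool.h>
#include <stdio.h>

struct fake_zone {
	const char *name;
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

/* Rough equivalent of zone_intersects(): empty zones never intersect. */
static bool intersects(const struct fake_zone *z,
		       unsigned long start_pfn, unsigned long nr_pages)
{
	if (!z->spanned_pages)
		return false;
	return start_pfn < z->start_pfn + z->spanned_pages &&
	       start_pfn + nr_pages > z->start_pfn;
}

/* Walk DMA..NORMAL, take the first hit, fall back to NORMAL (last entry). */
static const struct fake_zone *pick_default(const struct fake_zone *zones,
					    int nr_zones,
					    unsigned long start_pfn,
					    unsigned long nr_pages)
{
	int zid;

	for (zid = 0; zid < nr_zones; zid++)
		if (intersects(&zones[zid], start_pfn, nr_pages))
			return &zones[zid];

	return &zones[nr_zones - 1];
}

int main(void)
{
	/* node1's kernel zones: only DMA32 = [C, D) is populated. */
	const struct fake_zone node1[] = {
		{ "DMA",    0,     0     },
		{ "DMA32",  0x300, 0x100 },	/* C = 0x300, D = 0x400 */
		{ "Normal", 0,     0     },
	};

	/* Hotplug a block that sits in the hole [B, C) on node1. */
	const struct fake_zone *z = pick_default(node1, 3, 0x280, 0x40);

	/*
	 * Prints "Normal": the range intersects none of node1's zones,
	 * so the fallback picks ZONE_NORMAL even though the range lies
	 * below node1's DMA32 -- the situation asked about above.
	 */
	printf("default zone for [0x280, 0x2c0): %s\n", z->name);
	return 0;
}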
>+
>+/*
> * Associates the given pfn range with the given node and the zone appropriate
> * for the given online type.
> */
>@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
> 		unsigned long start_pfn, unsigned long nr_pages)
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
>-	struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
>
> 	if (online_type == MMOP_ONLINE_KEEP) {
> 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-- 
>2.11.0

-- 
Wei Yang
Help you, Help me