Re: [PATCH 2/2] mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 01, 2017 at 10:37:46AM +0200, Michal Hocko wrote:
>From: Michal Hocko <mhocko@xxxxxxxx>
>
>Heiko Carstens has noticed that he can generate overlapping zones for
>ZONE_DMA and ZONE_NORMAL:
>DMA      [mem 0x0000000000000000-0x000000007fffffff]
>Normal   [mem 0x0000000080000000-0x000000017fffffff]
>
>$ cat /sys/devices/system/memory/block_size_bytes
>10000000
>$ cat /sys/devices/system/memory/memory5/valid_zones
>DMA
>$ echo 0 > /sys/devices/system/memory/memory5/online
>$ cat /sys/devices/system/memory/memory5/valid_zones
>Normal
>$ echo 1 > /sys/devices/system/memory/memory5/online
>Normal
>
>$ cat /proc/zoneinfo
>Node 0, zone      DMA
>spanned  524288        <-----
>present  458752
>managed  455078
>start_pfn:           0 <-----
>
>Node 0, zone   Normal
>spanned  720896
>present  589824
>managed  571648
>start_pfn:           327680 <-----
>
>The reason is that we assume that the default zone for kernel onlining
>is ZONE_NORMAL. This was a simplification introduced by the memory
>hotplug rework and it is easily fixable by checking the range overlap in
>the zone order and considering the first matching zone as the default
>one. If there is no such zone then assume ZONE_NORMAL as we have been
>doing so far.
>
>Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
>Reported-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
>Tested-by: Heiko Carstens <heiko.carstens@xxxxxxxxxx>
>Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>---
> drivers/base/memory.c          |  2 +-
> include/linux/memory_hotplug.h |  2 ++
> mm/memory_hotplug.c            | 27 ++++++++++++++++++++++++---
> 3 files changed, 27 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index b86fda30ce62..c7c4e0325cdb 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -419,7 +419,7 @@ static ssize_t show_valid_zones(struct device *dev,
> 
> 	nid = pfn_to_nid(start_pfn);
> 	if (allow_online_pfn_range(nid, start_pfn, nr_pages, MMOP_ONLINE_KERNEL)) {
>-		strcat(buf, NODE_DATA(nid)->node_zones[ZONE_NORMAL].name);
>+		strcat(buf, default_zone_for_pfn(nid, start_pfn, nr_pages)->name);
> 		append = true;
> 	}
> 
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 9e0249d0f5e4..ed167541e4fc 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -309,4 +309,6 @@ extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> 					  unsigned long pnum);
> extern bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 		int online_type);
>+extern struct zone *default_zone_for_pfn(int nid, unsigned long pfn,
>+		unsigned long nr_pages);
> #endif /* __LINUX_MEMORY_HOTPLUG_H */
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index b3895fd609f4..a0348de3e18c 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -858,7 +858,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
> 	struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-	struct zone *normal_zone =  &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *default_zone = default_zone_for_pfn(nid, pfn, nr_pages);
> 
> 	/*
> 	 * TODO there shouldn't be any inherent reason to have ZONE_NORMAL
>@@ -872,7 +872,7 @@ bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages,
> 			return true;
> 		return movable_zone->zone_start_pfn >= pfn + nr_pages;
> 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
>-		return zone_end_pfn(normal_zone) <= pfn;
>+		return zone_end_pfn(default_zone) <= pfn;
> 	}
> 
> 	/* MMOP_ONLINE_KEEP will always succeed and inherits the current zone */
>@@ -938,6 +938,27 @@ void __ref move_pfn_range_to_zone(struct zone *zone,
> }
> 
> /*
>+ * Returns a default kernel memory zone for the given pfn range.
>+ * If no kernel zone covers this pfn range it will automatically go
>+ * to the ZONE_NORMAL.
>+ */
>+struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
>+		unsigned long nr_pages)
>+{
>+	struct pglist_data *pgdat = NODE_DATA(nid);
>+	int zid;
>+
>+	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
>+		struct zone *zone = &pgdat->node_zones[zid];
>+
>+		if (zone_intersects(zone, start_pfn, nr_pages))
>+			return zone;
>+	}
>+
>+	return &pgdat->node_zones[ZONE_NORMAL];
>+}

Hmm... a corner case jumped into my mind which may invalidate this
calculation.

The case is:


       Zone:         | DMA   | DMA32      | NORMAL       |
                     v       v            v              v
       
       Phy mem:      [           ]     [                  ]
       
                     ^           ^     ^                  ^
       Node:         |   Node0   |     |      Node1       |
                             A   B     C  D


The key point is
1. There is a hole between Node0 and Node1
2. The hole sits in a non-normal zone

Let's mark the boundary as A, B, C, D. Then we would have
node0->zone[dma21] = [A, B]
node1->zone[dma32] = [C, D]

If we want to hotplug a range in [B, C] on node0, it looks not that bad. While
if we want to hotplug a range in [B, C] on node1, it will introduce the
overlapped zone. Because the range [B, C] intersects none of the existing
zones on node1.

Do you think this is possible?

>+
>+/*
>  * Associates the given pfn range with the given node and the zone appropriate
>  * for the given online type.
>  */
>@@ -945,7 +966,7 @@ static struct zone * __meminit move_pfn_range(int online_type, int nid,
> 		unsigned long start_pfn, unsigned long nr_pages)
> {
> 	struct pglist_data *pgdat = NODE_DATA(nid);
>-	struct zone *zone = &pgdat->node_zones[ZONE_NORMAL];
>+	struct zone *zone = default_zone_for_pfn(nid, start_pfn, nr_pages);
> 
> 	if (online_type == MMOP_ONLINE_KEEP) {
> 		struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE];
>-- 
>2.11.0

-- 
Wei Yang
Help you, Help me

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]
  Powered by Linux