[patch 029/115] mm, memory_hotplug: support movable_node for hotpluggable nodes

From: Michal Hocko <mhocko@xxxxxxxxxx>
Subject: mm, memory_hotplug: support movable_node for hotpluggable nodes

The movable_node kernel parameter makes hotpluggable NUMA nodes put all
their hotpluggable memory into the movable zone, which allows more or less
reliable memory hotremove.  At least this is the case for the NUMA nodes
present during boot (see find_zone_movable_pfns_for_nodes).

This is not the case for runtime memory hotplug, though.

	echo online > /sys/devices/system/memory/memoryXYZ/state

will default to a kernel zone (usually ZONE_NORMAL) unless the particular
memblock is already in the movable zone range, which is normally not the
case when onlining memory from a udev rule for a freshly hot-added NUMA
node.  The only option currently is a special udev rule that echoes
online_movable to all memblocks belonging to such a node, which is rather
clumsy.  It is also inconsistent: memory that ended up in the movable zone
during boot will end up in a kernel zone after hotremove & hotadd unless
special care is taken.
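The per-node udev workaround above boils down to writing a request string
into each memblock's sysfs state file.  A minimal sketch in C, assuming the
/sys/devices/system/memory/memoryXYZ/state layout shown above; the helper
name request_memblock_state is hypothetical and not a kernel or udev API:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write an onlining request such as "online",
 * "online_movable" or "online_kernel" to a memory block's state file.
 * Returns 0 on success or a negative errno value on failure.
 */
int request_memblock_state(const char *state_path, const char *request)
{
	FILE *f;
	int err = 0;

	assert(state_path != NULL && request != NULL);

	f = fopen(state_path, "w");
	if (!f)
		return -errno;

	/* Same bytes as `echo request > state`, including the newline. */
	if (fprintf(f, "%s\n", request) < 0)
		err = -EIO;

	/*
	 * stdio buffers the write, so a request the kernel rejects
	 * (e.g. an incompatible zone) is reported when fclose() flushes.
	 */
	if (fclose(f) != 0 && err == 0)
		err = -errno;

	return err;
}
```

On a real system this needs root, and the block number used in a call such
as request_memblock_state("/sys/devices/system/memory/memory42/state",
"online_movable") is purely illustrative.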

It would be nice to reuse memblock_is_hotpluggable, but runtime hotplug
doesn't have that information available.  The boot and hotplug paths are
not shared, and making them use the same code path would be really
non-trivial because runtime hotplug doesn't interact with the memblock
allocator at all.

Teach move_pfn_range that MMOP_ONLINE_KEEP can use the movable zone if
movable_node is enabled and the range doesn't overlap with the existing
normal zone.  This should provide a reasonable default onlining strategy.
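The resulting MMOP_ONLINE_KEEP decision can be modelled in user space.  This
is a simplified sketch, not the kernel code: the model_* types and functions
are hypothetical, the boolean kernel_allowed stands in for
allow_online_pfn_range(..., MMOP_ONLINE_KERNEL), and the node's default
kernel zone is approximated by its ZONE_NORMAL span:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's zone types. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE };

struct model_zone_span {
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

/* Does [start_pfn, start_pfn + nr_pages) overlap the zone's span? */
static bool model_zone_intersects(const struct model_zone_span *z,
				  unsigned long start_pfn,
				  unsigned long nr_pages)
{
	assert(nr_pages > 0);
	if (z->spanned_pages == 0)
		return false;
	return start_pfn < z->start_pfn + z->spanned_pages &&
	       start_pfn + nr_pages > z->start_pfn;
}

/*
 * Zone chosen for a plain "online" request:
 *  - movable if a kernel zone is not allowed for the range;
 *  - movable if movable_node is set and the range does not overlap
 *    the node's existing normal zone;
 *  - normal otherwise.
 */
enum model_zone model_online_keep_zone(const struct model_zone_span *normal,
				       bool kernel_allowed, bool movable_node,
				       unsigned long start_pfn,
				       unsigned long nr_pages)
{
	if (!kernel_allowed)
		return MODEL_ZONE_MOVABLE;
	if (movable_node &&
	    !model_zone_intersects(normal, start_pfn, nr_pages))
		return MODEL_ZONE_MOVABLE;
	return MODEL_ZONE_NORMAL;
}
```

For a freshly hot-added node the normal zone is empty, so with movable_node
every range lands in the movable zone; ranges overlapping an existing normal
zone keep going to the kernel zone.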

Strictly speaking, the semantics are not identical to the boot-time
initialization because find_zone_movable_pfns_for_nodes covers only the
hotpluggable range as described by the BIOS/FW.  In my experience this is
usually a full node, though (except for Node0, which is special and never
goes away completely).  If this turns out to be a problem in real life, we
can tweak the code to store the hotplug flag in memblocks, but let's keep
this simple for now.

Link: http://lkml.kernel.org/r/20170612111227.GI7476@xxxxxxxxxxxxxx
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
Acked-by: Reza Arbab <arbab@xxxxxxxxxxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Yasuaki Ishimatsu <yasu.isimatu@xxxxxxxxx>
Cc: <qiuxishi@xxxxxxxxxx>
Cc: Kani Toshimitsu <toshi.kani@xxxxxxx>
Cc: <slaoub@xxxxxxxxx>
Cc: Joonsoo Kim <js1304@xxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
Cc: Igor Mammedov <imammedo@xxxxxxxxxx>
Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 Documentation/memory-hotplug.txt |   12 +++++++++---
 mm/memory_hotplug.c              |   19 ++++++++++++++++---
 2 files changed, 25 insertions(+), 6 deletions(-)

diff -puN Documentation/memory-hotplug.txt~mm-memory_hotplug-support-movable_node-for-hotplugable-nodes Documentation/memory-hotplug.txt
--- a/Documentation/memory-hotplug.txt~mm-memory_hotplug-support-movable_node-for-hotplugable-nodes
+++ a/Documentation/memory-hotplug.txt
@@ -282,20 +282,26 @@ offlined it is possible to change the in
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
 This onlining will not change the ZONE type of the target memory block,
-If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+If the memory block doesn't belong to any zone, an appropriate kernel zone
+(usually ZONE_NORMAL) will be used unless the movable_node kernel command
+line option is specified, in which case ZONE_MOVABLE will be used.
+
+You can explicitly request to associate it with ZONE_MOVABLE by:
 
 % echo online_movable > /sys/devices/system/memory/memoryXXX/state
 (NOTE: current limit: this memory block must be adjacent to ZONE_MOVABLE)
 
-And if the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:
 
 % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
 (NOTE: current limit: this memory block must be adjacent to ZONE_NORMAL)
 
+An explicit zone onlining can fail (e.g. when the range is already within
+an existing and incompatible zone).
+
 After this, memory block XXX's state will be 'online' and the amount of
 available memory will be increased.
 
-Currently, newly added memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA).
 This may be changed in future.
 
 
diff -puN mm/memory_hotplug.c~mm-memory_hotplug-support-movable_node-for-hotplugable-nodes mm/memory_hotplug.c
--- a/mm/memory_hotplug.c~mm-memory_hotplug-support-movable_node-for-hotplugable-nodes
+++ a/mm/memory_hotplug.c
@@ -934,6 +934,19 @@ struct zone *default_zone_for_pfn(int ni
 	return &pgdat->node_zones[ZONE_NORMAL];
 }
 
+static inline bool movable_pfn_range(int nid, struct zone *default_zone,
+		unsigned long start_pfn, unsigned long nr_pages)
+{
+	if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
+				MMOP_ONLINE_KERNEL))
+		return true;
+
+	if (!movable_node_is_enabled())
+		return false;
+
+	return !zone_intersects(default_zone, start_pfn, nr_pages);
+}
+
 /*
  * Associates the given pfn range with the given node and the zone appropriate
  * for the given online type.
@@ -949,10 +962,10 @@ static struct zone * __meminit move_pfn_
 		/*
 		 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
 		 * movable zone if that is not possible (e.g. we are within
-		 * or past the existing movable zone)
+		 * or past the existing movable zone). movable_node overrides
+		 * this default and defaults to the movable zone.
 		 */
-		if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
-					MMOP_ONLINE_KERNEL))
+		if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
 			zone = movable_zone;
 	} else if (online_type == MMOP_ONLINE_MOVABLE) {
 		zone = &pgdat->node_zones[ZONE_MOVABLE];
_
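Because a plain "online" now depends on whether the kernel was booted with
movable_node, userspace tooling may want to check for it before onlining.
A minimal sketch that looks for the bare movable_node flag in a kernel
command line string (for example the contents of /proc/cmdline); the
function name is hypothetical, and the sketch ignores quoting and the
param=value form:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if `param` appears in `cmdline` as a whitespace-delimited token. */
bool cmdline_has_param(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	assert(len > 0);

	while ((p = strstr(p, param)) != NULL) {
		/* The match must start the string or follow whitespace... */
		bool starts = (p == cmdline) ||
			      isspace((unsigned char)p[-1]);
		/* ...and end the string or precede whitespace. */
		bool ends = p[len] == '\0' ||
			    isspace((unsigned char)p[len]);

		if (starts && ends)
			return true;
		p++;	/* partial match, keep scanning */
	}
	return false;
}
```

Matching whole tokens avoids false positives from longer parameters that
merely contain the string movable_node.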