Re: [3/7, v9] NUMA Hotplug Emulator: Add node hotplug emulation

On Fri, 10 Dec 2010 15:31:22 +0800
shaohui.zheng@xxxxxxxxx wrote:

> From: David Rientjes <rientjes@xxxxxxxxxx>
> 
> Add an interface to allow new nodes to be added when performing memory
> hot-add.  This provides a convenient interface to test memory hotplug
> notifier callbacks and surrounding hotplug code when new nodes are
> onlined without actually having a machine with such hotpluggable SRAT
> entries.
> 
> This adds a new debugfs interface at /sys/kernel/debug/mem_hotplug/add_node
> that behaves in a similar way to the memory hot-add "probe" interface.
> Its format is size@start, where "size" is the size of the new node to be
> added and "start" is the physical address of the new memory.
> 
> The new node id is a currently offline, but possible, node.  The bit must
> be set in node_possible_map so that nr_node_ids is sized appropriately.
> 
> For emulation on x86, for example, it would be possible to set aside
> memory for hotplugged nodes (say, anything above 2G) and to add an
> additional four nodes as being possible on boot with
> 
> 	mem=2G numa=possible=4
> 
> and then creating a new 128M node at runtime:
> 
> 	# echo 128M@0x80000000 > /sys/kernel/debug/mem_hotplug/add_node
> 	On node 1 totalpages: 0
> 	init_memory_mapping: 0000000080000000-0000000088000000
> 	 0080000000 - 0088000000 page 2M
> 
> Once the new node has been added, its memory can be onlined.  If this
> memory represents memory section 16, for example:
> 
> 	# echo online > /sys/devices/system/memory/memory16/state
> 	Built 2 zonelists in Node order, mobility grouping on.  Total pages: 514846
> 	Policy zone: Normal
> 
>  [ The memory section(s) mapped to a particular node are visible via
>    /sys/kernel/debug/mem_hotplug/node1, in this example. ]
> 
> The new node is now hotplugged and ready for testing.
> 
> CC: Haicheng Li <haicheng.li@xxxxxxxxx>
> CC: Greg KH <gregkh@xxxxxxx>
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
> Signed-off-by: Shaohui Zheng <shaohui.zheng@xxxxxxxxx>
> ---
>  Documentation/memory-hotplug.txt |   24 +++++++++++++++
>  mm/memory_hotplug.c              |   59 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+), 0 deletions(-)
> Index: linux-hpe4/Documentation/memory-hotplug.txt
> ===================================================================
> --- linux-hpe4.orig/Documentation/memory-hotplug.txt	2010-11-30 12:40:43.527622001 +0800
> +++ linux-hpe4/Documentation/memory-hotplug.txt	2010-11-30 14:11:11.827622000 +0800
> @@ -18,6 +18,7 @@
>  4. Physical memory hot-add phase
>    4.1 Hardware(Firmware) Support
>    4.2 Notify memory hot-add event by hand
> +  4.3 Node hotplug emulation
>  5. Logical Memory hot-add phase
>    5.1. State of memory
>    5.2. How to online memory
> @@ -215,6 +216,29 @@
>  Please see "How to online memory" in this text.
>  
>  
> +4.3 Node hotplug emulation
> +------------
> +With debugfs, it is possible to test node hotplug by assigning the newly
> +added memory to a new node id when using a different interface with a similar
> +behavior to "probe" described in section 4.2.  If a node id is possible
> +(there are bits in /sys/devices/system/node/possible that are not online),
> +then it may be used to emulate a newly added node as the result of memory
> +hotplug by using the debugfs "add_node" interface.
> +
> +The add_node interface is located at "mem_hotplug/add_node" at the debugfs
> +mount point.
> +
> +You can create a new node of a specified size starting at the physical
> +address of new memory by
> +
> +% echo size@start_address_of_new_memory > /sys/kernel/debug/mem_hotplug/add_node
> +
> +Where "size" can be represented in megabytes or gigabytes (for example,
> +"128M" or "1G").  The minimum size is that of a memory section.
> +
> +Once the new node has been added, it is possible to online the memory by
> +toggling the "state" of its memory section(s) as described in section 5.1.
> +
>  
>  ------------------------------
>  5. Logical Memory hot-add phase
> Index: linux-hpe4/mm/memory_hotplug.c
> ===================================================================
> --- linux-hpe4.orig/mm/memory_hotplug.c	2010-11-30 12:40:43.757622001 +0800
> +++ linux-hpe4/mm/memory_hotplug.c	2010-11-30 14:02:33.877622002 +0800
> @@ -924,3 +924,63 @@
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  EXPORT_SYMBOL_GPL(remove_memory);
> +
> +#ifdef CONFIG_DEBUG_FS
> +#include <linux/debugfs.h>
> +
> +static struct dentry *memhp_debug_root;
> +
> +static ssize_t add_node_store(struct file *file, const char __user *buf,
> +				size_t count, loff_t *ppos)
> +{
> +	nodemask_t mask;

NODEMASK_ALLOC()?  A nodemask_t can be large, so it may not belong on
the stack.
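
To illustrate the concern, here is a rough userspace sketch, not kernel
code.  It assumes a hypothetical MAX_NODES of 1024 and a hand-rolled
struct nodemask; malloc() stands in for whatever allocation
NODEMASK_ALLOC() would perform.  The scratch mask lives on the heap and
the function finds the lowest node that is possible but not online,
which is what the nodes_andnot()/first_node() pair further down does.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MAX_NODES	1024	/* hypothetical stand-in for MAX_NUMNODES */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS	((MAX_NODES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Illustrative bitmap type; 128 bytes with the MAX_NODES above. */
struct nodemask {
	unsigned long bits[MASK_WORDS];
};

void node_set(struct nodemask *m, unsigned int nid)
{
	assert(nid < MAX_NODES);
	m->bits[nid / BITS_PER_WORD] |= 1UL << (nid % BITS_PER_WORD);
}

/*
 * Return the lowest node id that is possible but not online,
 * MAX_NODES if there is none, or -1 if the scratch mask could not
 * be allocated.
 */
int first_offline_possible(const struct nodemask *possible,
			   const struct nodemask *online)
{
	struct nodemask *tmp = malloc(sizeof(*tmp));	/* heap, not stack */
	int nid = MAX_NODES;
	size_t i;

	if (!tmp)
		return -1;

	/* tmp = possible & ~online */
	for (i = 0; i < MASK_WORDS; i++)
		tmp->bits[i] = possible->bits[i] & ~online->bits[i];

	/* Scan for the first set bit. */
	for (i = 0; i < MAX_NODES; i++) {
		if (tmp->bits[i / BITS_PER_WORD] &
		    (1UL << (i % BITS_PER_WORD))) {
			nid = (int)i;
			break;
		}
	}

	free(tmp);
	return nid;
}
```

With nodes 0-2 possible and only node 0 online, this picks node 1.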

> +	u64 start, size;
> +	char buffer[64];
> +	char *p;
> +	int nid;
> +	int ret;
> +
> +	memset(buffer, 0, sizeof(buffer));
> +	if (count > sizeof(buffer) - 1)
> +		count = sizeof(buffer) - 1;

This will cause the write to return a smaller number than `count': a
short write.  Some userspace code may then decide to write the
remainder of the data (which is the correct way to use the write()
syscall), and that remainder would be parsed as a second request.

Could be a bit dangerous, and perhaps simply returning an error if too
much data was written would be a better approach.
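
A small userspace sketch of the hazard, under stated assumptions:
store_truncating() and store_rejecting() are hypothetical stand-ins for
the handler (the first mirrors the clamping in the patch), and
write_all() is the usual retry loop a careful caller might wrap around
write().  The `calls' counter records how many times a handler accepted
data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define BUF_SIZE 64

int calls;	/* number of times a handler accepted data */

/* Mirrors the patch: silently clamp, then report the clamped length. */
ssize_t store_truncating(const char *buf, size_t count)
{
	char buffer[BUF_SIZE] = { 0 };

	if (count > sizeof(buffer) - 1)
		count = sizeof(buffer) - 1;
	memcpy(buffer, buf, count);
	calls++;
	return (ssize_t)count;
}

/* Alternative: refuse oversized input outright. */
ssize_t store_rejecting(const char *buf, size_t count)
{
	char buffer[BUF_SIZE] = { 0 };

	if (count > sizeof(buffer) - 1)
		return -EINVAL;
	memcpy(buffer, buf, count);
	calls++;
	return (ssize_t)count;
}

/* The conventional "keep writing until everything is consumed" loop. */
ssize_t write_all(ssize_t (*store)(const char *, size_t),
		  const char *buf, size_t count)
{
	size_t done = 0;

	assert(store != NULL);
	while (done < count) {
		ssize_t n = store(buf + done, count - done);

		if (n < 0)
			return n;
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Feeding 100 bytes through write_all() reaches the truncating handler
twice (63 bytes, then the remaining 37 as if they were a fresh
request), while the rejecting handler fails once with -EINVAL and
never acts on partial input.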

> +	if (copy_from_user(buffer, buf, count))
> +		return -EFAULT;
> +
> +	size = memparse(buffer, &p);
> +	if (size < (PAGES_PER_SECTION << PAGE_SHIFT))

PAGES_PER_SECTION has type unsigned long, so the rhs of this comparison
might overflow on 32-bit, should anyone ever try to use this code on
32-bit.

otoh the compiler might do it as 64-bit because the lhs is 64-bit.  Not
sure.
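
For what it's worth, a quick userspace sketch of the C conversion rules
involved.  It assumes a platform where int is 32 bits and uses uint32_t
as a stand-in for a 32-bit unsigned long; the function names are made up
for illustration.  The shift is evaluated in the type of its (promoted)
left operand, and only the result is widened for the 64-bit comparison,
so a cast before the shift is what guarantees 64-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shift performed in 32 bits, as it would be if the left operand were
 * a 32-bit unsigned long.  The result wraps modulo 2^32 before it is
 * widened to 64 bits for the caller.
 */
uint64_t section_bytes_narrow(uint32_t pages, unsigned int page_shift)
{
	assert(page_shift < 32);	/* larger shifts are undefined */
	return pages << page_shift;
}

/* Widen first, then shift: the arithmetic is done in 64 bits. */
uint64_t section_bytes_wide(uint32_t pages, unsigned int page_shift)
{
	assert(page_shift < 32);
	return (uint64_t)pages << page_shift;
}
```

The two agree for small values and diverge only once the byte count
reaches 4G.  Whether any real section size on 32-bit gets that large is
a separate question this sketch does not answer.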

> +		return -EINVAL;
> +	if (*p != '@')
> +		return -EINVAL;
> +
> +	start = simple_strtoull(p + 1, NULL, 0);

You disagreed with checkpatch?
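
Presumably the warning in question is the one about simple_strtoull()
being obsolete.  With a NULL end pointer, as used here, trailing garbage
after the address goes unnoticed.  Below is a userspace sketch of
stricter "size@start" parsing; parse_size() is a hypothetical analogue
of memparse() handling only K/M/G suffixes, and parse_add_node() is an
invented name, not a proposed kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical memparse() analogue: a number plus optional K/M/G. */
uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse "size@start"; 0 on success, -EINVAL on malformed input. */
int parse_add_node(const char *s, uint64_t *size, uint64_t *start)
{
	char *p;

	assert(s && size && start);

	*size = parse_size(s, &p);
	if (p == s || *p != '@')
		return -EINVAL;

	s = p + 1;
	errno = 0;
	*start = strtoull(s, &p, 0);
	/* Require at least one digit and no overflow. */
	if (p == s || errno == ERANGE)
		return -EINVAL;
	/* Allow the newline that echo appends, nothing else. */
	if (*p == '\n')
		p++;
	if (*p != '\0')
		return -EINVAL;

	return 0;
}
```

"128M@0x80000000" parses to a size of 134217728 and a start of
0x80000000, while "128M@0x80000000junk" is rejected instead of being
silently accepted.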

> +	nodes_andnot(mask, node_possible_map, node_online_map);
> +	nid = first_node(mask);
> +	if (nid == MAX_NUMNODES)
> +		return -ENOMEM;
> +
> +	ret = add_memory(nid, start, size);
> +	return ret ? ret : count;
> +}
> +
> +static const struct file_operations add_node_file_ops = {
> +	.write		= add_node_store,
> +	.llseek		= generic_file_llseek,
> +};
> +
> +static int __init node_debug_init(void)
> +{
> +	if (!memhp_debug_root)
> +		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
> +	if (!memhp_debug_root)
> +		return -ENOMEM;
> +
> +	if (!debugfs_create_file("add_node", S_IWUSR, memhp_debug_root,
> +			NULL, &add_node_file_ops))
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +module_init(node_debug_init);
> +#endif /* CONFIG_DEBUG_FS */
