Currently, all newly added memory blocks remain in 'offline' state unless
someone onlines them. Some Linux distributions carry special udev rules
like:

SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

to make this happen automatically. This is not a great solution for virtual
machines where memory hotplug is used to address high memory pressure
situations: such onlining is slow, and the userspace process doing it (udev)
has a chance of being killed by the OOM killer, as it will probably need to
allocate some memory.

Introduce a default policy for newly added memory blocks in the
/sys/devices/system/memory/hotplug_autoonline file with two possible values:
"offline", which preserves the current behavior, and "online", which causes
all newly added memory blocks to go online as soon as they're added. The
default is "online" when the MEMORY_HOTPLUG_AUTOONLINE kernel config option
is selected.

Cc: Jonathan Corbet <corbet@xxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Daniel Kiper <daniel.kiper@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Tang Chen <tangchen@xxxxxxxxxxxxxx>
Cc: David Vrabel <david.vrabel@xxxxxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Xishi Qiu <qiuxishi@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
Cc: Igor Mammedov <imammedo@xxxxxxxxxx>
Cc: Kay Sievers <kay@xxxxxxxx>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
---
- Changes since 'v1':
  Add 'online' parameter to add_memory_resource() as it is being used by
  the Xen balloon driver and it adds "empty" memory pages [David Vrabel].
  (I don't completely understand what prevents manual onlining in this
  case as we still have all newly added blocks in sysfs ... this is the
  discussion point.)
- Changes since 'RFC':
  It seems nobody is strongly opposed to the idea, thus non-RFC.
  Change memhp_autoonline to bool, we support only MMOP_ONLINE_KEEP
  and MMOP_OFFLINE for the auto-onlining policy, eliminate 'unknown'
  from show_memhp_autoonline(). [Daniel Kiper]
  Put everything under CONFIG_MEMORY_HOTPLUG_AUTOONLINE, enable the
  feature by default (when the config option is selected) and add
  kernel parameter (nomemhp_autoonline) to disable the functionality
  upon boot when needed.
- RFC:
  I was able to find previous attempts to fix the issue, e.g.:
  http://marc.info/?l=linux-kernel&m=137425951924598&w=2
  http://marc.info/?l=linux-acpi&m=127186488905382
  but I'm not completely sure why it didn't work out and the solution
  I suggest is not 'smart enough', thus 'RFC'.
---
 Documentation/kernel-parameters.txt |  2 ++
 Documentation/memory-hotplug.txt    | 26 ++++++++++++++++++++------
 drivers/base/memory.c               | 36 ++++++++++++++++++++++++++++++++++++
 drivers/xen/balloon.c               |  2 +-
 include/linux/memory_hotplug.h      |  6 +++++-
 mm/Kconfig                          |  9 +++++++++
 mm/memory_hotplug.c                 | 25 +++++++++++++++++++++++--
 7 files changed, 96 insertions(+), 10 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 742f69d..652efe1 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2537,6 +2537,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			shutdown the other cpus.  Instead use the REBOOT_VECTOR
 			irq.
 
+	nomemhp_autoonline	Don't automatically online newly added memory.
+
 	nomodule	Disable module load
 
 	nopat		[X86] Disable PAT (page attribute table extension of
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index ce2cfcf..041efac 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -111,8 +111,9 @@ To use memory hotplug feature, kernel must be compiled with following
 config options.
 
 - For all memory hotplug
-    Memory model -> Sparse Memory  (CONFIG_SPARSEMEM)
-    Allow for memory hot-add       (CONFIG_MEMORY_HOTPLUG)
+    Memory model -> Sparse Memory         (CONFIG_SPARSEMEM)
+    Allow for memory hot-add              (CONFIG_MEMORY_HOTPLUG)
+    Automatically online hot-added memory (CONFIG_MEMORY_HOTPLUG_AUTOONLINE)
 
 - To enable memory removal, the followings are also necessary
     Allow for memory hot remove    (CONFIG_MEMORY_HOTREMOVE)
@@ -254,12 +255,25 @@ If the memory block is online, you'll read "online".
 If the memory block is offline, you'll read "offline".
 
 
-5.2. How to online memory
+5.2. Memory onlining
 ------------
-Even if the memory is hot-added, it is not at ready-to-use state.
-For using newly added memory, you have to "online" the memory block.
+When the memory is hot-added, the kernel decides whether or not to "online"
+it according to the policy which can be read from "hotplug_autoonline" file
+(requires CONFIG_MEMORY_HOTPLUG_AUTOONLINE):
 
-For onlining, you have to write "online" to the memory block's state file as:
+% cat /sys/devices/system/memory/hotplug_autoonline
+
+The default is "online" which means the newly added memory will be onlined
+after adding. Automatic onlining can be disabled by writing "offline" to the
+"hotplug_autoonline" file:
+
+% echo offline > /sys/devices/system/memory/hotplug_autoonline
+
+or by booting the kernel with "nomemhp_autoonline" parameter.
+
+If the automatic onlining wasn't requested or some memory block was offlined
+it is possible to change the individual block's state by writing to the "state"
+file:
 
 % echo online > /sys/devices/system/memory/memoryXXX/state
 
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 25425d3..6f9ce3a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -438,6 +438,39 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 
 static DEVICE_ATTR(block_size_bytes, 0444, print_block_size, NULL);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+/*
+ * Memory auto online policy.
+ */
+
+static ssize_t
+show_memhp_autoonline(struct device *dev, struct device_attribute *attr,
+		      char *buf)
+{
+	if (memhp_autoonline)
+		return sprintf(buf, "online\n");
+	else
+		return sprintf(buf, "offline\n");
+}
+
+static ssize_t
+store_memhp_autoonline(struct device *dev, struct device_attribute *attr,
+		       const char *buf, size_t count)
+{
+	if (sysfs_streq(buf, "online"))
+		memhp_autoonline = true;
+	else if (sysfs_streq(buf, "offline"))
+		memhp_autoonline = false;
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static DEVICE_ATTR(hotplug_autoonline, 0644, show_memhp_autoonline,
+		   store_memhp_autoonline);
+#endif
+
 /*
  * Some architectures will have custom drivers to do this, and
  * will not need to do it from userspace.  The fake hot-add code
@@ -737,6 +770,9 @@ static struct attribute *memory_root_attrs[] = {
 #endif
 
 	&dev_attr_block_size_bytes.attr,
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+	&dev_attr_hotplug_autoonline.attr,
+#endif
 	NULL
 };
 
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 12eab50..890c3b5 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -338,7 +338,7 @@ static enum bp_state reserve_additional_memory(void)
 	}
 #endif
 
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, false);
 	if (rc) {
 		pr_warn("Cannot add additional memory (%i)\n", rc);
 		goto err;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 2ea574f..367e7d2 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -99,6 +99,10 @@ extern void __online_page_free(struct page *page);
 
 extern int try_online_node(int nid);
 
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+extern bool memhp_autoonline;
+#endif
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern bool is_pageblock_removable_nolock(struct page *page);
 extern int arch_remove_memory(u64 start, u64 size);
@@ -267,7 +271,7 @@ static inline void remove_memory(int nid, u64 start, u64 size) {}
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
 extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int add_memory_resource(int nid, struct resource *resource, bool online);
 extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default,
 		bool for_device);
 extern int arch_add_memory(int nid, u64 start, u64 size, bool for_device);
diff --git a/mm/Kconfig b/mm/Kconfig
index 97a4e06..dd1b8ea 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -200,6 +200,15 @@ config MEMORY_HOTREMOVE
 	depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
 	depends on MIGRATION
 
+config MEMORY_HOTPLUG_AUTOONLINE
+	bool "Automatically online hot-added memory"
+	depends on MEMORY_HOTPLUG_SPARSE
+	help
+	  When memory is hot-added, it is not at ready-to-use state, a special
+	  userspace action is required to online the newly added blocks. With
+	  this option enabled, the kernel will try to online all newly added
+	  memory automatically.
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 67d488a..32a7b7c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -76,6 +76,18 @@ static struct {
 #define memhp_lock_acquire()      lock_map_acquire(&mem_hotplug.dep_map)
 #define memhp_lock_release()      lock_map_release(&mem_hotplug.dep_map)
 
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+bool memhp_autoonline = true;
+EXPORT_SYMBOL_GPL(memhp_autoonline);
+
+static int __init setup_memhp_autoonline(char *str)
+{
+	memhp_autoonline = false;
+	return 0;
+}
+__setup("nomemhp_autoonline", setup_memhp_autoonline);
+#endif
+
 void get_online_mems(void)
 {
 	might_sleep();
@@ -1232,7 +1244,7 @@ int zone_for_memory(int nid, u64 start, u64 size, int zone_default,
 }
 
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, bool online)
 {
 	u64 start, size;
 	pg_data_t *pgdat = NULL;
@@ -1292,6 +1304,11 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	/* create new memmap entry */
 	firmware_map_add_hotplug(start, start + size, "System RAM");
 
+	/* online pages if requested */
+	if (online)
+		online_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT,
+			     MMOP_ONLINE_KEEP);
+
 	goto out;
 
 error:
@@ -1315,7 +1332,11 @@ int __ref add_memory(int nid, u64 start, u64 size)
 	if (!res)
 		return -EEXIST;
 
-	ret = add_memory_resource(nid, res);
+#ifdef CONFIG_MEMORY_HOTPLUG_AUTOONLINE
+	ret = add_memory_resource(nid, res, memhp_autoonline);
+#else
+	ret = add_memory_resource(nid, res, false);
+#endif
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
-- 
2.4.3
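
For illustration only, and not part of the patch: a minimal userspace sketch of how a management tool might read the policy file this patch introduces. The helper names parse_autoonline_policy() and read_autoonline_policy() are hypothetical. The sketch assumes the file contains exactly "online" or "offline" followed by an optional newline, as show_memhp_autoonline() above emits, and it tolerates the trailing newline much as sysfs_streq() does in the store handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Path taken from the patch; present only on kernels carrying this change. */
#define MEMHP_AUTOONLINE_PATH "/sys/devices/system/memory/hotplug_autoonline"

/*
 * Hypothetical helper: parse "online" or "offline", ignoring one trailing
 * newline. Returns 0 and sets *online on success, -1 on any other input.
 */
static int parse_autoonline_policy(const char *buf, bool *online)
{
	size_t len;

	assert(buf != NULL && online != NULL);

	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		len--;

	if (len == strlen("online") && strncmp(buf, "online", len) == 0) {
		*online = true;
		return 0;
	}
	if (len == strlen("offline") && strncmp(buf, "offline", len) == 0) {
		*online = false;
		return 0;
	}
	return -1;
}

/*
 * Hypothetical helper: read the policy from 'path'. Returns 0 on success,
 * -1 if the file cannot be opened, is empty, or holds an unexpected value.
 */
static int read_autoonline_policy(const char *path, bool *online)
{
	char buf[16];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_autoonline_policy(buf, online);
}
```

A caller would pass MEMHP_AUTOONLINE_PATH to read_autoonline_policy() and treat a -1 return as "policy file not available", which is what happens on a kernel built without CONFIG_MEMORY_HOTPLUG_AUTOONLINE, where the attribute is not created.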