>-----Original Message-----
>From: Linux-nvdimm [mailto:linux-nvdimm-bounces@xxxxxxxxxxxx] On Behalf
>Of Dave Hansen
>Sent: Thursday, January 17, 2019 2:19 AM
>To: dave@xxxxxxxx
>Cc: thomas.lendacky@xxxxxxx; mhocko@xxxxxxxx;
>linux-nvdimm@xxxxxxxxxxxx; tiwai@xxxxxxx; Dave Hansen
><dave.hansen@xxxxxxxxxxxxxxx>; Huang, Ying <ying.huang@xxxxxxxxx>;
>linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; bp@xxxxxxx;
>baiyaowei@xxxxxxxxxxxxxxxxxxxx; zwisler@xxxxxxxxxx;
>bhelgaas@xxxxxxxxxx; Wu, Fengguang <fengguang.wu@xxxxxxxxx>;
>akpm@xxxxxxxxxxxxxxxxxxxx
>Subject: [PATCH 4/4] dax: "Hotplug" persistent memory for use like normal
>RAM
>
>
>From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
>Currently, a persistent memory region is "owned" by a device driver,
>either the "Direct DAX" or "Filesystem DAX" drivers. These drivers
>allow applications to explicitly use persistent memory, generally
>by being modified to use special, new libraries.
>
>However, this limits persistent memory use to applications which
>*have* been modified. To make it more broadly usable, this driver
>"hotplugs" memory into the kernel, to be managed and used just like
>normal RAM would be.
>
>To make this work, management software must remove the device from
>being controlled by the "Device DAX" infrastructure:
>
>	echo -n dax0.0 > /sys/bus/dax/drivers/device_dax/remove_id
>	echo -n dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
>
>and then bind it to this new driver:
>
>	echo -n dax0.0 > /sys/bus/dax/drivers/kmem/new_id
>	echo -n dax0.0 > /sys/bus/dax/drivers/kmem/bind

Is there any plan to introduce an additional mode, e.g. "kmem", in the
userspace ndctl tool to do this configuration?

>After this, there will be a number of new memory sections visible
>in sysfs that can be onlined, or that may get onlined by existing
>udev-initiated memory hotplug rules.
>
>Note: this inherits any existing NUMA information for the newly-
>added memory from the persistent memory device that came from the
>firmware. On Intel platforms, the firmware has guarantees that
>require each socket's persistent memory to be in a separate
>memory-only NUMA node. That means that this patch is not expected
>to create NUMA nodes, but will simply hotplug memory into existing
>nodes.
>
>There is currently some metadata at the beginning of pmem regions.
>The section-size memory hotplug restrictions, plus this small
>reserved area can cause the "loss" of a section or two of capacity.
>This should be fixable in follow-on patches. But, as a first step,
>losing 256MB of memory (worst case) out of hundreds of gigabytes
>is a good tradeoff vs. the required code to fix this up precisely.
>
>Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
>Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
>Cc: Ross Zwisler <zwisler@xxxxxxxxxx>
>Cc: Vishal Verma <vishal.l.verma@xxxxxxxxx>
>Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
>Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>Cc: Michal Hocko <mhocko@xxxxxxxx>
>Cc: linux-nvdimm@xxxxxxxxxxxx
>Cc: linux-kernel@xxxxxxxxxxxxxxx
>Cc: linux-mm@xxxxxxxxx
>Cc: Huang Ying <ying.huang@xxxxxxxxx>
>Cc: Fengguang Wu <fengguang.wu@xxxxxxxxx>
>Cc: Borislav Petkov <bp@xxxxxxx>
>Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
>Cc: Yaowei Bai <baiyaowei@xxxxxxxxxxxxxxxxxxxx>
>Cc: Takashi Iwai <tiwai@xxxxxxx>
>---
>
> b/drivers/dax/Kconfig  |    5 ++
> b/drivers/dax/Makefile |    1
> b/drivers/dax/kmem.c   |   93
>+++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 99 insertions(+)
>
>diff -puN drivers/dax/Kconfig~dax-kmem-try-4 drivers/dax/Kconfig
>--- a/drivers/dax/Kconfig~dax-kmem-try-4	2019-01-08 09:54:44.051694874
>-0800
>+++ b/drivers/dax/Kconfig	2019-01-08 09:54:44.056694874 -0800
>@@ -32,6 +32,11 @@ config DEV_DAX_PMEM
>
> 	  Say M if unsure
>
>+config DEV_DAX_KMEM
>+	def_bool y
>+	depends on DEV_DAX_PMEM   # Needs DEV_DAX_PMEM infrastructure
>+	depends on MEMORY_HOTPLUG # for add_memory() and friends
>+
> config DEV_DAX_PMEM_COMPAT
> 	tristate "PMEM DAX: support the deprecated /sys/class/dax interface"
> 	depends on DEV_DAX_PMEM
>diff -puN /dev/null drivers/dax/kmem.c
>--- /dev/null	2018-12-03 08:41:47.355756491 -0800
>+++ b/drivers/dax/kmem.c	2019-01-08 09:54:44.056694874 -0800
>@@ -0,0 +1,93 @@
>+// SPDX-License-Identifier: GPL-2.0
>+/* Copyright(c) 2016-2018 Intel Corporation. All rights reserved. */
>+#include <linux/memremap.h>
>+#include <linux/pagemap.h>
>+#include <linux/memory.h>
>+#include <linux/module.h>
>+#include <linux/device.h>
>+#include <linux/pfn_t.h>
>+#include <linux/slab.h>
>+#include <linux/dax.h>
>+#include <linux/fs.h>
>+#include <linux/mm.h>
>+#include <linux/mman.h>
>+#include "dax-private.h"
>+#include "bus.h"
>+
>+int dev_dax_kmem_probe(struct device *dev)
>+{
>+	struct dev_dax *dev_dax = to_dev_dax(dev);
>+	struct resource *res = &dev_dax->region->res;
>+	resource_size_t kmem_start;
>+	resource_size_t kmem_size;
>+	struct resource *new_res;
>+	int numa_node;
>+	int rc;
>+
>+	/* Hotplug starting at the beginning of the next block: */
>+	kmem_start = ALIGN(res->start, memory_block_size_bytes());
>+
>+	kmem_size = resource_size(res);
>+	/* Adjust the size down to compensate for moving up kmem_start: */
>+	kmem_size -= kmem_start - res->start;
>+	/* Align the size down to cover only complete blocks: */
>+	kmem_size &= ~(memory_block_size_bytes() - 1);
>+
>+	new_res = devm_request_mem_region(dev, kmem_start, kmem_size,
>+					  dev_name(dev));
>+
>+	if (!new_res) {
>+		printk("could not reserve region %016llx -> %016llx\n",
>+				kmem_start, kmem_start+kmem_size);
>+		return -EBUSY;
>+	}
>+
>+	/*
>+	 * Set flags appropriate for System RAM. Leave ..._BUSY clear
>+	 * so that add_memory() can add a child resource.
>+	 */
>+	new_res->flags = IORESOURCE_SYSTEM_RAM;
>+	new_res->name = dev_name(dev);
>+
>+	numa_node = dev_dax->target_node;
>+	if (numa_node < 0) {
>+		pr_warn_once("bad numa_node: %d, forcing to 0\n", numa_node);
>+		numa_node = 0;
>+	}
>+
>+	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
>+	if (rc)
>+		return rc;
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL_GPL(dev_dax_kmem_probe);
>+
>+static int dev_dax_kmem_remove(struct device *dev)
>+{
>+	/* Assume that hot-remove will fail for now */
>+	return -EBUSY;
>+}
>+
>+static struct dax_device_driver device_dax_kmem_driver = {
>+	.drv = {
>+		.probe = dev_dax_kmem_probe,
>+		.remove = dev_dax_kmem_remove,
>+	},
>+};
>+
>+static int __init dax_kmem_init(void)
>+{
>+	return dax_driver_register(&device_dax_kmem_driver);
>+}
>+
>+static void __exit dax_kmem_exit(void)
>+{
>+	dax_driver_unregister(&device_dax_kmem_driver);
>+}
>+
>+MODULE_AUTHOR("Intel Corporation");
>+MODULE_LICENSE("GPL v2");
>+module_init(dax_kmem_init);
>+module_exit(dax_kmem_exit);
>+MODULE_ALIAS_DAX_DEVICE(0);
>diff -puN drivers/dax/Makefile~dax-kmem-try-4 drivers/dax/Makefile
>--- a/drivers/dax/Makefile~dax-kmem-try-4	2019-01-08 09:54:44.053694874
>-0800
>+++ b/drivers/dax/Makefile	2019-01-08 09:54:44.056694874 -0800
>@@ -1,6 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0
> obj-$(CONFIG_DAX) += dax.o
> obj-$(CONFIG_DEV_DAX) += device_dax.o
>+obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
>
> dax-y := super.o
> dax-y += bus.o
>_
>_______________________________________________
>Linux-nvdimm mailing list
>Linux-nvdimm@xxxxxxxxxxxx
>https://lists.01.org/mailman/listinfo/linux-nvdimm
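
To illustrate the capacity "loss" described in the changelog, here is a
minimal userspace sketch of the block-alignment arithmetic from
dev_dax_kmem_probe(). The names kmem_range, align_up() and
kmem_hotplug_range() are hypothetical helpers made up for this
illustration, and the block size is passed in as a plain parameter rather
than taken from memory_block_size_bytes(); none of this is part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the range that would be hotplugged. */
struct kmem_range {
	uint64_t start;	/* first byte handed to add_memory() */
	uint64_t size;	/* length in bytes, whole blocks only */
};

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Same three steps as the probe function in the patch:
 *  1. move the start up to the next block boundary,
 *  2. shrink the size by the amount the start moved,
 *  3. round the size down to a whole number of blocks.
 * Assumes the region is large enough to reach the first boundary.
 */
static struct kmem_range kmem_hotplug_range(uint64_t res_start,
					    uint64_t res_size,
					    uint64_t block)
{
	struct kmem_range r;

	/* The mask arithmetic only works for power-of-two block sizes. */
	assert(block != 0 && (block & (block - 1)) == 0);

	r.start = align_up(res_start, block);
	r.size = res_size - (r.start - res_start);
	r.size &= ~(block - 1);
	return r;
}
```

As a made-up example: with an assumed 128MB block size, a region that
starts 2MB past the 4GB boundary and is 16GB - 2MB long would be hotplugged
starting at 4GB + 128MB and would cover 127 blocks, so 126MB of the region
goes unused.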